playwright-mcp/MCP-ROOTS-NOTES.md
Ryan Malloy ecedcc48d6 feat: implement MCP client session persistence for browser contexts
Add session persistence system to maintain browser contexts across MCP tool calls:

- SessionManager: Global persistent context management keyed by session ID
- BrowserServerBackend: Modified to use session persistence and reuse contexts
- Context: Enhanced to support environment introspection and session ID override
- MCP Roots: Added educational tool descriptions for workspace-aware automation
- Environment Detection: System file introspection for display/GPU/project detection

Key features:
- Browser contexts survive between tool calls preserving cache, cookies, state
- Complete session isolation between different MCP clients
- Zero startup overhead for repeat connections
- Backward compatible with existing implementations
- Support for MCP roots workspace detection and environment adaptation

Tested and verified with real Claude Code client showing successful session
persistence across navigation calls with preserved browser state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-12 12:22:46 -06:00

MCP Roots for Workspace-Aware Browser Automation - Detailed Notes

Overview

This document captures the complete conversation and technical details around implementing workspace-aware browser automation using MCP roots for environment declaration and dynamic configuration.

The Problem Statement

Multi-Client Isolation Challenge:

  • Multiple MCP clients running simultaneously, each working on different codebases
  • Each client needs isolated Playwright sessions
  • Browser windows should display on the client's desktop context
  • Screenshots/videos should save to the client's project directory
  • Sessions must remain completely isolated from each other

Traditional Configuration Limitations:

  • Environment variables: Global, not per-client
  • Config files: Each client needs to know its own context
  • Tool parameters: Requires manual specification on every call
  • Configuration tools: Still requires client to understand context

The Key Insight

The real problem isn't configuration complexity - it's workspace-aware isolation. Each MCP client represents a distinct workspace with its own:

  • Project directory (where files should be saved)
  • Desktop context (where windows should appear)
  • Available system resources (GPU, displays, etc.)

The MCP Roots Solution

Core Concept

Leverage MCP's existing "roots" capability to declare execution environments rather than just file system access. Following the UNIX philosophy that "everything is a file," we expose actual system files that define the environment.

How It Works

  1. Client declares roots during connection:

    {
      "capabilities": {
        "roots": {
          "listChanged": true
        }
      }
    }
    
  2. Client exposes environment-defining files:

    • file:///path/to/their/project - artifact save location
    • file:///tmp/.X11-unix - available X11 displays
    • file:///dev/dri - GPU capabilities
    • file:///sys/class/graphics - framebuffer information
    • file:///proc/meminfo - memory constraints
  3. Server introspects exposed files:

    • Parse X11 sockets to discover displays (X0 → DISPLAY=:0)
    • Check DRI devices for GPU acceleration
    • Use project directory for screenshot/video output
    • Read system files for capability detection
  4. Dynamic updates via MCP protocol:

    • Client can change roots anytime during session
    • Client sends notifications/roots/list_changed
    • Server calls roots/list to get updated environment
    • Browser contexts automatically reconfigure
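The steps above can be sketched as a small classification pass over the root list returned by roots/list. This is only an illustration under stated assumptions: the `WorkspaceEnvironment` type, the `classifyRoots` name, and the rule "first non-system root is the project directory" are hypothetical and not taken from the actual server code.

```typescript
// Shape of one entry in a roots/list result (uri plus optional name).
interface Root {
  uri: string;
  name?: string;
}

// Hypothetical result type, introduced only for this sketch.
interface WorkspaceEnvironment {
  projectDir?: string; // where screenshots/videos would be saved
  x11Path?: string;    // /tmp/.X11-unix or one socket inside it
  driPath?: string;    // /dev/dri or one device inside it
}

const X11_DIR = '/tmp/.X11-unix';
const DRI_DIR = '/dev/dri';

const isUnder = (p: string, dir: string): boolean =>
  p === dir || p.startsWith(dir + '/');

// Sort the client's roots into the roles described in steps 2 and 3.
function classifyRoots(roots: Root[]): WorkspaceEnvironment {
  const env: WorkspaceEnvironment = {};
  for (const root of roots) {
    let url: URL;
    try {
      url = new URL(root.uri);
    } catch {
      continue; // ignore malformed URIs
    }
    if (url.protocol !== 'file:') continue; // only file roots are usable here
    const p = decodeURIComponent(url.pathname).replace(/\/+$/, '') || '/';
    if (isUnder(p, X11_DIR)) {
      env.x11Path = p;
    } else if (isUnder(p, DRI_DIR)) {
      env.driPath = p;
    } else if (p.startsWith('/sys/') || p.startsWith('/proc/')) {
      continue; // capability files, not a project directory
    } else if (env.projectDir === undefined) {
      env.projectDir = p; // assumption: first remaining root is the project
    }
  }
  return env;
}
```

A server would rerun this pass whenever the client sends notifications/roots/list_changed and the refreshed list arrives.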

Self-Teaching System

Tool descriptions become educational, explaining what roots to expose:

{
  name: 'browser_navigate',
  description: `Navigate to URL. 
  
  ENVIRONMENT: Detects context from exposed roots:
  - file:///path/to/project → saves screenshots/videos there
  - file:///tmp/.X11-unix → detects available displays (X0=:0, X1=:1)  
  - file:///dev/dri → enables GPU acceleration if available
  
  TIP: Change roots to switch workspace/display context dynamically.`
}

Technical Architecture

Session Isolation

  • Each MCP client gets a unique session ID derived from client info, a timestamp, and a random hash
  • Browser contexts are completely isolated per session
  • Video recording directories are session-specific
  • No cross-contamination between clients
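One possible way to build such a session ID is sketched below. The `ClientInfo` shape, the `createSessionId` name, and the exact ID layout are assumptions made for illustration; the notes only state that the ID combines client info, a timestamp, and a random hash.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Hypothetical subset of the client info sent during MCP initialization.
interface ClientInfo {
  name: string;
  version: string;
}

// Combine client info, a timestamp, and random bytes into one session ID.
function createSessionId(client: ClientInfo, now: number = Date.now()): string {
  const random = randomBytes(8).toString('hex');
  const digest = createHash('sha256')
    .update(`${client.name}@${client.version}:${now}:${random}`)
    .digest('hex')
    .slice(0, 16);
  // Keep the name readable and safe to use as a directory name.
  const safeName = client.name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
  return `${safeName}-${now}-${digest}`;
}
```

Because the random component changes on every call, two clients with identical names connecting in the same millisecond still receive different IDs, which is what keeps session-specific video directories from colliding.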

Environment Detection

// Example introspection logic
import * as fs from 'node:fs';

// Map X11 socket names (X0, X1, ...) to DISPLAY values (:0, :1, ...)
const detectDisplays = (x11Root: string) => {
  const sockets = fs.readdirSync(x11Root);
  return sockets
    .filter(name => /^X\d+$/.test(name))
    .map(name => ({ socket: name, display: `:${name.slice(1)}` }));
};

// Report which DRI device nodes exist (card* = GPU, renderD* = render node)
const detectGPU = (driRoot: string) => {
  const devices = fs.readdirSync(driRoot);
  return {
    hasGPU: devices.some(d => d.startsWith('card')),
    hasRender: devices.some(d => d.startsWith('renderD'))
  };
};

Dynamic Workspace Switching

// Client working on project1
Client exposes: file:///home/user/project1, file:///tmp/.X11-unix/X0

// Later switches to project2 with different display
Client updates roots: file:///home/user/project2, file:///tmp/.X11-unix/X1
Client sends: notifications/roots/list_changed
Server detects change, reconfigures browser contexts automatically
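To decide whether a reconfiguration is needed at all, the server could compare the previous and the refreshed root lists. The following is a minimal sketch; `diffRoots` and `RootsDiff` are hypothetical names, and comparing raw URI strings is a simplifying assumption.

```typescript
// Hypothetical diff result for two successive roots/list responses.
interface RootsDiff {
  added: string[];
  removed: string[];
  changed: boolean;
}

// Compare root URIs as sets so that a pure reordering does not count as a change.
function diffRoots(previous: string[], next: string[]): RootsDiff {
  const before = new Set(previous);
  const after = new Set(next);
  const added = next.filter(uri => !before.has(uri));
  const removed = previous.filter(uri => !after.has(uri));
  return { added, removed, changed: added.length > 0 || removed.length > 0 };
}
```

For the project1 to project2 switch shown above, both new URIs land in `added` and both old URIs in `removed`, so `changed` is true and the browser context would be rebuilt.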

Implementation Benefits

For MCP Protocol

  • Pure MCP: Uses existing roots capability, no protocol extensions needed
  • Self-documenting: Tool descriptions teach clients what to expose
  • Dynamic: Supports runtime environment changes
  • Standard: Follows established MCP patterns

For Playwright

  • Flexible: Showcases programmatic browser context configuration
  • Dynamic: Runtime display/output directory configuration
  • Isolated: Strong session boundaries per client
  • Capabilities-aware: Automatic GPU/display detection

For Clients (LLMs)

  • Zero cognitive overhead: Environment is implicit in connection
  • Familiar pattern: Uses existing root management
  • Self-teaching: Tool descriptions explain requirements
  • Flexible: Can change workspace context dynamically

Conversation Evolution

Initial Exploration

The discussion started with a video recording feature request and evolved into session isolation requirements.

Configuration Approaches Considered

  1. Environment variables - Too global
  2. Configuration tools - Still requires manual setup
  3. Tool parameters - Repetitive and error-prone
  4. MCP roots introspection - Elegant and automatic

Key Realizations

  1. UNIX Philosophy: Everything is a file - expose real system files
  2. Workspace Context: Environment should travel with MCP connection
  3. Dynamic Updates: MCP roots can change during session
  4. Self-Teaching: Use tool descriptions to educate clients
  5. Simplicity: Leverage existing MCP infrastructure rather than building new complexity

Architecture Decision

Chose session-level environment (via roots) over tool-managed environment because:

  • Environment is inherent to workspace, not individual tasks
  • Impossible to forget environment setup
  • Natural workspace isolation
  • Supports dynamic context switching

Current Implementation Status

Completed Features

  • Session isolation with unique session IDs
  • Video recording with session-specific directories
  • Browser context isolation per client
  • Docker deployment with optional headless mode
  • MCP tool system with comprehensive capabilities

Planned Features

  • 🔄 MCP roots capability support
  • 🔄 Environment introspection system
  • 🔄 Self-documenting tool descriptions
  • 🔄 Dynamic workspace switching
  • 🔄 System file capability detection

System File Mappings

Display Detection

  • /tmp/.X11-unix/X0 → DISPLAY=:0
  • /tmp/.X11-unix/X1 → DISPLAY=:1
  • Multiple sockets = multiple display options

GPU Capabilities

  • /dev/dri/card0 → Primary GPU available
  • /dev/dri/renderD128 → Render node available
  • Presence indicates GPU acceleration possible

Memory Constraints

  • /proc/meminfo → Available system memory
  • /sys/fs/cgroup/memory/memory.limit_in_bytes → Container limits (cgroup v1; on cgroup v2 the equivalent file is /sys/fs/cgroup/memory.max)
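Reading memory constraints from /proc/meminfo comes down to parsing "Key: value kB" lines. The sketch below takes the file content as a string so it can be tried without a Linux host; `parseMeminfo` and `availableMiB` are hypothetical helper names.

```typescript
// Parse the "Key:   12345 kB" lines of /proc/meminfo into a map of KiB values.
function parseMeminfo(text: string): Map<string, number> {
  const values = new Map<string, number>();
  for (const line of text.split('\n')) {
    const match = /^([^:]+):\s+(\d+)/.exec(line);
    if (match) values.set(match[1], Number(match[2]));
  }
  return values;
}

// Return MemAvailable in MiB, or undefined if the field is missing.
function availableMiB(text: string): number | undefined {
  const kib = parseMeminfo(text).get('MemAvailable');
  return kib === undefined ? undefined : Math.floor(kib / 1024);
}
```

In a server this would be fed with the content of the exposed file, for example `availableMiB(fs.readFileSync('/proc/meminfo', 'utf8'))`.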

Project Context

  • Any exposed project directory → Screenshot/video save location
  • Directory permissions indicate write capabilities

Example Scenarios

Scenario 1: Desktop Development

Client exposes:
- file:///home/user/project-a
- file:///tmp/.X11-unix

Server detects:
- Project directory: /home/user/project-a
- Display: :0 (from X0 socket)
- Result: GUI browser on main display, files saved to project-a

Scenario 2: Multi-Display Setup

Client exposes:
- file:///home/user/project-b  
- file:///tmp/.X11-unix/X1

Server detects:
- Project directory: /home/user/project-b
- Display: :1 (from X1 socket)
- Result: GUI browser on secondary display, files saved to project-b

Scenario 3: Headless Container

Client exposes:
- file:///workspace/project-c
- (no X11 sockets exposed)

Server detects:
- Project directory: /workspace/project-c
- No displays available
- Result: Headless browser, files saved to project-c

Scenario 4: GPU-Accelerated

Client exposes:
- file:///home/user/project-d
- file:///tmp/.X11-unix
- file:///dev/dri

Server detects:
- Project directory: /home/user/project-d
- Display: :0
- GPU: Available (card0, renderD128)
- Result: GPU-accelerated browser with hardware rendering

Questions and Considerations

Protocol Compliance

  • Question: Do all MCP clients support dynamic root updates?
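  • Answer: Not necessarily. Roots is an optional client capability in the MCP spec; clients that declare listChanged: true send notifications/roots/list_changed, while others may expose a static root list or none at all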

Performance Impact

  • Question: Cost of filesystem introspection on each root change?
  • Answer: Minimal - just reading directory listings and small files

Security Implications

  • Question: What if client exposes sensitive system files?
  • Answer: Server only reads specific known paths, validates access
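A path allowlist is one way to enforce "only reads specific known paths". The sketch below is an assumption about how such a check could look, not the server's actual validation; the prefix list simply mirrors the system paths named in these notes.

```typescript
import * as path from 'node:path';

// System locations the server is willing to inspect (taken from these notes).
const ALLOWED_SYSTEM_PREFIXES = [
  '/tmp/.X11-unix',
  '/dev/dri',
  '/sys/class/graphics',
  '/proc/meminfo',
];

// True only if the normalized path is an allowed prefix or lies beneath one.
// Note: this does not resolve symlinks; a real check might also use fs.realpathSync.
function isAllowedSystemPath(candidate: string): boolean {
  const normalized = path.posix.normalize(candidate);
  return ALLOWED_SYSTEM_PREFIXES.some(
    prefix => normalized === prefix || normalized.startsWith(prefix + '/'),
  );
}
```

Normalizing first means a root such as /dev/dri/../../etc/passwd is rejected, and requiring a trailing slash after the prefix keeps look-alike names such as /tmp/.X11-unix-evil out.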

Fallback Behavior

  • Question: What if expected roots aren't exposed?
  • Answer: Graceful degradation to headless/default configuration
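Graceful degradation can be sketched as "treat anything unreadable as absent". In the example below, `listDirSafe`, `planLaunch`, and the `LaunchPlan` type are hypothetical; how the resulting plan is handed to Playwright is left out on purpose.

```typescript
import * as fs from 'node:fs';

// Hypothetical launch decision derived from the exposed X11 root.
interface LaunchPlan {
  headless: boolean;
  display?: string; // e.g. ":0"
}

// List a directory, returning an empty list if it is missing or unreadable.
function listDirSafe(dir: string | undefined): string[] {
  if (!dir) return [];
  try {
    return fs.readdirSync(dir);
  } catch {
    return [];
  }
}

// Pick the lowest-numbered display if any socket exists, otherwise go headless.
function planLaunch(x11Dir?: string): LaunchPlan {
  const displayNumbers = listDirSafe(x11Dir)
    .filter(name => /^X\d+$/.test(name))
    .map(name => Number(name.slice(1)))
    .sort((a, b) => a - b);
  return displayNumbers.length > 0
    ? { headless: false, display: `:${displayNumbers[0]}` }
    : { headless: true };
}
```

With no X11 root exposed, or with a root that cannot be read, the plan is headless, which matches the headless container scenario earlier in these notes.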

Future Enhancements

Extended System Detection

  • Network interface detection via /sys/class/net
  • Audio capabilities via /proc/asound
  • Container detection via /proc/1/cgroup

Resource Constraints

  • CPU limits from cgroup files
  • Memory limits for browser configuration
  • Disk space checks for recording limits

Multi-User Support

  • User ID detection for proper file permissions
  • Group membership for device access
  • Home directory discovery

Conclusion

This architecture addresses multi-client workspace isolation by:

  1. Leveraging existing MCP infrastructure (roots) rather than building new complexity
  2. Following UNIX philosophy by exposing real system files that define environment
  3. Enabling dynamic workspace switching through standard MCP protocol mechanisms
  4. Self-teaching through tool descriptions so clients learn what to expose
  5. Maintaining strong isolation while eliminating configuration overhead

The result is workspace-aware browser automation that feels magical but is built on solid, standard protocols and UNIX principles.