Ryan Malloy ecedcc48d6 feat: implement MCP client session persistence for browser contexts

Add session persistence system to maintain browser contexts across MCP tool calls:

- SessionManager: Global persistent context management keyed by session ID
- BrowserServerBackend: Modified to use session persistence and reuse contexts
- Context: Enhanced to support environment introspection and session ID override
- MCP Roots: Added educational tool descriptions for workspace-aware automation
- Environment Detection: System file introspection for display/GPU/project detection

Key features:
- Browser contexts survive between tool calls preserving cache, cookies, state
- Complete session isolation between different MCP clients
- Zero startup overhead for repeat connections
- Backward compatible with existing implementations
- Support for MCP roots workspace detection and environment adaptation

Tested and verified with real Claude Code client showing successful session
persistence across navigation calls with preserved browser state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-12 12:22:46 -06:00

9.9 KiB


MCP Roots for Workspace-Aware Browser Automation - Detailed Notes

Overview

This document captures the complete conversation and technical details around implementing workspace-aware browser automation using MCP roots for environment declaration and dynamic configuration.

The Problem Statement

Multi-Client Isolation Challenge:

Multiple MCP clients running simultaneously, each working on different codebases
Each client needs isolated Playwright sessions
Browser windows should display on the client's desktop context
Screenshots/videos should save to the client's project directory
Sessions must remain completely isolated from each other

Traditional Configuration Limitations:

Environment variables: Global, not per-client
Config files: Each client needs to know its own context
Tool parameters: Requires manual specification on every call
Configuration tools: Still requires client to understand context

The Key Insight

The real problem isn't configuration complexity - it's workspace-aware isolation. Each MCP client represents a distinct workspace with its own:

Project directory (where files should be saved)
Desktop context (where windows should appear)
Available system resources (GPU, displays, etc.)

The MCP Roots Solution

Core Concept

Leverage MCP's existing "roots" capability to declare execution environments rather than just file system access. Following the UNIX philosophy that "everything is a file," we expose actual system files that define the environment.

How It Works

Client declares roots during connection:

{
  "capabilities": {
    "roots": {
      "listChanged": true
    }
  }
}

Client exposes environment-defining files:

- file:///path/to/their/project - artifact save location
- file:///tmp/.X11-unix - available X11 displays
- file:///dev/dri - GPU capabilities
- file:///sys/class/graphics - framebuffer information
- file:///proc/meminfo - memory constraints

Server introspects exposed files:

- Parse X11 sockets to discover displays (X0 → DISPLAY=:0)
- Check DRI devices for GPU acceleration
- Use project directory for screenshot/video output
- Read system files for capability detection

Dynamic updates via MCP protocol:

- Client can change roots anytime during session
- Client sends notifications/roots/list_changed
- Server calls roots/list to get updated environment
- Browser contexts automatically reconfigure
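
The introspection step above can be sketched as a pure classification of the root URIs the client exposes. This is a minimal sketch, not the project's implementation: the `Root` shape, `WorkspaceHints`, `fileUriToPath`, and `classifyRoots` are hypothetical names, and the URI handling assumes plain `file:///...` roots with an empty authority. It also treats a root that points at a single socket (for example `file:///tmp/.X11-unix/X1`) as pinning that display, which is an assumption about how such a root should be read.

```typescript
// Hypothetical types and names; only the root URIs themselves come from the notes.
interface Root {
  uri: string;
  name?: string;
}

interface WorkspaceHints {
  projectDir?: string;     // first root that is not a known system path
  x11Dir?: string;         // directory of X11 sockets to list
  pinnedDisplay?: string;  // set when a single socket such as .../X1 is exposed
  driDir?: string;         // DRI device directory, if exposed
  otherSystemPaths: string[];
}

const SYSTEM_PREFIXES = ["/tmp/.X11-unix/", "/dev/", "/sys/", "/proc/"];

// Simplified file URI handling: assumes an empty authority (file:///...).
function fileUriToPath(uri: string): string | null {
  const prefix = "file://";
  if (!uri.startsWith(prefix)) return null;
  let path: string;
  try {
    path = decodeURIComponent(uri.slice(prefix.length));
  } catch {
    return null; // malformed percent-encoding
  }
  if (!path.startsWith("/")) return null; // non-empty authority is not handled here
  return path.replace(/\/+$/, "") || "/"; // drop trailing slashes
}

function classifyRoots(roots: Root[]): WorkspaceHints {
  const hints: WorkspaceHints = { otherSystemPaths: [] };
  for (const root of roots) {
    const path = fileUriToPath(root.uri);
    if (path === null) continue; // ignore non-file roots
    const socket = /^\/tmp\/\.X11-unix\/X(\d+)$/.exec(path);
    if (socket) {
      hints.pinnedDisplay = `:${socket[1]}`;
    } else if (path === "/tmp/.X11-unix") {
      hints.x11Dir = path;
    } else if (path === "/dev/dri") {
      hints.driDir = path;
    } else if (SYSTEM_PREFIXES.some(p => path.startsWith(p))) {
      hints.otherSystemPaths.push(path);
    } else if (hints.projectDir === undefined) {
      hints.projectDir = path;
    }
  }
  return hints;
}
```

Keeping classification separate from filesystem reads means the decision logic can be tested without touching `/tmp` or `/dev`. Note that the server can only read these paths if it shares a filesystem with the client.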

Self-Teaching System

Tool descriptions become educational, explaining what roots to expose:

{
  name: 'browser_navigate',
  description: `Navigate to URL. 
  
  ENVIRONMENT: Detects context from exposed roots:
  - file:///path/to/project → saves screenshots/videos there
  - file:///tmp/.X11-unix → detects available displays (X0=:0, X1=:1)  
  - file:///dev/dri → enables GPU acceleration if available
  
  TIP: Change roots to switch workspace/display context dynamically.`
}
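
To keep the educational text consistent across tools, the description could be assembled from a shared list of hints. The sketch below is illustrative only: `RootHint`, `ENVIRONMENT_HINTS`, and `describeWithEnvironment` are hypothetical names, and the hint wording is copied from the browser_navigate example above.

```typescript
// Hypothetical helper; the hint text mirrors the browser_navigate example.
interface RootHint {
  root: string;   // root URI the client may expose
  effect: string; // what the server does when it sees that root
}

const ENVIRONMENT_HINTS: RootHint[] = [
  { root: "file:///path/to/project", effect: "saves screenshots/videos there" },
  { root: "file:///tmp/.X11-unix", effect: "detects available displays (X0=:0, X1=:1)" },
  { root: "file:///dev/dri", effect: "enables GPU acceleration if available" },
];

// Appends the shared ENVIRONMENT and TIP sections to a one-line tool summary.
function describeWithEnvironment(
  summary: string,
  hints: RootHint[] = ENVIRONMENT_HINTS,
): string {
  return [
    summary,
    "",
    "ENVIRONMENT: Detects context from exposed roots:",
    ...hints.map(h => `- ${h.root} → ${h.effect}`),
    "",
    "TIP: Change roots to switch workspace/display context dynamically.",
  ].join("\n");
}

const navigateTool = {
  name: "browser_navigate",
  description: describeWithEnvironment("Navigate to URL."),
};
```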

Technical Architecture

Session Isolation

Each MCP client gets a unique session ID derived from client info, a timestamp, and a random hash
Browser contexts are completely isolated per session
Video recording directories are session-specific
No cross-contamination between clients
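
A minimal sketch of these isolation rules follows, assuming nothing about the real SessionManager beyond "keyed by session ID". The names `makeSessionId`, `SessionRegistry`, and `sessionVideoDir`, the ID format, and the `videos/<sessionId>` layout are all hypothetical. `Math.random` stands in for the random component and is not a cryptographic source.

```typescript
// Hypothetical names throughout; the notes only specify the ID ingredients.
interface ClientInfo {
  name: string;
  version: string;
}

// Builds "<client>-<version>-<timestamp>-<random>" with unsafe characters replaced.
// `now` and `random` are injectable so the function can be tested deterministically.
function makeSessionId(
  client: ClientInfo,
  now: number = Date.now(),
  random: string = Math.random().toString(16).slice(2, 10),
): string {
  const safe = (s: string) => s.replace(/[^A-Za-z0-9._]/g, "_");
  return `${safe(client.name)}-${safe(client.version)}-${now.toString(36)}-${random}`;
}

// Holds one value (for example a browser context) per session ID.
class SessionRegistry<T> {
  private readonly sessions = new Map<string, T>();

  // Returns the existing entry for this session, or creates and stores one.
  getOrCreate(sessionId: string, create: () => T): T {
    const existing = this.sessions.get(sessionId);
    if (existing !== undefined) return existing;
    const created = create();
    this.sessions.set(sessionId, created);
    return created;
  }

  // Removes the entry and hands it back so the caller can dispose of it.
  release(sessionId: string): T | undefined {
    const value = this.sessions.get(sessionId);
    this.sessions.delete(sessionId);
    return value;
  }

  get size(): number {
    return this.sessions.size;
  }
}

// Session-specific recording directory under the workspace output directory.
function sessionVideoDir(outputDir: string, sessionId: string): string {
  return `${outputDir.replace(/\/+$/, "")}/videos/${sessionId}`;
}
```

Because every lookup goes through the session ID, two clients can never receive the same context unless they present the same ID.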

Environment Detection

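// Example introspection logic
import * as fs from 'fs';

// List X11 sockets (X0, X1, ...) and map each to its DISPLAY value.
// readdirSync throws if the directory is missing or unreadable.
const detectDisplays = (x11Root: string) => {
  const sockets = fs.readdirSync(x11Root);
  return sockets
    .filter(name => /^X\d+$/.test(name))
    .map(name => ({ socket: name, display: `:${name.slice(1)}` }));
};

// Check the DRI directory for card (primary) and renderD (render-only) nodes.
const detectGPU = (driRoot: string) => {
  const devices = fs.readdirSync(driRoot);
  return {
    hasGPU: devices.some(d => d.startsWith('card')),
    hasRender: devices.some(d => d.startsWith('renderD'))
  };
};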

Dynamic Workspace Switching

// Client working on project1
Client exposes: file:///home/user/project1, file:///tmp/.X11-unix/X0

// Later switches to project2 with different display
Client updates roots: file:///home/user/project2, file:///tmp/.X11-unix/X1
Client sends: notifications/roots/list_changed
Server detects change, reconfigures browser contexts automatically
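
The switching sequence above could be handled by comparing the previous and updated root listings and reconfiguring only when something changed. This is a sketch under assumptions: `diffRoots` and `onRootsListChanged` are hypothetical names, `listRoots` stands in for whatever issues the roots/list request, and `reconfigure` stands in for the browser context rebuild.

```typescript
// Hypothetical helper types: a root listing is compared by URI only.
interface Root {
  uri: string;
  name?: string;
}

interface RootsDiff {
  added: string[];
  removed: string[];
  changed: boolean;
}

// Computes which root URIs appeared and which disappeared between two listings.
function diffRoots(previous: Root[], next: Root[]): RootsDiff {
  const before = new Set(previous.map(r => r.uri));
  const after = new Set(next.map(r => r.uri));
  const added = Array.from(after).filter(uri => !before.has(uri));
  const removed = Array.from(before).filter(uri => !after.has(uri));
  return { added, removed, changed: added.length > 0 || removed.length > 0 };
}

// Intended to run when notifications/roots/list_changed arrives.
// Returns the new listing so the caller can store it as the current state.
async function onRootsListChanged(
  current: Root[],
  listRoots: () => Promise<Root[]>,
  reconfigure: (next: Root[], diff: RootsDiff) => Promise<void>,
): Promise<Root[]> {
  const next = await listRoots();
  const diff = diffRoots(current, next);
  if (diff.changed) await reconfigure(next, diff);
  return next;
}
```

Skipping the rebuild when the listing is unchanged avoids tearing down a browser context on a redundant notification.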

Implementation Benefits

For MCP Protocol

Pure MCP: Uses existing roots capability, no protocol extensions needed
Self-documenting: Tool descriptions teach clients what to expose
Dynamic: Supports runtime environment changes
Standard: Follows established MCP patterns

For Playwright

Flexible: Showcases programmatic browser context configuration
Dynamic: Runtime display/output directory configuration
Isolated: Strong session boundaries per client
Capabilities-aware: Automatic GPU/display detection

For Clients (LLMs)

Zero cognitive overhead: Environment is implicit in connection
Familiar pattern: Uses existing root management
Self-teaching: Tool descriptions explain requirements
Flexible: Can change workspace context dynamically

Conversation Evolution

Initial Exploration

The conversation started with a video recording feature request and evolved into session isolation requirements.

Configuration Approaches Considered

Environment variables - Too global
Configuration tools - Still requires manual setup
Tool parameters - Repetitive and error-prone
MCP roots introspection - Elegant and automatic

Key Realizations

UNIX Philosophy: Everything is a file - expose real system files
Workspace Context: Environment should travel with MCP connection
Dynamic Updates: MCP roots can change during session
Self-Teaching: Use tool descriptions to educate clients
Simplicity: Leverage existing MCP infrastructure rather than building new complexity

Architecture Decision

Chose session-level environment (via roots) over tool-managed environment because:

Environment is inherent to workspace, not individual tasks
Impossible to forget environment setup
Natural workspace isolation
Supports dynamic context switching

Current Implementation Status

Completed Features

✅ Session isolation with unique session IDs
✅ Video recording with session-specific directories
✅ Browser context isolation per client
✅ Docker deployment with optional headless mode
✅ MCP tool system with comprehensive capabilities

Planned Features

🔄 MCP roots capability support
🔄 Environment introspection system
🔄 Self-documenting tool descriptions
🔄 Dynamic workspace switching
🔄 System file capability detection

System File Mappings

Display Detection

/tmp/.X11-unix/X0 → DISPLAY=:0
/tmp/.X11-unix/X1 → DISPLAY=:1
Multiple sockets = multiple display options

GPU Capabilities

/dev/dri/card0 → Primary GPU available
/dev/dri/renderD128 → Render node available
Presence indicates GPU acceleration possible

Memory Constraints

/proc/meminfo → Available system memory
/sys/fs/cgroup/memory/memory.limit_in_bytes → Container limits (cgroup v1; on cgroup v2 the equivalent file is /sys/fs/cgroup/memory.max)
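
A small sketch of how the contents of these two files might be interpreted once read. The function names are hypothetical, and the text is passed in as a string so that the parsing can be exercised without reading `/proc`. Values labelled `kB` in /proc/meminfo are multiplied by 1024. A cgroup limit of `max` (cgroup v2) or a number too large to be a real limit (the cgroup v1 "unlimited" sentinel) is reported as `null`.

```typescript
// Parses /proc/meminfo text into a map of field name to bytes (or raw count when no unit).
function parseMeminfo(text: string): Map<string, number> {
  const fields = new Map<string, number>();
  for (const line of text.split("\n")) {
    const match = /^([^:]+):\s+(\d+)(\s+kB)?\s*$/.exec(line);
    if (!match) continue;
    const value = Number(match[2]);
    fields.set(match[1], match[3] ? value * 1024 : value);
  }
  return fields;
}

// Parses a cgroup memory limit file; returns null when no limit is in effect.
function parseCgroupMemoryLimit(text: string): number | null {
  const trimmed = text.trim();
  if (trimmed === "max") return null;            // cgroup v2: unlimited
  if (!/^\d+$/.test(trimmed)) return null;       // unrecognized content
  const value = Number(trimmed);
  return value >= Number.MAX_SAFE_INTEGER ? null : value; // v1 sentinel is near 2^63
}

// The tighter of the two constraints, in bytes.
function effectiveMemoryBytes(memAvailable: number, cgroupLimit: number | null): number {
  return cgroupLimit === null ? memAvailable : Math.min(memAvailable, cgroupLimit);
}
```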

Project Context

Any exposed project directory → Screenshot/video save location
Directory permissions indicate write capabilities

Example Scenarios

Scenario 1: Desktop Development

Client exposes:
- file:///home/user/project-a
- file:///tmp/.X11-unix

Server detects:
- Project directory: /home/user/project-a
- Display: :0 (from X0 socket)
- Result: GUI browser on main display, files saved to project-a

Scenario 2: Multi-Display Setup

Client exposes:
- file:///home/user/project-b  
- file:///tmp/.X11-unix/X1

Server detects:
- Project directory: /home/user/project-b
- Display: :1 (from X1 socket)
- Result: GUI browser on secondary display, files saved to project-b

Scenario 3: Headless Container

Client exposes:
- file:///workspace/project-c
- (no X11 sockets exposed)

Server detects:
- Project directory: /workspace/project-c
- No displays available
- Result: Headless browser, files saved to project-c

Scenario 4: GPU-Accelerated

Client exposes:
- file:///home/user/project-d
- file:///tmp/.X11-unix
- file:///dev/dri

Server detects:
- Project directory: /home/user/project-d
- Display: :0
- GPU: Available (card0, renderD128)
- Result: GPU-accelerated browser with hardware rendering
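
The four scenarios reduce to one decision: pick a display if any exists, fall back to headless otherwise, and save into the project directory when one is exposed. The sketch below encodes that rule. `DetectedEnvironment`, `LaunchPlan`, `planLaunch`, and the fallback output directory are hypothetical and chosen for illustration.

```typescript
// Hypothetical shapes for the detection result and the resulting launch decision.
interface DetectedEnvironment {
  projectDir?: string;
  displays: string[]; // e.g. [":0", ":1"], empty when no X11 sockets are exposed
  hasGPU: boolean;
}

interface LaunchPlan {
  headless: boolean;
  display?: string;
  outputDir: string;
  gpu: boolean;
}

function planLaunch(
  env: DetectedEnvironment,
  fallbackOutputDir: string = "/tmp/playwright-output", // assumed default, not from the notes
): LaunchPlan {
  const plan: LaunchPlan = {
    headless: env.displays.length === 0, // no display means headless
    outputDir: env.projectDir ?? fallbackOutputDir,
    gpu: env.hasGPU,
  };
  if (env.displays.length > 0) plan.display = env.displays[0];
  return plan;
}
```

In a Playwright-based server, such a plan would plausibly map onto the `headless` and `env` (for `DISPLAY`) launch options and the `recordVideo.dir` context option, though how the GPU flag is applied is left open here.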

Questions and Considerations

Protocol Compliance

Question: Do all MCP clients support dynamic root updates?
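Answer: Roots is an optional client capability in the MCP spec. Clients that declare listChanged: true are expected to send notifications/roots/list_changed, so the server should check the declared capabilities rather than assume support.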

Performance Impact

Question: Cost of filesystem introspection on each root change?
Answer: Minimal - just reading directory listings and small files

Security Implications

Question: What if client exposes sensitive system files?
Answer: The server reads only a fixed set of known paths and validates access before reading
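
One way to enforce "specific known paths" is an allowlist check applied before any read. The following is a sketch, assuming the allowlist is the set of system paths named earlier in these notes. `ALLOWED_SYSTEM_PATHS`, `normalizePosix`, and `isAllowedSystemPath` are hypothetical names. The normalization is lexical only, so symlinks are not resolved and a real implementation would likely need an additional realpath check.

```typescript
// Assumed allowlist, taken from the system paths these notes mention.
const ALLOWED_SYSTEM_PATHS = [
  "/tmp/.X11-unix",
  "/dev/dri",
  "/sys/class/graphics",
  "/proc/meminfo",
];

// Resolves "." and ".." segments lexically so "/dev/dri/../mem" cannot pass as "/dev/dri".
function normalizePosix(path: string): string | null {
  if (!path.startsWith("/")) return null; // only absolute paths are considered
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      out.pop();
      continue;
    }
    out.push(segment);
  }
  return "/" + out.join("/");
}

// True when the path is an allowlisted entry or lies inside one.
function isAllowedSystemPath(
  path: string,
  allowed: string[] = ALLOWED_SYSTEM_PATHS,
): boolean {
  const normalized = normalizePosix(path);
  if (normalized === null) return false;
  return allowed.some(base => normalized === base || normalized.startsWith(base + "/"));
}
```

Matching on `base + "/"` rather than a bare prefix keeps a sibling such as `/dev/drivers` from being accepted because it happens to start with `/dev/dri`.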

Fallback Behavior

Question: What if expected roots aren't exposed?
Answer: Graceful degradation to headless/default configuration

Future Enhancements

Extended System Detection

Network interface detection via /sys/class/net
Audio capabilities via /proc/asound
Container detection via /proc/1/cgroup

Resource Constraints

CPU limits from cgroup files
Memory limits for browser configuration
Disk space checks for recording limits

Multi-User Support

User ID detection for proper file permissions
Group membership for device access
Home directory discovery

Conclusion

This architecture successfully addresses multi-client workspace isolation by:

Leveraging existing MCP infrastructure (roots) rather than building new complexity
Following UNIX philosophy by exposing real system files that define environment
Enabling dynamic workspace switching through standard MCP protocol mechanisms
Self-teaching through tool descriptions so clients learn what to expose
Maintaining strong isolation while eliminating configuration overhead

The result is workspace-aware browser automation that feels magical but is built on solid, standard protocols and UNIX principles.
