Add session persistence system to maintain browser contexts across MCP tool calls:
- SessionManager: Global persistent context management keyed by session ID
- BrowserServerBackend: Modified to use session persistence and reuse contexts
- Context: Enhanced to support environment introspection and session ID override
- MCP Roots: Added educational tool descriptions for workspace-aware automation
- Environment Detection: System file introspection for display/GPU/project detection
Key features:
- Browser contexts survive between tool calls preserving cache, cookies, state
- Complete session isolation between different MCP clients
- Zero startup overhead for repeat connections
- Backward compatible with existing implementations
- Support for MCP roots workspace detection and environment adaptation
Tested and verified with real Claude Code client showing successful session persistence across navigation calls with preserved browser state.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
MCP Roots for Workspace-Aware Browser Automation - Detailed Notes
Overview
This document captures the complete conversation and technical details around implementing workspace-aware browser automation using MCP roots for environment declaration and dynamic configuration.
The Problem Statement
Multi-Client Isolation Challenge:
- Multiple MCP clients running simultaneously, each working on different codebases
- Each client needs isolated Playwright sessions
- Browser windows should appear on the client's own desktop (display)
- Screenshots/videos should save to the client's project directory
- Sessions must remain completely isolated from each other
Traditional Configuration Limitations:
- Environment variables: Global, not per-client
- Config files: Each client needs to know its own context
- Tool parameters: Requires manual specification on every call
- Configuration tools: Still requires client to understand context
The Key Insight
The real problem isn't configuration complexity - it's workspace-aware isolation. Each MCP client represents a distinct workspace with its own:
- Project directory (where files should be saved)
- Desktop context (where windows should appear)
- Available system resources (GPU, displays, etc.)
The MCP Roots Solution
Core Concept
Leverage MCP's existing "roots" capability to declare execution environments rather than just file system access. Following the UNIX philosophy that "everything is a file," we expose actual system files that define the environment.
How It Works
- Client declares roots during connection:
  { "capabilities": { "roots": { "listChanged": true } } }
- Client exposes environment-defining files:
  - file:///path/to/their/project → artifact save location
  - file:///tmp/.X11-unix → available X11 displays
  - file:///dev/dri → GPU capabilities
  - file:///sys/class/graphics → framebuffer information
  - file:///proc/meminfo → memory constraints
- Server introspects exposed files:
  - Parse X11 sockets to discover displays (X0 → DISPLAY=:0)
  - Check DRI devices for GPU acceleration
  - Use project directory for screenshot/video output
  - Read system files for capability detection
- Dynamic updates via MCP protocol:
  - Client can change roots at any time during the session
  - Client sends notifications/roots/list_changed
  - Server calls roots/list to get the updated environment
  - Browser contexts automatically reconfigure
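The server-side introspection step starts by sorting the roots it receives from roots/list into the categories listed above. A minimal sketch, assuming that any file:// root outside a few known system prefixes is a project directory; `classifyRoots`, `DeclaredEnvironment`, and the prefix lists are hypothetical names, not part of the MCP SDK. The `Root` shape (a required `uri`, optional `name`) follows the MCP roots spec.

```typescript
import { fileURLToPath } from 'url';

// Shape of an MCP root as returned by roots/list (uri required, name optional).
interface Root {
  uri: string;
  name?: string;
}

// Hypothetical summary of what the exposed roots declare about the workspace.
interface DeclaredEnvironment {
  projectDirs: string[];  // candidate artifact save locations
  x11Paths: string[];     // /tmp/.X11-unix itself or individual sockets in it
  driPaths: string[];     // /dev/dri itself or individual device nodes
  systemFiles: string[];  // other known system files (meminfo, graphics, cgroup)
}

// Known system prefixes (assumed list); other file:// roots count as projects.
const X11_PREFIX = '/tmp/.X11-unix';
const DRI_PREFIX = '/dev/dri';
const SYSTEM_PREFIXES = ['/sys/class/graphics', '/proc/meminfo', '/sys/fs/cgroup'];

// True if p is the prefix itself or a path nested beneath it.
const isUnder = (p: string, prefix: string) => p === prefix || p.startsWith(prefix + '/');

function classifyRoots(roots: Root[]): DeclaredEnvironment {
  const env: DeclaredEnvironment = { projectDirs: [], x11Paths: [], driPaths: [], systemFiles: [] };
  for (const root of roots) {
    // Only file:// roots describe local resources; ignore anything else.
    if (!root.uri.startsWith('file://')) continue;
    const p = fileURLToPath(root.uri);
    if (isUnder(p, X11_PREFIX)) env.x11Paths.push(p);
    else if (isUnder(p, DRI_PREFIX)) env.driPaths.push(p);
    else if (SYSTEM_PREFIXES.some(prefix => isUnder(p, prefix))) env.systemFiles.push(p);
    else env.projectDirs.push(p);
  }
  return env;
}
```

Keeping classification separate from the actual file reads means the later display, GPU, and memory probes only ever touch paths the client chose to expose.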
Self-Teaching System
Tool descriptions become educational, explaining what roots to expose:
{
name: 'browser_navigate',
description: `Navigate to URL.
ENVIRONMENT: Detects context from exposed roots:
- file:///path/to/project → saves screenshots/videos there
- file:///tmp/.X11-unix → detects available displays (X0=:0, X1=:1)
- file:///dev/dri → enables GPU acceleration if available
TIP: Change roots to switch workspace/display context dynamically.`
}
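With many tools, the ENVIRONMENT section is easiest to keep consistent if every description is generated from one table of root hints. A hypothetical sketch; `ROOT_HINTS` and `withEnvironmentHelp` are illustrative names, and the hint text is copied from the `browser_navigate` description above.

```typescript
// Hypothetical single source of truth for the roots every tool should mention.
const ROOT_HINTS: ReadonlyArray<readonly [string, string]> = [
  ['file:///path/to/project', 'saves screenshots/videos there'],
  ['file:///tmp/.X11-unix', 'detects available displays (X0=:0, X1=:1)'],
  ['file:///dev/dri', 'enables GPU acceleration if available'],
];

interface ToolDefinition {
  name: string;
  description: string;
}

// Append the shared ENVIRONMENT block and TIP to a tool's one-line summary.
function withEnvironmentHelp(name: string, summary: string): ToolDefinition {
  const hints = ROOT_HINTS.map(([uri, effect]) => `- ${uri} → ${effect}`).join('\n');
  return {
    name,
    description: [
      summary,
      'ENVIRONMENT: Detects context from exposed roots:',
      hints,
      'TIP: Change roots to switch workspace/display context dynamically.',
    ].join('\n'),
  };
}

const navigate = withEnvironmentHelp('browser_navigate', 'Navigate to URL.');
```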
Technical Architecture
Session Isolation
- Each MCP client gets a unique session ID based on client info + timestamp + random hash
- Browser contexts are completely isolated per session
- Video recording directories are session-specific
- No cross-contamination between clients
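The session ID scheme can be sketched as follows. This is a minimal sketch, assuming the client info is the `clientInfo` ({ name, version }) sent in the MCP initialize request; random bytes in hex stand in for the "random hash", and `createSessionId` / `videoDirFor` are hypothetical names.

```typescript
import { randomBytes } from 'crypto';
import * as path from 'path';

// Client identity as reported in the MCP initialize request.
interface ClientInfo {
  name: string;
  version: string;
}

// Combine client info, a timestamp and random bytes so two clients with the
// same clientInfo connecting in the same millisecond still get distinct IDs.
function createSessionId(client: ClientInfo, now: number = Date.now()): string {
  // Keep only filesystem-safe characters so the ID can double as a directory name.
  const safe = `${client.name}-${client.version}`.replace(/[^A-Za-z0-9._-]/g, '_');
  return `${safe}-${now.toString(36)}-${randomBytes(4).toString('hex')}`;
}

// Session-specific video directory under a shared base directory (layout assumed).
function videoDirFor(baseDir: string, sessionId: string): string {
  return path.join(baseDir, 'videos', sessionId);
}
```

Because the ID is filesystem-safe, the same value can key the in-memory context map and name the per-session recording directory.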
Environment Detection
// Example introspection logic
import * as fs from 'fs';

// X11 sockets are named X<n>; socket X<n> corresponds to DISPLAY=:<n>
const detectDisplays = (x11Root: string) => {
  if (!fs.existsSync(x11Root)) return [];
  return fs.readdirSync(x11Root)
    .filter(name => /^X\d+$/.test(name))
    .map(name => ({ socket: name, display: `:${name.slice(1)}` }));
};

// /dev/dri holds cardN (GPU) and renderDN (render node) device files
const detectGPU = (driRoot: string) => {
  const devices = fs.existsSync(driRoot) ? fs.readdirSync(driRoot) : [];
  return {
    hasGPU: devices.some(d => d.startsWith('card')),
    hasRender: devices.some(d => d.startsWith('renderD'))
  };
};
Dynamic Workspace Switching
// Client working on project1
Client exposes: file:///home/user/project1, file:///tmp/.X11-unix/X0
// Later switches to project2 with different display
Client updates roots: file:///home/user/project2, file:///tmp/.X11-unix/X1
Client sends: notifications/roots/list_changed
Server detects change, reconfigures browser contexts automatically
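After re-reading roots/list in response to notifications/roots/list_changed, the server still has to decide whether the change affects the browser at all. A minimal sketch, assuming a hypothetical `WorkspaceEnvironment` record already resolved from the roots; it reports only the settings that changed, using the project1/project2 example above.

```typescript
// Hypothetical environment resolved from the current set of roots.
interface WorkspaceEnvironment {
  projectDir: string | null;
  display: string | null;  // e.g. ':0'; null means headless
  gpu: boolean;
}

type EnvKey = keyof WorkspaceEnvironment;

// List the settings that differ between the old and new environment, so the
// server only rebuilds browser contexts when something relevant changed.
function changedSettings(prev: WorkspaceEnvironment, next: WorkspaceEnvironment): EnvKey[] {
  const keys: EnvKey[] = ['projectDir', 'display', 'gpu'];
  return keys.filter(k => prev[k] !== next[k]);
}

const project1: WorkspaceEnvironment = { projectDir: '/home/user/project1', display: ':0', gpu: false };
const project2: WorkspaceEnvironment = { projectDir: '/home/user/project2', display: ':1', gpu: false };
```

Playwright fixes options such as `recordVideo.dir` when a context is created, so a change to any of these fields would most likely mean creating a new context rather than mutating the existing one.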
Implementation Benefits
For MCP Protocol
- Pure MCP: Uses existing roots capability, no protocol extensions needed
- Self-documenting: Tool descriptions teach clients what to expose
- Dynamic: Supports runtime environment changes
- Standard: Follows established MCP patterns
For Playwright
- Flexible: Showcases programmatic browser context configuration
- Dynamic: Runtime display/output directory configuration
- Isolated: Strong session boundaries per client
- Capabilities-aware: Automatic GPU/display detection
For Clients (LLMs)
- Zero cognitive overhead: Environment is implicit in connection
- Familiar pattern: Uses existing root management
- Self-teaching: Tool descriptions explain requirements
- Flexible: Can change workspace context dynamically
Conversation Evolution
Initial Exploration
Started with a video recording feature request, which evolved into session isolation requirements.
Configuration Approaches Considered
- Environment variables - Too global
- Configuration tools - Still requires manual setup
- Tool parameters - Repetitive and error-prone
- MCP roots introspection - Elegant and automatic
Key Realizations
- UNIX Philosophy: Everything is a file - expose real system files
- Workspace Context: Environment should travel with MCP connection
- Dynamic Updates: MCP roots can change during session
- Self-Teaching: Use tool descriptions to educate clients
- Simplicity: Leverage existing MCP infrastructure rather than building new complexity
Architecture Decision
Chose session-level environment (via roots) over tool-managed environment because:
- Environment is inherent to workspace, not individual tasks
- Impossible to forget environment setup
- Natural workspace isolation
- Supports dynamic context switching
Current Implementation Status
Completed Features
- ✅ Session isolation with unique session IDs
- ✅ Video recording with session-specific directories
- ✅ Browser context isolation per client
- ✅ Docker deployment with optional headless mode
- ✅ MCP tool system with comprehensive capabilities
Planned Features
- 🔄 MCP roots capability support
- 🔄 Environment introspection system
- 🔄 Self-documenting tool descriptions
- 🔄 Dynamic workspace switching
- 🔄 System file capability detection
System File Mappings
Display Detection
- /tmp/.X11-unix/X0 → DISPLAY=:0
- /tmp/.X11-unix/X1 → DISPLAY=:1
- Multiple sockets = multiple display options
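A root can be either the whole socket directory (Scenario 1) or a single socket inside it (Scenario 2), so the mapping has to handle both. A minimal sketch; `displayForSocket` and `displaysForRoot` are hypothetical names, and the directory lister is injectable (defaulting to `fs.readdirSync`) so the logic can be exercised without a running X server.

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Map an X11 socket name (X0, X1, ...) to a DISPLAY value; null if it isn't one.
function displayForSocket(name: string): string | null {
  const m = /^X(\d+)$/.exec(name);
  return m ? `:${m[1]}` : null;
}

// Root is either /tmp/.X11-unix itself or one socket such as /tmp/.X11-unix/X1.
function displaysForRoot(
  rootPath: string,
  listDir: (dir: string) => string[] = dir => fs.readdirSync(dir),
): string[] {
  // A single exposed socket pins the client to exactly that display.
  const single = displayForSocket(path.basename(rootPath));
  if (single) return [single];
  try {
    return listDir(rootPath)
      .map(displayForSocket)
      .filter((d): d is string => d !== null)
      .sort((a, b) => Number(a.slice(1)) - Number(b.slice(1)));  // numeric order
  } catch {
    return [];  // missing or unreadable directory: no displays
  }
}
```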
GPU Capabilities
- /dev/dri/card0 → Primary GPU available
- /dev/dri/renderD128 → Render node available
- Presence indicates GPU acceleration possible
Memory Constraints
- /proc/meminfo → Available system memory
- /sys/fs/cgroup/memory/memory.limit_in_bytes → Container limits
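Combining the two memory sources could look like this. A minimal sketch, assuming the server takes the smaller of MemAvailable and the container limit as its budget; the function names are hypothetical. The limit parser also accepts the cgroup v2 file memory.max, which holds either a byte count or the literal `max`.

```typescript
// Parse /proc/meminfo ("MemAvailable:   8123456 kB") into byte counts.
// Fields with a kB suffix are KiB; a few (e.g. HugePages_Total) are plain counts.
function parseMeminfo(text: string): Map<string, number> {
  const out = new Map<string, number>();
  for (const line of text.split('\n')) {
    const m = /^([\w()]+):\s+(\d+)(\s+kB)?$/.exec(line.trim());
    if (!m) continue;
    const value = Number(m[2]);
    out.set(m[1], m[3] ? value * 1024 : value);
  }
  return out;
}

// cgroup v1 limit_in_bytes or cgroup v2 memory.max; "max" means unlimited.
function parseMemoryLimit(text: string): number | null {
  const t = text.trim();
  if (t === '' || t === 'max') return null;
  const n = Number(t);
  return Number.isFinite(n) ? n : null;
}

// Effective budget: the smaller of MemAvailable and the container limit, if any.
function effectiveMemoryBytes(meminfo: string, limit?: string): number | null {
  const available = parseMeminfo(meminfo).get('MemAvailable') ?? null;
  const cap = limit === undefined ? null : parseMemoryLimit(limit);
  if (available === null) return cap;
  return cap === null ? available : Math.min(available, cap);
}
```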
Project Context
- Any exposed project directory → Screenshot/video save location
- Directory permissions indicate write capabilities
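Checking those permissions before recording starts avoids failing midway through a session. A minimal sketch, assuming the first writable project root wins and that unwritable workspaces fall back to a per-session folder under the OS temp directory; `chooseOutputDir` and the `playwright-mcp` folder name are hypothetical.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// True if dir exists, is a directory, and this process may write to it.
function isWritableDir(dir: string): boolean {
  try {
    if (!fs.statSync(dir).isDirectory()) return false;
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch {
    return false;
  }
}

// First writable exposed project directory, else an assumed temp-dir fallback.
function chooseOutputDir(projectDirs: string[], sessionId: string): string {
  const target = projectDirs.find(isWritableDir);
  return target ?? path.join(os.tmpdir(), 'playwright-mcp', sessionId);
}
```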
Example Scenarios
Scenario 1: Desktop Development
Client exposes:
- file:///home/user/project-a
- file:///tmp/.X11-unix
Server detects:
- Project directory: /home/user/project-a
- Display: :0 (from X0 socket)
- Result: GUI browser on main display, files saved to project-a
Scenario 2: Multi-Display Setup
Client exposes:
- file:///home/user/project-b
- file:///tmp/.X11-unix/X1
Server detects:
- Project directory: /home/user/project-b
- Display: :1 (from X1 socket)
- Result: GUI browser on secondary display, files saved to project-b
Scenario 3: Headless Container
Client exposes:
- file:///workspace/project-c
- (no X11 sockets exposed)
Server detects:
- Project directory: /workspace/project-c
- No displays available
- Result: Headless browser, files saved to project-c
Scenario 4: GPU-Accelerated
Client exposes:
- file:///home/user/project-d
- file:///tmp/.X11-unix
- file:///dev/dri
Server detects:
- Project directory: /home/user/project-d
- Display: :0
- GPU: Available (card0, renderD128)
- Result: GPU-accelerated browser with hardware rendering
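The four scenarios reduce to a small mapping from the detected environment to browser options. A minimal sketch, assuming the first exposed display is used and that `--disable-gpu` (a Chromium switch) is passed when no DRI devices are exposed; that GPU policy and all names here are assumptions. `headless`, `env`, and `args` are real Playwright `browserType.launch()` options, and `recordVideo.dir` is a real `browser.newContext()` option.

```typescript
// Hypothetical output of the introspection steps.
interface DetectedEnvironment {
  projectDir: string;
  displays: string[];  // e.g. [':0']; empty when no X11 sockets are exposed
  gpu: boolean;        // true when card*/renderD* nodes are exposed
}

// Subset of Playwright's launch options used here.
interface LaunchConfig {
  headless: boolean;
  env?: Record<string, string | undefined>;
  args: string[];
}

function toLaunchConfig(env: DetectedEnvironment): LaunchConfig {
  const display = env.displays[0];  // first exposed display wins (assumption)
  return {
    headless: display === undefined,  // no display: degrade to headless
    // Playwright's env option replaces process.env, so extend it rather than replace it.
    env: display === undefined ? undefined : { ...process.env, DISPLAY: display },
    args: env.gpu ? [] : ['--disable-gpu'],  // assumed policy without GPU devices
  };
}

// Context-level options: videos go to the exposed project directory.
function toContextOptions(env: DetectedEnvironment) {
  return { recordVideo: { dir: env.projectDir } };
}
```

The results would then be passed to `chromium.launch(toLaunchConfig(env))` and `browser.newContext(toContextOptions(env))` respectively.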
Questions and Considerations
Protocol Compliance
- Question: Do all MCP clients support dynamic root updates?
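- Answer: It's in the spec but optional. Clients declare roots.listChanged only if they send change notifications, so the server should check that capability; if it is absent, read roots once via roots/list (or use the fallback below if the client has no roots capability at all)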
Performance Impact
- Question: Cost of filesystem introspection on each root change?
- Answer: Minimal - just reading directory listings and small files
Security Implications
- Question: What if client exposes sensitive system files?
- Answer: Server only reads specific known paths, validates access
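"Only reads specific known paths" can be enforced with a normalized prefix allowlist. A minimal sketch; the allowlist contents and `mayIntrospect` are assumptions. Symlinks are not resolved here; a stricter version would run `fs.realpathSync` on the path before checking it.

```typescript
import * as path from 'path';

// Known system locations the server is willing to read (assumed list).
const INTROSPECTABLE = ['/tmp/.X11-unix', '/dev/dri', '/sys/class/graphics', '/proc/meminfo', '/sys/fs/cgroup'];

// Normalize first so "/dev/dri/../../etc/shadow" cannot pass a prefix check.
function mayIntrospect(requested: string): boolean {
  const p = path.posix.resolve('/', requested);
  return INTROSPECTABLE.some(allowed => p === allowed || p.startsWith(allowed + '/'));
}
```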
Fallback Behavior
- Question: What if expected roots aren't exposed?
- Answer: Graceful degradation to headless/default configuration
Future Enhancements
Extended System Detection
- Network interface detection via /sys/class/net
- Audio capabilities via /proc/asound
- Container detection via /proc/1/cgroup
Resource Constraints
- CPU limits from cgroup files
- Memory limits for browser configuration
- Disk space checks for recording limits
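For CPU limits, the cgroup v2 file cpu.max holds "<quota> <period>" in microseconds, or "max <period>" when unlimited; quota divided by period is the number of CPUs the group may use (cgroup v1 splits this into cpu.cfs_quota_us and cpu.cfs_period_us, with -1 meaning unlimited). A minimal sketch covering only the v2 format; `cpuLimitFromCpuMax` is a hypothetical name.

```typescript
// Parse cgroup v2 cpu.max; returns CPUs available, or null when unlimited/unknown.
function cpuLimitFromCpuMax(text: string): number | null {
  const [quota, period] = text.trim().split(/\s+/);
  if (quota === undefined || quota === 'max') return null;
  const q = Number(quota);
  const p = Number(period ?? 100000);  // kernel default period is 100000 us
  return q > 0 && p > 0 ? q / p : null;
}
```

A server could use the result to cap how many browser contexts it keeps alive for one session.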
Multi-User Support
- User ID detection for proper file permissions
- Group membership for device access
- Home directory discovery
Conclusion
This architecture addresses multi-client workspace isolation by:
- Leveraging existing MCP infrastructure (roots) rather than building new complexity
- Following UNIX philosophy by exposing real system files that define environment
- Enabling dynamic workspace switching through standard MCP protocol mechanisms
- Self-teaching through tool descriptions so clients learn what to expose
- Maintaining strong isolation while eliminating configuration overhead
The result is workspace-aware browser automation that feels magical but is built on solid, standard protocols and UNIX principles.