playwright-mcp/MCP-ROOTS-NOTES.md
Ryan Malloy ecedcc48d6 feat: implement MCP client session persistence for browser contexts
Add session persistence system to maintain browser contexts across MCP tool calls:

- SessionManager: Global persistent context management keyed by session ID
- BrowserServerBackend: Modified to use session persistence and reuse contexts
- Context: Enhanced to support environment introspection and session ID override
- MCP Roots: Added educational tool descriptions for workspace-aware automation
- Environment Detection: System file introspection for display/GPU/project detection

Key features:
- Browser contexts survive between tool calls preserving cache, cookies, state
- Complete session isolation between different MCP clients
- Zero startup overhead for repeat connections
- Backward compatible with existing implementations
- Support for MCP roots workspace detection and environment adaptation

Tested and verified with real Claude Code client showing successful session
persistence across navigation calls with preserved browser state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-12 12:22:46 -06:00


# MCP Roots for Workspace-Aware Browser Automation - Detailed Notes
## Overview
This document captures the complete conversation and technical details around implementing workspace-aware browser automation using MCP roots for environment declaration and dynamic configuration.
## The Problem Statement
**Multi-Client Isolation Challenge:**
- Multiple MCP clients running simultaneously, each working on different codebases
- Each client needs isolated Playwright sessions
- Browser windows should display on the client's desktop context
- Screenshots/videos should save to the client's project directory
- Sessions must remain completely isolated from each other
**Traditional Configuration Limitations:**
- Environment variables: Global, not per-client
- Config files: Each client needs to know its own context
- Tool parameters: Requires manual specification on every call
- Configuration tools: Still requires client to understand context
## The Key Insight
The real problem isn't configuration complexity - it's **workspace-aware isolation**. Each MCP client represents a distinct workspace with its own:
- Project directory (where files should be saved)
- Desktop context (where windows should appear)
- Available system resources (GPU, displays, etc.)
## The MCP Roots Solution
### Core Concept
Leverage MCP's existing "roots" capability to declare execution environments rather than just file system access. Following the UNIX philosophy that "everything is a file," we expose actual system files that define the environment.
### How It Works
1. **Client declares roots during connection:**
```json
{
  "capabilities": {
    "roots": {
      "listChanged": true
    }
  }
}
```
2. **Client exposes environment-defining files:**
- `file:///path/to/their/project` - artifact save location
- `file:///tmp/.X11-unix` - available X11 displays
- `file:///dev/dri` - GPU capabilities
- `file:///sys/class/graphics` - framebuffer information
- `file:///proc/meminfo` - memory constraints
3. **Server introspects exposed files:**
- Parse X11 sockets to discover displays (X0 → DISPLAY=:0)
- Check DRI devices for GPU acceleration
- Use project directory for screenshot/video output
- Read system files for capability detection
4. **Dynamic updates via MCP protocol:**
- Client can change roots anytime during session
- Client sends `notifications/roots/list_changed`
- Server calls `roots/list` to get updated environment
- Browser contexts automatically reconfigure
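
The introspection step above can be sketched as a pure function that sorts the client's root URIs into an environment description. This is a minimal sketch, not the project's implementation: `WorkspaceEnvironment`, `classifyRoots`, and `SYSTEM_PREFIXES` are hypothetical names, and the rule that the first non-system root becomes the project directory is an assumption.

```typescript
// Hypothetical types and names for illustration; not from the playwright-mcp source.
interface Root {
  uri: string;
  name?: string;
}

interface WorkspaceEnvironment {
  projectDir?: string; // where screenshots/videos would be saved
  x11Dir?: string;     // directory of X11 sockets to scan
  display?: string;    // set when a single socket such as X1 is exposed
  driDir?: string;     // directory of DRI device nodes
}

// Assumed prefixes that mark a root as a system path rather than a project.
const SYSTEM_PREFIXES = ['/tmp/.X11-unix', '/dev/', '/sys/', '/proc/'];

function classifyRoots(roots: Root[]): WorkspaceEnvironment {
  const env: WorkspaceEnvironment = {};
  for (const root of roots) {
    let url: URL;
    try {
      url = new URL(root.uri);
    } catch {
      continue; // skip malformed URIs
    }
    if (url.protocol !== 'file:') continue;
    // Strip trailing slashes so "/dev/dri/" and "/dev/dri" compare equal.
    const path = decodeURIComponent(url.pathname).replace(/\/+$/, '');

    const socket = /^\/tmp\/\.X11-unix\/X(\d+)$/.exec(path);
    if (socket) {
      env.display = `:${socket[1]}`;
    } else if (path === '/tmp/.X11-unix') {
      env.x11Dir = path;
    } else if (path === '/dev/dri') {
      env.driDir = path;
    } else if (path && !env.projectDir && !SYSTEM_PREFIXES.some(p => path.startsWith(p))) {
      env.projectDir = path; // first non-system root wins (assumption)
    }
  }
  return env;
}
```

A server would call something like this each time it refreshes the list through `roots/list`, then scan `x11Dir` and `driDir` as described in the Environment Detection section.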
### Self-Teaching System
Tool descriptions become educational, explaining what roots to expose:
```typescript
{
name: 'browser_navigate',
description: `Navigate to URL.
ENVIRONMENT: Detects context from exposed roots:
- file:///path/to/project → saves screenshots/videos there
- file:///tmp/.X11-unix → detects available displays (X0=:0, X1=:1)
- file:///dev/dri → enables GPU acceleration if available
TIP: Change roots to switch workspace/display context dynamically.`
}
```
## Technical Architecture
### Session Isolation
- Each MCP client gets a unique session ID based on client info + timestamp + random hash
- Browser contexts are completely isolated per session
- Video recording directories are session-specific
- No cross-contamination between clients
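
One way to build such an ID is sketched below. The exact format is not given in these notes, so `createSessionId`, its parameters, and the `name-version-timestamp-random` layout are assumptions for illustration only.

```typescript
import { randomBytes } from 'node:crypto';

// Hypothetical helper; the real session ID format is not shown in these notes.
function createSessionId(
  clientName: string,
  clientVersion: string,
  now: number = Date.now(),
): string {
  // Keep the ID safe for use as a directory name (e.g. per-session video folders).
  const sanitize = (s: string) => s.replace(/[^a-zA-Z0-9_-]/g, '_');
  const random = randomBytes(4).toString('hex'); // 8 hex characters
  return `${sanitize(clientName)}-${sanitize(clientVersion)}-${now}-${random}`;
}
```

The random suffix keeps two clients with the same name and version apart even if they connect in the same millisecond.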
### Environment Detection
```typescript
import * as fs from 'node:fs';

// Example introspection logic
// List X11 sockets (X0, X1, ...) and map each to its DISPLAY value.
const detectDisplays = (x11Root: string) => {
  const sockets = fs.readdirSync(x11Root);
  return sockets
    .filter(name => /^X\d+$/.test(name)) // ignore entries that are not X<number>
    .map(name => ({ socket: name, display: `:${name.slice(1)}` }));
};

// Check the DRI directory for card (primary) and renderD (render) nodes.
const detectGPU = (driRoot: string) => {
  const devices = fs.readdirSync(driRoot);
  return {
    hasGPU: devices.some(d => d.startsWith('card')),
    hasRender: devices.some(d => d.startsWith('renderD'))
  };
};
```
### Dynamic Workspace Switching
```
// Client working on project1
Client exposes: file:///home/user/project1, file:///tmp/.X11-unix/X0
// Later switches to project2 with different display
Client updates roots: file:///home/user/project2, file:///tmp/.X11-unix/X1
Client sends: notifications/roots/list_changed
Server detects change, reconfigures browser contexts automatically
```
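
To decide whether a `notifications/roots/list_changed` event actually requires reconfiguration, the server can compare the root URIs before and after the refresh. The sketch below is a hypothetical helper (`diffRoots` and `RootsDiff` are not from the source) applied to the project1/project2 switch shown above.

```typescript
// Hypothetical helper: compare root URI lists before and after a roots/list refresh.
interface RootsDiff {
  added: string[];
  removed: string[];
  changed: boolean;
}

function diffRoots(previous: string[], next: string[]): RootsDiff {
  const before = new Set(previous);
  const after = new Set(next);
  const added = next.filter(uri => !before.has(uri));
  const removed = previous.filter(uri => !after.has(uri));
  return { added, removed, changed: added.length > 0 || removed.length > 0 };
}

const diff = diffRoots(
  ['file:///home/user/project1', 'file:///tmp/.X11-unix/X0'],
  ['file:///home/user/project2', 'file:///tmp/.X11-unix/X1'],
);
console.log(diff.changed); // true
```

Skipping reconfiguration when `changed` is false avoids tearing down a browser context for a notification that left the roots untouched.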
## Implementation Benefits
### For MCP Protocol
- **Pure MCP:** Uses existing roots capability, no protocol extensions needed
- **Self-documenting:** Tool descriptions teach clients what to expose
- **Dynamic:** Supports runtime environment changes
- **Standard:** Follows established MCP patterns
### For Playwright
- **Flexible:** Showcases programmatic browser context configuration
- **Dynamic:** Runtime display/output directory configuration
- **Isolated:** Strong session boundaries per client
- **Capabilities-aware:** Automatic GPU/display detection
### For Clients (LLMs)
- **Zero cognitive overhead:** Environment is implicit in connection
- **Familiar pattern:** Uses existing root management
- **Self-teaching:** Tool descriptions explain requirements
- **Flexible:** Can change workspace context dynamically
## Conversation Evolution
### Initial Exploration
Started with a video recording feature request, which evolved into session isolation requirements.
### Configuration Approaches Considered
1. **Environment variables** - Too global
2. **Configuration tools** - Still requires manual setup
3. **Tool parameters** - Repetitive and error-prone
4. **MCP roots introspection** - Elegant and automatic
### Key Realizations
1. **UNIX Philosophy:** Everything is a file - expose real system files
2. **Workspace Context:** Environment should travel with MCP connection
3. **Dynamic Updates:** MCP roots can change during session
4. **Self-Teaching:** Use tool descriptions to educate clients
5. **Simplicity:** Leverage existing MCP infrastructure rather than building new complexity
### Architecture Decision
Chose session-level environment (via roots) over tool-managed environment because:
- Environment is inherent to workspace, not individual tasks
- Impossible to forget environment setup
- Natural workspace isolation
- Supports dynamic context switching
## Current Implementation Status
### Completed Features
- ✅ Session isolation with unique session IDs
- ✅ Video recording with session-specific directories
- ✅ Browser context isolation per client
- ✅ Docker deployment with optional headless mode
- ✅ MCP tool system with comprehensive capabilities
### Planned Features
- 🔄 MCP roots capability support
- 🔄 Environment introspection system
- 🔄 Self-documenting tool descriptions
- 🔄 Dynamic workspace switching
- 🔄 System file capability detection
## System File Mappings
### Display Detection
- `/tmp/.X11-unix/X0` → `DISPLAY=:0`
- `/tmp/.X11-unix/X1` → `DISPLAY=:1`
- Multiple sockets = multiple display options
### GPU Capabilities
- `/dev/dri/card0` → Primary GPU available
- `/dev/dri/renderD128` → Render node available
- Presence of either node indicates that GPU acceleration is possible
### Memory Constraints
- `/proc/meminfo` → Available system memory
- `/sys/fs/cgroup/memory/memory.limit_in_bytes` → Container limits (cgroup v1 path; cgroup v2 uses `/sys/fs/cgroup/memory.max`)
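
Reading these files comes down to a small amount of text parsing. The sketch below assumes the server has already read the file contents; `parseMemAvailableBytes` and `parseCgroupLimitBytes` are hypothetical names, and treating a missing or unlimited value as `undefined` is a design assumption.

```typescript
// Hypothetical parsers; they take file contents as strings so they can be
// exercised without access to /proc or /sys.

// /proc/meminfo reports values in kB, e.g. "MemAvailable:    8123456 kB".
function parseMemAvailableBytes(meminfo: string): number | undefined {
  const match = /^MemAvailable:\s+(\d+)\s+kB$/m.exec(meminfo);
  return match ? Number(match[1]) * 1024 : undefined;
}

// A cgroup limit file holds either a byte count or, on cgroup v2, the literal "max".
function parseCgroupLimitBytes(contents: string): number | undefined {
  const value = contents.trim();
  if (!/^\d+$/.test(value)) return undefined; // "max" or unexpected content
  return Number(value);
}
```

When both values are present, the smaller of the two is the more realistic budget for a browser running inside a container.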
### Project Context
- Any exposed project directory → Screenshot/video save location
- Directory permissions indicate write capabilities
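
A write check can be sketched with Node's `fs.accessSync`. The helper names here are hypothetical, and falling back to the OS temp directory when the project root is not writable is an assumption rather than documented behavior.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';

// Returns true when `dir` is an existing directory the current process may write to.
function isWritableDir(dir: string): boolean {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return fs.statSync(dir).isDirectory();
  } catch {
    return false; // missing path or insufficient permissions
  }
}

// Assumption: fall back to the OS temp directory when the project root is unusable.
function resolveOutputDir(projectDir: string | undefined): string {
  return projectDir && isWritableDir(projectDir) ? projectDir : os.tmpdir();
}
```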
## Example Scenarios
### Scenario 1: Desktop Development
```
Client exposes:
- file:///home/user/project-a
- file:///tmp/.X11-unix
Server detects:
- Project directory: /home/user/project-a
- Display: :0 (from X0 socket)
- Result: GUI browser on main display, files saved to project-a
```
### Scenario 2: Multi-Display Setup
```
Client exposes:
- file:///home/user/project-b
- file:///tmp/.X11-unix/X1
Server detects:
- Project directory: /home/user/project-b
- Display: :1 (from X1 socket)
- Result: GUI browser on secondary display, files saved to project-b
```
### Scenario 3: Headless Container
```
Client exposes:
- file:///workspace/project-c
- (no X11 sockets exposed)
Server detects:
- Project directory: /workspace/project-c
- No displays available
- Result: Headless browser, files saved to project-c
```
### Scenario 4: GPU-Accelerated
```
Client exposes:
- file:///home/user/project-d
- file:///tmp/.X11-unix
- file:///dev/dri
Server detects:
- Project directory: /home/user/project-d
- Display: :0
- GPU: Available (card0, renderD128)
- Result: GPU-accelerated browser with hardware rendering
```
## Questions and Considerations
### Protocol Compliance
- **Question:** Do all MCP clients support dynamic root updates?
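- **Answer:** Roots are an optional client capability in the MCP specification. Clients that declare `roots` with `listChanged: true` send change notifications; for other clients the server needs the fallback behavior described below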
### Performance Impact
- **Question:** Cost of filesystem introspection on each root change?
- **Answer:** Minimal - just reading directory listings and small files
### Security Implications
- **Question:** What if client exposes sensitive system files?
- **Answer:** The server reads only specific known paths and validates access before reading
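
A minimal sketch of that validation for system paths is shown below. The allowlist is assumed from the paths named in these notes, `isAllowedSystemPath` is a hypothetical name, and project directories would need their own check.

```typescript
import * as path from 'node:path';

// Assumed allowlist, taken from the system paths named in these notes.
const ALLOWED_SYSTEM_PATHS = [
  '/tmp/.X11-unix',
  '/dev/dri',
  '/sys/class/graphics',
  '/proc/meminfo',
];

function isAllowedSystemPath(candidate: string): boolean {
  // Normalize first so "../" segments cannot escape an allowed directory.
  const normalized = path.posix.normalize(candidate);
  return ALLOWED_SYSTEM_PATHS.some(
    allowed => normalized === allowed || normalized.startsWith(allowed + '/'),
  );
}
```

Normalization does not resolve symlinks, so a stricter version would also compare the result of `fs.realpathSync` against the allowlist.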
### Fallback Behavior
- **Question:** What if expected roots aren't exposed?
- **Answer:** Graceful degradation to headless/default configuration
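
The degradation rule can be sketched as a small decision function. `LaunchPlan` and `planLaunch` are hypothetical; the `headless` and `env` fields mirror option names accepted by Playwright's `browserType.launch()`, and choosing the first detected display is an assumption.

```typescript
// Hypothetical shape for illustration; not from the playwright-mcp source.
interface LaunchPlan {
  headless: boolean;
  env?: { DISPLAY: string };
}

function planLaunch(displays: string[]): LaunchPlan {
  const [first] = displays;
  // No X11 display detected: degrade to a headless browser.
  if (first === undefined) return { headless: true };
  // Otherwise run headed on the first detected display (assumption).
  return { headless: false, env: { DISPLAY: first } };
}
```

When handing `env` to Playwright, it is probably safest to merge it with `process.env`, since the launch option replaces the default environment rather than extending it.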
## Future Enhancements
### Extended System Detection
- Network interface detection via `/sys/class/net`
- Audio capabilities via `/proc/asound`
- Container detection via `/proc/1/cgroup`
### Resource Constraints
- CPU limits from cgroup files
- Memory limits for browser configuration
- Disk space checks for recording limits
### Multi-User Support
- User ID detection for proper file permissions
- Group membership for device access
- Home directory discovery
## Conclusion
This architecture is designed to address multi-client workspace isolation by:
1. **Leveraging existing MCP infrastructure** (roots) rather than building new complexity
2. **Following UNIX philosophy** by exposing real system files that define environment
3. **Enabling dynamic workspace switching** through standard MCP protocol mechanisms
4. **Self-teaching through tool descriptions** so clients learn what to expose
5. **Maintaining strong isolation** while eliminating configuration overhead
The result is workspace-aware browser automation that feels magical but is built on solid, standard protocols and UNIX principles.