Add session persistence system to maintain browser contexts across MCP tool calls:

- SessionManager: Global persistent context management keyed by session ID
- BrowserServerBackend: Modified to use session persistence and reuse contexts
- Context: Enhanced to support environment introspection and session ID override
- MCP Roots: Added educational tool descriptions for workspace-aware automation
- Environment Detection: System file introspection for display/GPU/project detection

Key features:

- Browser contexts survive between tool calls preserving cache, cookies, state
- Complete session isolation between different MCP clients
- Zero startup overhead for repeat connections
- Backward compatible with existing implementations
- Support for MCP roots workspace detection and environment adaptation

Tested and verified with real Claude Code client showing successful session persistence across navigation calls with preserved browser state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
300 lines
9.9 KiB
Markdown
# MCP Roots for Workspace-Aware Browser Automation - Detailed Notes

## Overview

This document captures the complete conversation and technical details around implementing workspace-aware browser automation using MCP roots for environment declaration and dynamic configuration.

## The Problem Statement

**Multi-Client Isolation Challenge:**
- Multiple MCP clients running simultaneously, each working on different codebases
- Each client needs isolated Playwright sessions
- Browser windows should display on the client's desktop context
- Screenshots/videos should save to the client's project directory
- Sessions must remain completely isolated from each other

**Traditional Configuration Limitations:**
- Environment variables: Global, not per-client
- Config files: Each client needs to know its own context
- Tool parameters: Requires manual specification on every call
- Configuration tools: Still requires client to understand context

## The Key Insight

The real problem isn't configuration complexity - it's **workspace-aware isolation**. Each MCP client represents a distinct workspace with its own:
- Project directory (where files should be saved)
- Desktop context (where windows should appear)
- Available system resources (GPU, displays, etc.)

## The MCP Roots Solution

### Core Concept
Leverage MCP's existing "roots" capability to declare execution environments rather than just file system access. Following the UNIX philosophy that "everything is a file," we expose actual system files that define the environment.

### How It Works

1. **Client declares roots during connection:**
   ```json
   {
     "capabilities": {
       "roots": {
         "listChanged": true
       }
     }
   }
   ```

2. **Client exposes environment-defining files:**
   - `file:///path/to/their/project` - artifact save location
   - `file:///tmp/.X11-unix` - available X11 displays
   - `file:///dev/dri` - GPU capabilities
   - `file:///sys/class/graphics` - framebuffer information
   - `file:///proc/meminfo` - memory constraints

3. **Server introspects exposed files:**
   - Parse X11 sockets to discover displays (X0 → DISPLAY=:0)
   - Check DRI devices for GPU acceleration
   - Use project directory for screenshot/video output
   - Read system files for capability detection

4. **Dynamic updates via MCP protocol:**
   - Client can change roots anytime during session
   - Client sends `notifications/roots/list_changed`
   - Server calls `roots/list` to get updated environment
   - Browser contexts automatically reconfigure
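
A minimal sketch of how step 3 might sort the entries returned by `roots/list` into an environment description. The `Root` shape (a `uri` plus an optional `name`) follows the MCP roots specification; `DetectedEnvironment`, `fileUriToPath`, `isUnder`, and `classifyRoots` are hypothetical names introduced here for illustration, and treating the first non-system root as the project directory is an assumption, not something these notes specify.

```typescript
// Root entries as returned by roots/list: a file:// URI plus an optional name.
interface Root {
  uri: string;
  name?: string;
}

// Hypothetical result shape; not part of MCP or Playwright.
interface DetectedEnvironment {
  projectDir: string | undefined;
  x11Path: string | undefined;
  driPath: string | undefined;
}

// Convert a host-less file:// URI into a POSIX path; other schemes yield undefined.
const fileUriToPath = (uri: string): string | undefined => {
  const prefix = 'file://';
  return uri.startsWith(prefix) ? decodeURIComponent(uri.slice(prefix.length)) : undefined;
};

// True when `path` is the given directory or lies somewhere beneath it.
const isUnder = (path: string, dir: string): boolean =>
  path === dir || path.startsWith(`${dir}/`);

const classifyRoots = (roots: Root[]): DetectedEnvironment => {
  const env: DetectedEnvironment = { projectDir: undefined, x11Path: undefined, driPath: undefined };
  for (const root of roots) {
    const path = fileUriToPath(root.uri);
    if (path === undefined) {
      continue; // ignore non-file roots
    }
    if (isUnder(path, '/tmp/.X11-unix')) {
      env.x11Path = path;
    } else if (isUnder(path, '/dev/dri')) {
      env.driPath = path;
    } else if (isUnder(path, '/sys') || isUnder(path, '/proc')) {
      continue; // capability files, not a workspace
    } else if (env.projectDir === undefined) {
      env.projectDir = path; // assumption: first remaining root is the project
    }
  }
  return env;
};
```

Keeping the classification a pure function of the root list means it can be re-run unchanged whenever the client reports a new set of roots.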

### Self-Teaching System

Tool descriptions become educational, explaining what roots to expose:

```typescript
{
  name: 'browser_navigate',
  description: `Navigate to URL.

ENVIRONMENT: Detects context from exposed roots:
- file:///path/to/project → saves screenshots/videos there
- file:///tmp/.X11-unix → detects available displays (X0=:0, X1=:1)
- file:///dev/dri → enables GPU acceleration if available

TIP: Change roots to switch workspace/display context dynamically.`
}
```

## Technical Architecture

### Session Isolation
- Each MCP client gets unique session ID based on client info + timestamp + random hash
- Browser contexts are completely isolated per session
- Video recording directories are session-specific
- No cross-contamination between clients
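
A minimal sketch of a session ID built from the three ingredients listed above (client info, timestamp, random component), using Node's built-in `crypto` module. The function name `createSessionId`, the ID layout, and the `sessions` map are assumptions made for illustration; the actual implementation may combine these parts differently.

```typescript
import { createHash, randomBytes } from 'crypto';

// Hypothetical helper: combines client info, a timestamp, and random bytes.
const createSessionId = (clientName: string, clientVersion: string): string => {
  // Short fingerprint of the client identity (first 8 hex chars of SHA-256).
  const clientHash = createHash('sha256')
    .update(`${clientName}@${clientVersion}`)
    .digest('hex')
    .slice(0, 8);
  const timestamp = Date.now().toString(36);
  const random = randomBytes(8).toString('hex'); // 16 hex characters
  return `${clientHash}-${timestamp}-${random}`;
};

// Per-session state keyed by ID, so one client never sees another's entry.
const sessions = new Map<string, { createdAt: number }>();
const exampleId = createSessionId('example-client', '1.0.0');
sessions.set(exampleId, { createdAt: Date.now() });
```

The random suffix is what keeps two connections from the same client at the same millisecond apart; the client hash and timestamp mainly make IDs easier to read in logs and directory names.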

### Environment Detection
```typescript
import * as fs from 'fs';

// Example introspection logic

// Map X11 socket names (X0, X1, ...) to DISPLAY values (:0, :1, ...).
const detectDisplays = (x11Root: string) => {
  const sockets = fs.readdirSync(x11Root);
  return sockets
    .filter(name => name.startsWith('X'))
    .map(name => ({ socket: name, display: `:${name.slice(1)}` }));
};

// Report whether DRM card and render nodes exist under the DRI root.
const detectGPU = (driRoot: string) => {
  const devices = fs.readdirSync(driRoot);
  return {
    hasGPU: devices.some(d => d.startsWith('card')),
    hasRender: devices.some(d => d.startsWith('renderD'))
  };
};
```

### Dynamic Workspace Switching
```
// Client working on project1
Client exposes: file:///home/user/project1, file:///tmp/.X11-unix/X0

// Later switches to project2 with different display
Client updates roots: file:///home/user/project2, file:///tmp/.X11-unix/X1
Client sends: notifications/roots/list_changed
Server detects change, reconfigures browser contexts automatically
```
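
One way the server might decide whether a `notifications/roots/list_changed` message actually requires reconfiguration is to diff the previous root URIs against the fresh `roots/list` result. This is a sketch under that assumption; `diffRoots`, `RootsDiff`, and `needsReconfigure` are hypothetical names, not part of MCP or the server.

```typescript
// Hypothetical diff result: which root URIs appeared and which disappeared.
interface RootsDiff {
  added: string[];
  removed: string[];
}

const diffRoots = (previous: string[], next: string[]): RootsDiff => {
  const prev = new Set(previous);
  const curr = new Set(next);
  return {
    added: next.filter(uri => !prev.has(uri)),
    removed: previous.filter(uri => !curr.has(uri)),
  };
};

// Reconfigure only when the set of roots really changed.
const needsReconfigure = (diff: RootsDiff): boolean =>
  diff.added.length > 0 || diff.removed.length > 0;

// The project1 → project2 switch from the walkthrough above.
const switchDiff = diffRoots(
  ['file:///home/user/project1', 'file:///tmp/.X11-unix/X0'],
  ['file:///home/user/project2', 'file:///tmp/.X11-unix/X1'],
);
```

Skipping reconfiguration when the diff is empty avoids tearing down a browser context for a notification that did not change anything.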

## Implementation Benefits

### For MCP Protocol
- **Pure MCP:** Uses existing roots capability, no protocol extensions needed
- **Self-documenting:** Tool descriptions teach clients what to expose
- **Dynamic:** Supports runtime environment changes
- **Standard:** Follows established MCP patterns

### For Playwright
- **Flexible:** Showcases programmatic browser context configuration
- **Dynamic:** Runtime display/output directory configuration
- **Isolated:** Strong session boundaries per client
- **Capabilities-aware:** Automatic GPU/display detection

### For Clients (LLMs)
- **Zero cognitive overhead:** Environment is implicit in connection
- **Familiar pattern:** Uses existing root management
- **Self-teaching:** Tool descriptions explain requirements
- **Flexible:** Can change workspace context dynamically

## Conversation Evolution

### Initial Exploration
Started with video recording feature request, evolved into session isolation requirements.

### Configuration Approaches Considered
1. **Environment variables** - Too global
2. **Configuration tools** - Still requires manual setup
3. **Tool parameters** - Repetitive and error-prone
4. **MCP roots introspection** - Elegant and automatic

### Key Realizations
1. **UNIX Philosophy:** Everything is a file - expose real system files
2. **Workspace Context:** Environment should travel with MCP connection
3. **Dynamic Updates:** MCP roots can change during session
4. **Self-Teaching:** Use tool descriptions to educate clients
5. **Simplicity:** Leverage existing MCP infrastructure rather than building new complexity

### Architecture Decision
Chose session-level environment (via roots) over tool-managed environment because:
- Environment is inherent to workspace, not individual tasks
- Impossible to forget environment setup
- Natural workspace isolation
- Supports dynamic context switching

## Current Implementation Status

### Completed Features
- ✅ Session isolation with unique session IDs
- ✅ Video recording with session-specific directories
- ✅ Browser context isolation per client
- ✅ Docker deployment with optional headless mode
- ✅ MCP tool system with comprehensive capabilities

### Planned Features
- 🔄 MCP roots capability support
- 🔄 Environment introspection system
- 🔄 Self-documenting tool descriptions
- 🔄 Dynamic workspace switching
- 🔄 System file capability detection

## System File Mappings

### Display Detection
- `/tmp/.X11-unix/X0` → `DISPLAY=:0`
- `/tmp/.X11-unix/X1` → `DISPLAY=:1`
- Multiple sockets = multiple display options
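
The socket-to-display mapping can be sketched as a small pure function. It accepts either a bare socket name or a full socket path, and it is deliberately stricter than a plain prefix check: only `X` followed by digits counts as a display. `displayFromSocketPath` is a hypothetical helper name used for illustration.

```typescript
// Map an X11 socket name or path (e.g. "/tmp/.X11-unix/X1") to a DISPLAY value.
// Returns undefined when the last path segment is not "X" followed by digits.
const displayFromSocketPath = (socketPath: string): string | undefined => {
  const name = socketPath.split('/').pop() ?? '';
  const match = /^X(\d+)$/.exec(name);
  return match ? `:${match[1]}` : undefined;
};

// A client that exposes one specific socket pins the session to that display.
const pinnedDisplay = displayFromSocketPath('/tmp/.X11-unix/X1');
```

Returning `undefined` for anything else means a root pointing at the socket directory itself is not mistaken for a display.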

### GPU Capabilities
- `/dev/dri/card0` → Primary GPU available
- `/dev/dri/renderD128` → Render node available
- Presence indicates GPU acceleration possible

### Memory Constraints
- `/proc/meminfo` → Available system memory
- `/sys/fs/cgroup/memory/memory.limit_in_bytes` → Container limits (cgroup v1; under cgroup v2 the equivalent file is `/sys/fs/cgroup/memory.max`)
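
As an illustration of reading available memory, the sketch below extracts the `MemAvailable` field from the text of `/proc/meminfo`. It takes the file contents as a string so the parsing can be exercised without touching the filesystem; `parseMemAvailableKiB` is a hypothetical name, and how the server would use the value is not specified in these notes.

```typescript
// Extract MemAvailable (in KiB) from the contents of /proc/meminfo.
// Returns undefined when the field is missing, as on very old kernels.
const parseMemAvailableKiB = (meminfo: string): number | undefined => {
  const match = /^MemAvailable:\s+(\d+)\s+kB$/m.exec(meminfo);
  return match ? Number(match[1]) : undefined;
};

// Sample input in the format the kernel produces.
const sampleMeminfo = [
  'MemTotal:       16384000 kB',
  'MemFree:         1234567 kB',
  'MemAvailable:    8000000 kB',
].join('\n');
const availableKiB = parseMemAvailableKiB(sampleMeminfo);
```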

### Project Context
- Any exposed project directory → Screenshot/video save location
- Directory permissions indicate write capabilities

## Example Scenarios

### Scenario 1: Desktop Development
```
Client exposes:
- file:///home/user/project-a
- file:///tmp/.X11-unix

Server detects:
- Project directory: /home/user/project-a
- Display: :0 (from X0 socket)
- Result: GUI browser on main display, files saved to project-a
```

### Scenario 2: Multi-Display Setup
```
Client exposes:
- file:///home/user/project-b
- file:///tmp/.X11-unix/X1

Server detects:
- Project directory: /home/user/project-b
- Display: :1 (from X1 socket)
- Result: GUI browser on secondary display, files saved to project-b
```

### Scenario 3: Headless Container
```
Client exposes:
- file:///workspace/project-c
- (no X11 sockets exposed)

Server detects:
- Project directory: /workspace/project-c
- No displays available
- Result: Headless browser, files saved to project-c
```

### Scenario 4: GPU-Accelerated
```
Client exposes:
- file:///home/user/project-d
- file:///tmp/.X11-unix
- file:///dev/dri

Server detects:
- Project directory: /home/user/project-d
- Display: :0
- GPU: Available (card0, renderD128)
- Result: GPU-accelerated browser with hardware rendering
```

## Questions and Considerations

### Protocol Compliance
- **Question:** Do all MCP clients support dynamic root updates?
- **Answer:** Not necessarily. Roots is an optional client capability in the spec, and `listChanged` is a separate flag within it, so the server should check what the client declared during initialization and fall back when either is absent

### Performance Impact
- **Question:** Cost of filesystem introspection on each root change?
- **Answer:** Minimal - just reading directory listings and small files

### Security Implications
- **Question:** What if client exposes sensitive system files?
- **Answer:** Server only reads specific known paths, validates access
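
A minimal sketch of the "specific known paths" check, assuming an allowlist built from the system locations named in these notes. The list contents and the names `ALLOWED_SYSTEM_PATHS` and `isAllowedSystemPath` are illustrative assumptions; the path is normalized first so that `..` segments cannot escape an allowed directory.

```typescript
import { posix } from 'path';

// Assumed allowlist, taken from the system paths mentioned in this document.
const ALLOWED_SYSTEM_PATHS = [
  '/tmp/.X11-unix',
  '/dev/dri',
  '/sys/class/graphics',
  '/proc/meminfo',
];

// True when the normalized path equals an allowed entry or lies beneath one.
const isAllowedSystemPath = (candidate: string): boolean => {
  const normalized = posix.normalize(candidate);
  return ALLOWED_SYSTEM_PATHS.some(
    allowed => normalized === allowed || normalized.startsWith(`${allowed}/`),
  );
};
```

Matching on `allowed + '/'` rather than a bare prefix keeps a sibling such as `/proc/meminfo-evil` from passing the check. This does not resolve symlinks, so a real implementation would likely need an additional check on the resolved path.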

### Fallback Behavior
- **Question:** What if expected roots aren't exposed?
- **Answer:** Graceful degradation to headless/default configuration
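
The degradation rule can be sketched as a pure decision function: no detected display means headless, otherwise use the first display. `LaunchPlan` and `planLaunch` are hypothetical names, and picking the first display is an assumption; the resulting object is a plain description, not a Playwright options object.

```typescript
// Hypothetical launch decision derived from the detected environment.
interface LaunchPlan {
  headless: boolean;
  display: string | undefined;
  outputDir: string | undefined;
}

const planLaunch = (displays: string[], projectDir: string | undefined): LaunchPlan => {
  if (displays.length === 0) {
    // No X11 sockets were exposed: fall back to a headless browser.
    return { headless: true, display: undefined, outputDir: projectDir };
  }
  // Assumption: prefer the first detected display.
  return { headless: false, display: displays[0], outputDir: projectDir };
};

// Headless container case: a project root but no displays.
const containerPlan = planLaunch([], '/workspace/project-c');
```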

## Future Enhancements

### Extended System Detection
- Network interface detection via `/sys/class/net`
- Audio capabilities via `/proc/asound`
- Container detection via `/proc/1/cgroup`

### Resource Constraints
- CPU limits from cgroup files
- Memory limits for browser configuration
- Disk space checks for recording limits

### Multi-User Support
- User ID detection for proper file permissions
- Group membership for device access
- Home directory discovery

## Conclusion

This architecture successfully addresses multi-client workspace isolation by:

1. **Leveraging existing MCP infrastructure** (roots) rather than building new complexity
2. **Following UNIX philosophy** by exposing real system files that define environment
3. **Enabling dynamic workspace switching** through standard MCP protocol mechanisms
4. **Self-teaching through tool descriptions** so clients learn what to expose
5. **Maintaining strong isolation** while eliminating configuration overhead

The result is workspace-aware browser automation that feels magical but is built on solid, standard protocols and UNIX principles.