playwright-mcp/MODEL-COLLABORATION-API.md
Ryan Malloy 6120506e91
2025-11-14 21:36:08 -07:00


# MCP Model-User Collaboration API
This document describes the JavaScript functions available to models for direct user communication and collaborative element selection within the Playwright MCP browser automation system.
## 🎯 Core Philosophy
Enable seamless collaboration between AI models and human users by providing simple JavaScript APIs for real-time communication, confirmations, and interactive element selection.
## 📱 Messaging System
### Basic Messaging
```javascript
// Send messages to users with auto-dismiss
mcpMessage('Hello user!', 'info', 5000) // Info message (green)
mcpMessage('Success!', 'success', 3000) // Success message (bright green)
mcpMessage('Warning!', 'warning', 4000) // Warning message (yellow)
mcpMessage('Error occurred', 'error', 6000) // Error message (red)
mcpMessage('Persistent', 'info', 0) // Persistent until dismissed
```
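
The calls above run inside the page. From the automation side, a model would typically reach them through Playwright's `page.evaluate(pageFunction, arg)`. The sketch below assumes the collaboration script is already injected; `sendMessage`, its argument checks, and the 5000 ms default are illustrative assumptions, not part of this API.

```javascript
// Hypothetical Node-side helper: forwards a message to the injected mcpMessage()
// inside the page. `page` is assumed to be a Playwright Page (page.evaluate(fn, arg)).
const MESSAGE_TYPES = ['info', 'success', 'warning', 'error'];

async function sendMessage(page, text, type = 'info', durationMs = 5000) {
  if (!MESSAGE_TYPES.includes(type)) {
    throw new Error(`Unknown message type: ${type}`);
  }
  if (!Number.isFinite(durationMs) || durationMs < 0) {
    throw new Error('durationMs must be >= 0 (0 keeps the message until dismissed)');
  }
  // The arrow function runs in the browser; only the serializable arg object crosses over.
  return page.evaluate(
    ({ text, type, durationMs }) => mcpMessage(text, type, durationMs),
    { text, type, durationMs }
  );
}
```

Usage: `await sendMessage(page, 'Success!', 'success', 3000);`. Passing the values as the `arg` object, rather than interpolating them into a script string, avoids quoting problems with user-visible text.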
### Helper Functions
```javascript
mcpNotify.info('Information for the user') // Standard info message
mcpNotify.success('Task completed!') // Success confirmation
mcpNotify.warning('Please be careful') // Cautionary message
mcpNotify.error('Something went wrong') // Error notification
mcpNotify.loading('Processing...') // Persistent loading indicator
mcpNotify.done('All finished!') // Quick success (3s auto-dismiss)
mcpNotify.failed('Task failed') // Quick error (5s auto-dismiss)
```
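
One plausible way to layer these helpers over `mcpMessage` is a fixed type and duration per helper. In this sketch, the `done` (3 s), `failed` (5 s) and persistent `loading` behavior come from the comments above; the message type used for `loading` and the durations for the four basic helpers are assumptions borrowed from the Basic Messaging examples.

```javascript
// Hypothetical sketch: mcpNotify-style helpers as thin wrappers over
// send(text, type, durationMs), where `send` behaves like mcpMessage.
function createNotify(send) {
  return {
    info: (text) => send(text, 'info', 5000),       // assumed duration
    success: (text) => send(text, 'success', 3000), // assumed duration
    warning: (text) => send(text, 'warning', 4000), // assumed duration
    error: (text) => send(text, 'error', 6000),     // assumed duration
    loading: (text) => send(text, 'info', 0),       // 0 = stays until dismissed; 'info' type assumed
    done: (text) => send(text, 'success', 3000),    // quick success, 3 s
    failed: (text) => send(text, 'error', 5000),    // quick error, 5 s
  };
}
```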
## 🤝 User Confirmation System
### Interactive Prompts
```javascript
// Ask user for confirmation
const confirmed = await mcpPrompt('Should I proceed with this action?');
if (confirmed) {
  mcpNotify.success('User confirmed - proceeding!');
} else {
  mcpNotify.info('User cancelled the action');
}

// Custom confirmation with options
const result = await mcpPrompt('Do you want to login first?', {
  title: '🔐 LOGIN REQUIRED',
  confirmText: 'YES, LOGIN',
  cancelText: 'SKIP FOR NOW'
});
```
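
`mcpPrompt` resolves to a boolean, which suggests a Promise that settles when the user picks a button or presses ESC. The sketch below shows that pattern with an injected renderer; `createPrompt`, `showDialog`, and the default button labels are hypothetical and do not describe the shipped implementation.

```javascript
// Hypothetical sketch (not the shipped mcpPrompt): wrap a dialog in Promise<boolean>.
// `showDialog` is an assumed renderer that wires the confirm button to onConfirm and
// the cancel button / ESC key to onCancel, and removes the dialog when either fires.
function createPrompt(showDialog) {
  return function confirmPrompt(message, options = {}) {
    const {
      title = 'CONFIRM',   // assumed default
      confirmText = 'YES', // assumed default
      cancelText = 'NO',   // assumed default
    } = options;
    return new Promise((resolve) => {
      // A Promise settles only once, so a second click or key press is ignored.
      showDialog({
        title,
        message,
        confirmText,
        cancelText,
        onConfirm: () => resolve(true),
        onCancel: () => resolve(false),
      });
    });
  };
}
```

In the page, something like `window.mcpPrompt = createPrompt(renderDialog)` would expose it, where `renderDialog` is likewise hypothetical.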
## 🔍 Collaborative Element Selection
### Interactive Element Inspector
```javascript
// Basic element selection
mcpInspector.start('Please click on the login button');
// Element selection with callback
mcpInspector.start(
  'Click on the element you want me to interact with',
  (elementDetails) => {
    // Model receives detailed element information
    console.log('User selected:', elementDetails);
    // Use the XPath for precise automation
    const xpath = elementDetails.xpath;
    mcpNotify.success(`Got it! I'll click on: ${elementDetails.textContent}`);
    // Now use xpath with Playwright tools...
  }
);
// Stop inspection programmatically
mcpInspector.stop();
```
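
Inspection needs exactly one document-level click listener that goes away on `stop()` or after a selection. The sketch below is one way to get that lifecycle; `createInspector`, `describe`, and `setBanner` are hypothetical names, and `root` can be any `EventTarget`, which is what lets the test run outside a browser.

```javascript
// Hypothetical sketch of an inspector lifecycle (not the shipped mcpInspector).
// `root` is any EventTarget (e.g. `document`), `describe` builds the details object
// for the clicked node, and `setBanner(text | null)` shows or hides the instruction.
function createInspector(root, describe, setBanner) {
  let listener = null; // the single click listener while inspection is active

  function stop() {
    if (!listener) return;
    root.removeEventListener('click', listener, { capture: true });
    listener = null;
    setBanner(null);
  }

  function start(instruction, onSelect) {
    stop(); // repeated start() calls never stack listeners
    listener = (event) => {
      event.preventDefault();  // don't follow the clicked link or submit a form
      event.stopPropagation(); // keep the page's own click handlers from running
      stop();                  // one selection per start()
      if (onSelect) onSelect(describe(event.target));
    };
    // Capture phase on the root runs before the clicked element's own listeners.
    root.addEventListener('click', listener, { capture: true });
    setBanner(instruction);
  }

  return { start, stop, isActive: () => listener !== null };
}
```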
### Element Details Returned
When the user clicks an element, the callback receives an object like this:
```javascript
{
  tagName: 'a',                     // HTML tag
  id: 'login-button',               // Element ID (if present)
  className: 'btn btn-primary',     // CSS classes
  textContent: 'Login',             // Visible text (truncated to 100 chars)
  xpath: '//*[@id="login-button"]', // Generated XPath
  attributes: {                     // All HTML attributes
    href: '/login',
    class: 'btn btn-primary',
    'data-action': 'login'
  },
  boundingRect: {                   // Element position/size
    x: 100, y: 200,
    width: 80, height: 32
  },
  visible: true                     // Element visibility status
}
```
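
The example `xpath` uses the element's `id`. For elements without one, a common fallback is a positional path built from the ancestors. This sketch assumes that fallback (the document does not say how such elements are handled) and works on any object with `tagName`, `id`, and `parentElement.children`; `generateXPath` is a hypothetical name.

```javascript
// Hypothetical sketch of XPath generation for the `xpath` field.
// Uses the id shortcut when available, otherwise a 1-based positional path.
function generateXPath(el) {
  if (el.id) {
    return `//*[@id="${el.id}"]`;
  }
  const parts = [];
  let node = el;
  while (node) {
    const current = node;
    const tag = current.tagName.toLowerCase();
    const parent = current.parentElement;
    if (!parent) {
      parts.unshift(tag); // reached the root element (html)
      break;
    }
    // Position among siblings with the same tag; XPath positions start at 1.
    const sameTag = Array.from(parent.children).filter(
      (child) => child.tagName === current.tagName
    );
    const index = sameTag.indexOf(current) + 1;
    parts.unshift(sameTag.length > 1 ? `${tag}[${index}]` : tag);
    node = parent;
  }
  return '/' + parts.join('/');
}
```

An `id` containing a double quote would need escaping before it goes inside the `@id="..."` predicate; the sketch does not handle that case.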
## 🚀 Collaboration Patterns
### 1. Ambiguous Element Selection
```javascript
// When multiple similar elements exist
const confirmed = await mcpPrompt('I see multiple login buttons. Should I click the main one in the header?');
if (!confirmed) {
  mcpInspector.start('Please click on the specific login button you want me to use');
}
```
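
The two-step pattern above (ask, then fall back to the inspector) can be wrapped in a single awaitable function. This is a sketch: `resolveTarget`, the `api` object, and `defaultXPath` are assumptions, and the inspector callback is wrapped in a Promise so the chosen element can be awaited.

```javascript
// Hypothetical sketch: resolve an ambiguous target to an XPath.
// `api` bundles mcpPrompt and mcpInspector (injected so the logic is testable);
// `defaultXPath` is the model's own best guess.
async function resolveTarget(api, question, instruction, defaultXPath) {
  if (await api.mcpPrompt(question)) {
    return defaultXPath; // user accepted the model's guess
  }
  // Wrap the callback-style inspector so the user's choice can be awaited.
  const details = await new Promise((resolve) =>
    api.mcpInspector.start(instruction, resolve)
  );
  return details.xpath;
}
```

If the user never clicks, the Promise never settles; a real workflow would likely pair this with a timeout or `mcpInspector.stop()`.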
### 2. Permission Requests
```javascript
// Ask before sensitive actions
const canProceed = await mcpPrompt('This will delete all items. Are you sure?', {
  title: '⚠️ DESTRUCTIVE ACTION',
  confirmText: 'YES, DELETE ALL',
  cancelText: 'CANCEL'
});
```
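
To make sure a destructive step cannot run without an explicit yes, the confirmation and the action can be bundled together. `withPermission` is a hypothetical helper; `prompt` is assumed to resolve to `true` or `false` like `mcpPrompt`.

```javascript
// Hypothetical guard: run `action` only after the user confirms.
async function withPermission(prompt, question, options, action) {
  const confirmed = await prompt(question, options);
  if (!confirmed) {
    return { performed: false }; // user declined; action never runs
  }
  return { performed: true, result: await action() };
}
```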
### 3. Form Field Identification
```javascript
// Help user identify form fields
mcpInspector.start(
  'Please click on the email input field',
  (element) => {
    if (element.tagName !== 'input') {
      mcpNotify.warning('That doesn\'t look like an input field. Try again?');
      return;
    }
    mcpNotify.success('Perfect! I\'ll enter the email there.');
  }
);
```
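
As written, the callback above only warns; it does not itself restart the selection. A retry loop makes the "Try again?" step explicit. This sketch assumes a `pick(instruction)` function that wraps `mcpInspector.start` in a Promise and an `mcpNotify`-like `notify` object; `pickInputField` and the attempt limit are illustrative.

```javascript
// Hypothetical sketch: keep asking until the user clicks an <input>, up to maxAttempts.
async function pickInputField(pick, notify, instruction, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const element = await pick(instruction);
    if (element.tagName === 'input') {
      notify.success("Perfect! I'll enter the email there.");
      return element;
    }
    notify.warning("That doesn't look like an input field. Try again?");
  }
  notify.failed('No input field selected'); // give up after maxAttempts
  return null;
}
```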
### 4. Dynamic Content Handling
```javascript
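// When content changes dynamically
mcpNotify.loading('Waiting for page to load...');
// ... wait for content ...
// `contentReady` is a hypothetical flag set by your own wait logic
if (contentReady) {
  mcpNotify.done('Page loaded!');
} else {
  const shouldWait = await mcpPrompt('The content is still loading. Should I wait longer?');
  // If shouldWait is true, keep waiting; otherwise stop and report
}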
```
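
For the loading case, a polling loop can show `loading`, finish with `done` once the content appears, and fall back to `mcpPrompt` only when a time budget runs out. `waitForContent`, `isReady`, and the interval and budget defaults are assumptions for illustration.

```javascript
// Hypothetical sketch: poll a readiness check and ask the user before extending the wait.
// `notify` is assumed to look like mcpNotify and `prompt` like mcpPrompt.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForContent({ isReady, notify, prompt, intervalMs = 500, budgetMs = 10000 }) {
  notify.loading('Waiting for page to load...');
  for (;;) {
    const deadline = Date.now() + budgetMs;
    while (Date.now() < deadline) {
      if (await isReady()) {
        notify.done('Page loaded!');
        return true;
      }
      await sleep(intervalMs);
    }
    // Budget exhausted: let the user decide whether to start another round.
    const keepWaiting = await prompt('The content is still loading. Should I wait longer?');
    if (!keepWaiting) {
      notify.failed('Stopped waiting for the page');
      return false;
    }
  }
}
```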
## 🎨 Visual Design
All messages and prompts use the cyberpunk "hacker matrix" theme:
- Black background with neon green (#00ff00) text, plus the per-type accent colors noted under Basic Messaging (yellow for warnings, red for errors)
- Terminal-style Courier New font
- Glowing effects and smooth animations
- High contrast for excellent readability
- ESC key support for cancellation
## 🛠️ Implementation Guidelines for Models
### Best Practices
1. **Clear Communication**: Use descriptive messages that explain what you're doing
2. **Ask for Permission**: Confirm before destructive or sensitive actions
3. **Collaborative Selection**: When element location is ambiguous, ask the user to click
4. **Progress Updates**: Use loading/done messages for long operations
5. **Error Handling**: Provide clear error messages with next steps
### Example Workflows
```javascript
// Complete login workflow with collaboration
async function collaborativeLogin() {
  // 1. Ask for permission
  const shouldLogin = await mcpPrompt('I need to log in. Should I proceed?');
  if (!shouldLogin) return;

  // 2. Get user to identify elements
  // Note: the function returns once the first inspector starts; the rest runs in callbacks.
  mcpNotify.loading('Please help me find the login form...');
  mcpInspector.start('Click on the username/email field', (emailField) => {
    mcpNotify.success('Got the email field!');
    mcpInspector.start('Now click on the password field', (passwordField) => {
      mcpNotify.success('Got the password field!');
      mcpInspector.start('Finally, click the login button', (loginButton) => {
        mcpNotify.done('Perfect! I have all the elements I need.');
        // Now use the XPaths for automation.
        // performLogin is a placeholder for your own automation step, not part of this API.
        performLogin(emailField.xpath, passwordField.xpath, loginButton.xpath);
      });
    });
  });
}
```
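
The nested callbacks above can be flattened with `async`/`await` once `mcpInspector.start` is wrapped in a Promise. In this sketch, the page functions arrive through an `api` object and `performLogin` is a caller-supplied placeholder; `pickElement`, `api`, and `performLogin` are assumptions made so the logic can run without a browser.

```javascript
// Hypothetical promise wrapper around the callback-based inspector.
function pickElement(inspector, instruction) {
  return new Promise((resolve) => inspector.start(instruction, resolve));
}

// Hypothetical async/await version of collaborativeLogin.
async function collaborativeLogin(api, performLogin) {
  const shouldLogin = await api.mcpPrompt('I need to log in. Should I proceed?');
  if (!shouldLogin) return false;

  api.mcpNotify.loading('Please help me find the login form...');
  const emailField = await pickElement(api.mcpInspector, 'Click on the username/email field');
  api.mcpNotify.success('Got the email field!');
  const passwordField = await pickElement(api.mcpInspector, 'Now click on the password field');
  api.mcpNotify.success('Got the password field!');
  const loginButton = await pickElement(api.mcpInspector, 'Finally, click the login button');
  api.mcpNotify.done('Perfect! I have all the elements I need.');

  await performLogin(emailField.xpath, passwordField.xpath, loginButton.xpath);
  return true;
}
```

Unlike the callback version, this function resolves only after all three elements are chosen and `performLogin` has run, so the caller can await the whole workflow.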
## 🔧 Technical Notes
### Initialization
These functions are automatically available after injecting the collaboration system:
```javascript
// Check if available (mcpNotify is used below, so check it too)
if (typeof mcpMessage === 'function' && typeof mcpNotify !== 'undefined') {
  mcpNotify.success('Collaboration system ready!');
}
```
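
A broader probe can report every missing function at once. `missingCollaborationApi` and the expected `typeof` results are assumptions based on how the functions are called in this document (for example, `mcpNotify.success(...)` implies an object with methods).

```javascript
// Hypothetical readiness check: list which collaboration globals are missing.
// Pass `window` in the browser; any object works, which keeps the check testable.
const COLLABORATION_API = {
  mcpMessage: 'function',
  mcpPrompt: 'function',
  mcpNotify: 'object',    // assumed: called as mcpNotify.info(...), etc.
  mcpInspector: 'object', // assumed: called as mcpInspector.start(...)
};

function missingCollaborationApi(scope) {
  return Object.entries(COLLABORATION_API)
    .filter(([name, expectedType]) => typeof scope[name] !== expectedType)
    .map(([name]) => name);
}
```

From Playwright, a simpler version of the check can gate the automation, e.g. `await page.waitForFunction(() => typeof mcpMessage === 'function')`.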
### Error Handling
All functions include built-in error handling and will gracefully fail if DOM manipulation isn't possible.
### Performance
- Messages clean themselves up after display
- Event listeners are properly removed
- No memory leaks from repeated usage

This collaboration API turns MCP browser automation from a purely programmatic tool into an interactive, user-guided system that combines AI efficiency with human insight and precision.