playwright-mcp/MODEL-COLLABORATION-API.md
Ryan Malloy 6120506e91
2025-11-14 21:36:08 -07:00


MCP Model-User Collaboration API

This document describes the JavaScript functions available to models for direct user communication and collaborative element selection within the Playwright MCP browser automation system.

🎯 Core Philosophy

Enable seamless collaboration between AI models and human users by providing simple JavaScript APIs for real-time communication, confirmations, and interactive element selection.

📱 Messaging System

Basic Messaging

// Send messages to users with auto-dismiss
mcpMessage('Hello user!', 'info', 5000)        // Info message (green)
mcpMessage('Success!', 'success', 3000)        // Success message (bright green) 
mcpMessage('Warning!', 'warning', 4000)        // Warning message (yellow)
mcpMessage('Error occurred', 'error', 6000)    // Error message (red)
mcpMessage('Persistent', 'info', 0)            // Persistent until dismissed

Helper Functions

mcpNotify.info('Information for the user')     // Standard info message
mcpNotify.success('Task completed!')           // Success confirmation
mcpNotify.warning('Please be careful')         // Cautionary message
mcpNotify.error('Something went wrong')        // Error notification
mcpNotify.loading('Processing...')             // Persistent loading indicator
mcpNotify.done('All finished!')               // Quick success (3s auto-dismiss)
mcpNotify.failed('Task failed')               // Quick error (5s auto-dismiss)
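
How the helpers relate to mcpMessage is not spelled out here. As a rough mental model only, the sketch below builds a notifier with the same method names on top of any function shaped like mcpMessage. The factory name `makeNotifier`, the 5-second default for info/success/warning/error, and the use of the 'info' type for `loading` are assumptions; only the persistent `loading` and the 3 s / 5 s durations for `done` and `failed` come from the comments above.

```javascript
// Hypothetical factory: builds an mcpNotify-like object on top of any
// function with the (text, type, durationMs) shape used by mcpMessage.
function makeNotifier(send, defaultMs = 5000) { // defaultMs is an assumed value
    return {
        info:    (text) => send(text, 'info', defaultMs),
        success: (text) => send(text, 'success', defaultMs),
        warning: (text) => send(text, 'warning', defaultMs),
        error:   (text) => send(text, 'error', defaultMs),
        loading: (text) => send(text, 'info', 0),       // 0 = persistent (type assumed)
        done:    (text) => send(text, 'success', 3000), // quick success
        failed:  (text) => send(text, 'error', 5000),   // quick error
    };
}

// Usage with a recording stub instead of the real mcpMessage:
const sent = [];
const notify = makeNotifier((text, type, ms) => sent.push({ text, type, ms }));
notify.loading('Processing...');
notify.done('All finished!');
```

Passing the send function in, rather than calling a global, keeps the sketch testable outside a browser page.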

🤝 User Confirmation System

Interactive Prompts

// Ask user for confirmation
const confirmed = await mcpPrompt('Should I proceed with this action?');
if (confirmed) {
    mcpNotify.success('User confirmed - proceeding!');
} else {
    mcpNotify.info('User cancelled the action');
}

// Custom confirmation with options
const result = await mcpPrompt('Do you want to login first?', {
    title: '🔐 LOGIN REQUIRED',
    confirmText: 'YES, LOGIN',
    cancelText: 'SKIP FOR NOW'
});
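
Because the examples above await mcpPrompt, an automation step can stall if the user never answers. The sketch below is one way to bound that wait, assuming mcpPrompt returns a Promise as the `await` usage suggests. `promptWithTimeout` is a hypothetical helper, not part of the API, and it does not close the dialog itself; it only stops the caller from waiting.

```javascript
// Hypothetical helper: resolves to the prompt's answer, or to `fallback`
// if no answer arrives within `ms` milliseconds.
function promptWithTimeout(promptFn, message, ms, fallback = false) {
    let timer;
    const timeout = new Promise((resolve) => {
        timer = setTimeout(() => resolve(fallback), ms);
    });
    // Promise.resolve().then(...) also captures a synchronous throw from promptFn.
    const answer = Promise.resolve().then(() => promptFn(message));
    return Promise.race([answer, timeout]).finally(() => clearTimeout(timer));
}

// In the page this might be used as:
// const ok = await promptWithTimeout(mcpPrompt, 'Should I proceed?', 30000);
```

Defaulting the fallback to `false` treats silence as a refusal, which is the safer choice for anything destructive.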

🔍 Collaborative Element Selection

Interactive Element Inspector

// Basic element selection
mcpInspector.start('Please click on the login button');

// Element selection with callback
mcpInspector.start(
    'Click on the element you want me to interact with',
    (elementDetails) => {
        // Model receives detailed element information
        console.log('User selected:', elementDetails);
        
        // Use the XPath for precise automation
        const xpath = elementDetails.xpath;
        mcpNotify.success(`Got it! I'll click on: ${elementDetails.textContent}`);
        
        // Now use xpath with Playwright tools...
    }
);

// Stop inspection programmatically
mcpInspector.stop();
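
The callback form can be wrapped in a Promise so a selection can be awaited like a prompt. This is a minimal sketch: `pickElement` is a hypothetical name, and the inspector is passed in explicitly rather than read from a global. Whether the callback fires when inspection is cancelled is not documented here, so the Promise may never settle in that case.

```javascript
// Hypothetical wrapper: turns the callback form of inspector.start
// into a Promise that resolves with the element details.
function pickElement(inspector, message) {
    return new Promise((resolve) => {
        inspector.start(message, (elementDetails) => resolve(elementDetails));
    });
}

// In the page this might be used as:
// const details = await pickElement(mcpInspector, 'Please click on the login button');
```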

Element Details Returned

When the user clicks an element, the callback receives an object like this:

{
    tagName: 'a',                    // HTML tag
    id: 'login-button',              // Element ID (if present)
    className: 'btn btn-primary',    // CSS classes
    textContent: 'Login',            // Visible text (truncated to 100 chars)
    xpath: '//*[@id="login-button"]', // Generated XPath
    attributes: {                    // All HTML attributes
        href: '/login',
        class: 'btn btn-primary',
        'data-action': 'login'
    },
    boundingRect: {                  // Element position/size
        x: 100, y: 200, 
        width: 80, height: 32
    },
    visible: true                    // Element visibility status
}
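
An object of this shape can be turned into a selector string for later automation. The sketch below prefers the id when one is present and falls back to the generated XPath; `toSelector` is a hypothetical helper, and the `css=` and `xpath=` prefixes are Playwright's selector-engine prefixes. The quoting is adequate for ordinary ids, not for ids containing control characters.

```javascript
// Hypothetical helper: picks a Playwright selector string from the
// element details, preferring the id over the generated XPath.
function toSelector(details) {
    if (details.id) {
        // JSON.stringify wraps the value in double quotes and escapes embedded quotes.
        return `css=[id=${JSON.stringify(details.id)}]`;
    }
    if (details.xpath) {
        return `xpath=${details.xpath}`;
    }
    throw new Error('Element details contain neither id nor xpath');
}

const byId = toSelector({ id: 'login-button', xpath: '//*[@id="login-button"]' });
const byXPath = toSelector({ id: '', xpath: '//div[2]/a' });
```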

🚀 Collaboration Patterns

1. Ambiguous Element Selection

// When multiple similar elements exist
const confirmed = await mcpPrompt('I see multiple login buttons. Should I click the main one in the header?');
if (!confirmed) {
    mcpInspector.start('Please click on the specific login button you want me to use');
}

2. Permission Requests

// Ask before sensitive actions
const canProceed = await mcpPrompt('This will delete all items. Are you sure?', {
    title: '⚠️ DESTRUCTIVE ACTION',
    confirmText: 'YES, DELETE ALL',
    cancelText: 'CANCEL'
});
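
The result of a permission prompt only helps if the sensitive step is actually gated on it. The sketch below shows one way to do that. `confirmThen` is a hypothetical helper, with `promptFn` and `notify` standing in for mcpPrompt and mcpNotify, and the notification texts are illustrative.

```javascript
// Hypothetical guard: runs `action` only when the user confirms.
async function confirmThen(promptFn, notify, message, options, action) {
    const confirmed = await promptFn(message, options);
    if (!confirmed) {
        notify.info('Action cancelled, nothing was changed');
        return { ran: false };
    }
    const value = await action();
    notify.done('Action completed');
    return { ran: true, value };
}

// In the page this might be used as:
// await confirmThen(mcpPrompt, mcpNotify, 'This will delete all items. Are you sure?',
//     { title: '⚠️ DESTRUCTIVE ACTION', confirmText: 'YES, DELETE ALL', cancelText: 'CANCEL' },
//     () => deleteAllItems()); // deleteAllItems is a placeholder
```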

3. Form Field Identification

// Help user identify form fields
mcpInspector.start(
    'Please click on the email input field',
    (element) => {
        // Compare case-insensitively in case tagName is reported in uppercase
        if (element.tagName.toLowerCase() !== 'input') {
            mcpNotify.warning('That doesn\'t look like an input field. Try again?');
            return;
        }
        mcpNotify.success('Perfect! I\'ll enter the email there.');
    }
);
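
In the snippet above, a wrong click ends the interaction after the warning. If the intent is to let the user try again, the selection can be repeated until it passes a check. The sketch below assumes the inspector can be started again after a selection; `pickUntil` and `isInput` are hypothetical names.

```javascript
// Hypothetical helper: keeps asking the user to pick an element until
// `isValid` accepts it or `maxAttempts` selections have been rejected.
async function pickUntil(inspector, message, isValid, maxAttempts = 3) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const details = await new Promise((resolve) => {
            inspector.start(message, resolve);
        });
        if (isValid(details)) return details;
    }
    return null; // the caller decides what to do after repeated misses
}

// Accepts an input element regardless of how the tag name is cased.
const isInput = (el) => String(el.tagName).toLowerCase() === 'input';

// In the page this might be used as:
// const field = await pickUntil(mcpInspector, 'Please click on the email input field', isInput);
```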

4. Dynamic Content Handling

// When content changes dynamically
mcpNotify.loading('Waiting for page to load...');
// ... wait for content ...
mcpNotify.done('Page loaded!');

// If the content has not appeared yet, ask whether to keep waiting
const shouldWait = await mcpPrompt('The content is still loading. Should I wait longer?');
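
The "wait for content" step is left open above. One simple way to fill it is to poll a readiness check with a deadline. This is a sketch only: `waitUntil`, its default timings, and the `#results` selector in the usage comment are all hypothetical.

```javascript
// Hypothetical polling helper: resolves true once `isReady()` returns a
// truthy value, or false if `timeoutMs` elapses first.
async function waitUntil(isReady, timeoutMs = 5000, intervalMs = 100) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (await isReady()) return true;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    return false;
}

// In the page this might be used as:
// const ready = await waitUntil(() => document.querySelector('#results'));
// if (!ready) { /* ask the user whether to keep waiting */ }
```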

🎨 Visual Design

All messages and prompts use the cyberpunk "hacker matrix" theme:

  • Black background with neon green text (#00ff00)
  • Terminal-style Courier New font
  • Glowing effects and smooth animations
  • High contrast for excellent readability
  • ESC key support for cancellation

🛠️ Implementation Guidelines for Models

Best Practices

  1. Clear Communication: Use descriptive messages that explain what you're doing
  2. Ask for Permission: Confirm before destructive or sensitive actions
  3. Collaborative Selection: When an element's location is ambiguous, ask the user to click it
  4. Progress Updates: Use loading/done messages for long operations
  5. Error Handling: Provide clear error messages with next steps

Example Workflows

// Complete login workflow with collaboration
async function collaborativeLogin() {
    // 1. Ask for permission
    const shouldLogin = await mcpPrompt('I need to log in. Should I proceed?');
    if (!shouldLogin) return;
    
    // 2. Get user to identify elements
    mcpNotify.loading('Please help me find the login form...');
    
    mcpInspector.start('Click on the username/email field', (emailField) => {
        mcpNotify.success('Got the email field!');
        
        mcpInspector.start('Now click on the password field', (passwordField) => {
            mcpNotify.success('Got the password field!');
            
            mcpInspector.start('Finally, click the login button', (loginButton) => {
                mcpNotify.done('Perfect! I have all the elements I need.');
                
                // Now use the XPaths for automation
                // (performLogin is a placeholder for your own automation step)
                performLogin(emailField.xpath, passwordField.xpath, loginButton.xpath);
            });
        });
    });
}
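
The nested callbacks in this workflow can be flattened by collecting the selections one at a time. The sketch below is one possible shape, assuming the inspector can be restarted after each selection as the workflow above does; `collectXPaths` is a hypothetical helper.

```javascript
// Hypothetical helper: asks for each element in turn and returns a map
// from step name to the XPath of the element the user selected.
async function collectXPaths(inspector, steps) {
    const xpaths = {};
    for (const [name, message] of steps) {
        const details = await new Promise((resolve) => {
            inspector.start(message, resolve);
        });
        xpaths[name] = details.xpath;
    }
    return xpaths;
}

// In the page this might be used as:
// const { email, password, submit } = await collectXPaths(mcpInspector, [
//     ['email', 'Click on the username/email field'],
//     ['password', 'Now click on the password field'],
//     ['submit', 'Finally, click the login button'],
// ]);
```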

🔧 Technical Notes

Initialization

These functions are automatically available after injecting the collaboration system:

// Check that the collaboration globals are available before using them
if (typeof mcpMessage === 'function' && typeof mcpNotify !== 'undefined') {
    mcpNotify.success('Collaboration system ready!');
}
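
A fuller check can report which pieces are missing instead of testing a single function. The sketch below assumes mcpMessage and mcpPrompt are functions and that mcpNotify and mcpInspector are plain objects reachable as properties of the global scope. Neither assumption is confirmed by this document, and `missingCollaborationApis` is a hypothetical name.

```javascript
// Hypothetical readiness check: lists the collaboration globals that are
// missing from `scope` (the page's window in a browser).
function missingCollaborationApis(scope = globalThis) {
    const expected = {
        mcpMessage: 'function',
        mcpPrompt: 'function',
        mcpNotify: 'object',    // assumed to be a plain object
        mcpInspector: 'object', // assumed to be a plain object
    };
    return Object.entries(expected)
        .filter(([name, type]) => typeof scope[name] !== type || scope[name] === null)
        .map(([name]) => name);
}

const complete = missingCollaborationApis({
    mcpMessage() {}, mcpPrompt() {}, mcpNotify: {}, mcpInspector: {},
});
const partial = missingCollaborationApis({ mcpMessage() {} });
```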

Error Handling

All functions include built-in error handling and fail gracefully if DOM manipulation isn't possible.

Performance

  • Messages are cleaned up automatically after display
  • Event listeners are properly removed
  • No memory leaks from repeated usage

This collaboration API turns MCP browser automation from a purely programmatic tool into an interactive, user-guided system that pairs AI efficiency with human judgment.