Ryan Malloy 33189816f2 Add agent-to-agent coordination protocol with host/join handshake
Solves the "ships passing in the night" problem where agents publish
messages before other agents have subscribed, causing dropped messages.

New tools:
- mqtt_host_conversation: Initiating agent hosts and waits for joiners
- mqtt_join_conversation: Joining agents connect and signal ready
- mqtt_conversation_status: Check conversation state
- mqtt_list_conversations: List active conversations

The protocol guarantees no messages are dropped by ensuring all expected
agents are subscribed before the host begins publishing.
2026-02-07 04:40:30 -07:00

title: Agent-to-Agent Handshake Protocol
description: Solving the 'ships passing in the night' problem with coordinated agent communication

The Problem: Dropped Messages

When two AI agents try to communicate via MQTT, a common failure pattern emerges:

  1. Agent A connects to broker
  2. Agent A publishes "Hello, Agent B!"
  3. Agent B connects to broker
  4. Agent B subscribes to the topic
  5. Agent B never receives the message (it was published before subscription)

This is the "ships passing in the night" problem: an MQTT broker delivers a non-retained message only to clients that are subscribed at the moment it is published, so anything sent before the other party subscribes is lost.
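
The failure can be reproduced without a real broker. The sketch below uses a hypothetical in-memory `ToyBroker` (an illustration only, not part of mcmqtt or any MQTT library) that, like a broker handling non-retained messages, delivers a publish only to the clients subscribed at that moment.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for an MQTT broker (no retained messages)."""

    def __init__(self):
        # topic -> list of inboxes belonging to current subscribers
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        """Register a new subscriber and return its inbox."""
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, payload):
        """Deliver only to inboxes subscribed right now; return the delivery count."""
        inboxes = self._subscribers[topic]
        for inbox in inboxes:
            inbox.append(payload)
        return len(inboxes)


broker = ToyBroker()

# Agent A publishes before Agent B has subscribed: nobody receives it.
delivered_early = broker.publish("chat", "Hello, Agent B!")

# Agent B subscribes afterwards and starts with an empty inbox.
agent_b_inbox = broker.subscribe("chat")

# A message published after the subscription does arrive.
delivered_late = broker.publish("chat", "Are you there?")

print(delivered_early, delivered_late, agent_b_inbox)
```

The first greeting is never delivered, which is exactly step 5 in the list above.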

The Solution: Host/Join Handshake

mcmqtt's coordination protocol prevents dropped messages with a handshake in which the host signals that publishing is safe only after every expected agent has connected, subscribed, and been acknowledged:

                HOST                              JOINER
                  │                                  │
                  │  1. Spawn/connect to broker      │
                  │  2. Subscribe to the join topic  │
                  │  3. Publish broker_ready         │
                  │─────────────────────────────────►│
                  │                                  │
                  │          4. Connect to broker    │
                  │          5. Subscribe to topics  │
                  │          6. Publish join request │
                  │◄─────────────────────────────────│
                  │                                  │
                  │  7. Acknowledge join             │
                  │─────────────────────────────────►│
                  │                                  │
                  │  8. Publish all_ready signal     │
                  │─────────────────────────────────►│
                  │                                  │
                  ▼  SAFE TO EXCHANGE MESSAGES!      ▼
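
The host side of the diagram can be sketched as a small piece of bookkeeping. This is a minimal illustration, not mcmqtt's implementation: the `HostHandshake` class, its `outbox` list, and the payload strings are all assumptions; only the topic layout follows the Topic Structure section of this document.

```python
class HostHandshake:
    """Hypothetical host-side bookkeeping for the handshake (not mcmqtt's code)."""

    def __init__(self, session_id, expected_agents):
        self.base = f"$coordination/{session_id}"
        self.expected = set(expected_agents)
        self.joined = set()
        self.outbox = []  # (topic, payload) pairs the host would publish

    @property
    def all_ready(self):
        """True once every expected agent has been acknowledged."""
        return self.joined == self.expected

    def start(self):
        # Step 3: announce that the broker is up and the host is listening.
        self.outbox.append((f"{self.base}/broker_ready", "ready"))

    def on_join(self, agent_id):
        """Handle a join request (step 6) and return whether all agents are ready."""
        # Unknown or duplicate joiners are ignored in this sketch.
        if agent_id not in self.expected or agent_id in self.joined:
            return self.all_ready
        self.joined.add(agent_id)
        # Step 7: acknowledge this agent.
        self.outbox.append((f"{self.base}/joined/{agent_id}", "ack"))
        # Step 8: once everyone is in, signal that publishing is safe.
        if self.all_ready:
            self.outbox.append((f"{self.base}/ready", "all_ready"))
        return self.all_ready


host = HostHandshake("collab-task-123", ["worker-1", "worker-2"])
host.start()
after_first = host.on_join("worker-1")   # still waiting for worker-2
after_second = host.on_join("worker-2")  # everyone has joined
```

The ready signal is appended only after the last acknowledgement, so no conversation message can be published while an expected agent is still unsubscribed.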

Key Principle

The initiating agent ALWAYS hosts. This eliminates confusion about who spawns the broker.

MCP Tools

mqtt_host_conversation

Use this when you are starting a conversation with other agents.

{
  "tool": "mqtt_host_conversation",
  "arguments": {
    "session_id": "collab-task-123",
    "host_agent_id": "coordinator-agent",
    "expected_agents": ["worker-1", "worker-2", "analyst"],
    "broker_host": "127.0.0.1",
    "broker_port": 0,
    "timeout_seconds": 30
  }
}

Parameters:

| Parameter       | Type   | Description                               |
|-----------------|--------|-------------------------------------------|
| session_id      | string | Unique identifier for this conversation   |
| host_agent_id   | string | Your agent's unique ID                    |
| expected_agents | array  | List of agent IDs that must join          |
| broker_host     | string | Host to bind broker (default: 127.0.0.1)  |
| broker_port     | int    | Port (0 = auto-assign)                    |
| timeout_seconds | float  | Max wait time for agents to join          |

Response (success):

{
  "success": true,
  "message": "Conversation ready! All 3 agents joined.",
  "session_id": "collab-task-123",
  "state": "ready",
  "broker_host": "127.0.0.1",
  "broker_port": 51234,
  "broker_url": "mqtt://127.0.0.1:51234",
  "joined_agents": ["worker-1", "worker-2", "analyst"],
  "conversation_topic": "conversation/collab-task-123/main",
  "ready_to_publish": true
}

mqtt_join_conversation

Use this when another agent has invited you to a conversation.

{
  "tool": "mqtt_join_conversation",
  "arguments": {
    "session_id": "collab-task-123",
    "agent_id": "worker-1",
    "broker_host": "127.0.0.1",
    "broker_port": 51234,
    "capabilities": ["data-analysis", "visualization"],
    "timeout_seconds": 30
  }
}

Parameters:

| Parameter       | Type   | Description                           |
|-----------------|--------|---------------------------------------|
| session_id      | string | Session ID from host's invitation     |
| agent_id        | string | Your unique agent ID                  |
| broker_host     | string | Broker host (from host's invitation)  |
| broker_port     | int    | Broker port (from host's invitation)  |
| capabilities    | array  | Optional list of your capabilities    |
| timeout_seconds | float  | Max wait for acknowledgement          |

Response (success):

{
  "success": true,
  "message": "Successfully joined conversation collab-task-123!",
  "session_id": "collab-task-123",
  "agent_id": "worker-1",
  "broker_host": "127.0.0.1",
  "broker_port": 51234,
  "other_agents": ["coordinator-agent", "worker-2", "analyst"],
  "conversation_topic": "conversation/collab-task-123/main",
  "ready_to_receive": true
}
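
The `timeout_seconds` argument bounds how long the joiner waits for the host's acknowledgement. One way to express such a bounded wait in Python is `asyncio.wait_for` around an `asyncio.Event`. The sketch below is an assumption about how a client could do this, not a description of mcmqtt's internals; `wait_for_ack` and `demo` are hypothetical names.

```python
import asyncio


async def wait_for_ack(ack_event, timeout_seconds):
    """Return True if the acknowledgement arrives before the deadline, else False."""
    try:
        await asyncio.wait_for(ack_event.wait(), timeout=timeout_seconds)
        return True
    except asyncio.TimeoutError:
        return False


async def demo():
    # Case 1: the host acknowledges after 10 ms, well within the timeout.
    ack = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, ack.set)
    acked = await wait_for_ack(ack, timeout_seconds=1.0)

    # Case 2: nobody ever sets the event, so the wait times out.
    never = asyncio.Event()
    timed_out = not await wait_for_ack(never, timeout_seconds=0.05)
    return acked, timed_out


acked, timed_out = asyncio.run(demo())
```

Returning a boolean rather than raising keeps the timeout an ordinary outcome the caller can branch on.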

Example: Two-Agent Collaboration

Agent A (Initiator/Host)

# Step 1: Host the conversation
result = await mqtt_host_conversation(
    session_id="data-analysis-job",
    host_agent_id="data-processor",
    expected_agents=["visualizer"],
    timeout_seconds=30
)

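if result["ready_to_publish"]:
    # Step 2: Safe to publish - visualizer is definitely subscribed!
    await mqtt_publish(
        topic=result["conversation_topic"],
        payload={"type": "data", "values": [1, 2, 3, 4, 5]}
    )
else:
    # Timed out: the response lists the agents that never joined
    print("Not ready, missing:", result.get("missing_agents", []))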

Agent B (Joiner)

# Step 1: Join using info from Agent A
result = await mqtt_join_conversation(
    session_id="data-analysis-job",
    agent_id="visualizer",
    broker_host="127.0.0.1",
    broker_port=51234
)

if result["ready_to_receive"]:
    # Step 2: Now receive messages - guaranteed not to miss any!
    messages = await mqtt_get_messages(
        topic=result["conversation_topic"]
    )

Topic Structure

The protocol uses reserved topics under $coordination/:

$coordination/{session_id}/
├── broker_ready          # Host publishes broker info (retained)
├── join                  # Agents publish join requests
├── joined/{agent_id}     # Host acknowledges each agent (retained)
├── ready                 # Host signals all agents ready (retained)
└── heartbeat/{agent_id}  # Optional: agent heartbeats
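
Several of these topics are marked as retained. In MQTT, the broker stores the last retained message on a topic and replays it to every new subscriber, so a joiner that subscribes late still sees the signal. The sketch below models that behaviour with a hypothetical in-memory `RetainingBroker`; the payload shown is an assumption.

```python
class RetainingBroker:
    """Hypothetical in-memory broker that models MQTT retained messages."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of subscriber inboxes
        self._retained = {}     # topic -> last retained payload

    def publish(self, topic, payload, retain=False):
        """Deliver to current subscribers; optionally store as the retained message."""
        if retain:
            self._retained[topic] = payload
        for inbox in self._subscribers.get(topic, []):
            inbox.append(payload)

    def subscribe(self, topic):
        """Register a subscriber; a retained message is replayed immediately."""
        inbox = []
        self._subscribers.setdefault(topic, []).append(inbox)
        if topic in self._retained:
            inbox.append(self._retained[topic])
        return inbox


broker = RetainingBroker()
ready_topic = "$coordination/collab-task-123/broker_ready"
chat_topic = "conversation/collab-task-123/main"

# The host publishes both messages before the joiner exists.
broker.publish(ready_topic, {"broker_port": 51234}, retain=True)
broker.publish(chat_topic, "too early", retain=False)

# The late joiner still sees the retained signal, but not the plain message.
late_ready_inbox = broker.subscribe(ready_topic)
late_chat_inbox = broker.subscribe(chat_topic)
```

This is why the handshake signals survive late subscription while ordinary conversation messages need the handshake to protect them.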

After handshake, conversations use:

conversation/{session_id}/
├── main                  # Primary conversation channel
├── {channel_name}        # Additional named channels
└── ...
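
Building these topic strings in one place avoids typos across agents. The helpers below are a sketch: `coordination_topic`, `conversation_topic`, and the segment check are hypothetical names, and only the layout itself comes from the trees above. The check rejects `/`, `+`, and `#` because MQTT treats them as the level separator and wildcards.

```python
COORDINATION_ROOT = "$coordination"
CONVERSATION_ROOT = "conversation"


def _check_segment(segment):
    """Reject empty segments and characters with special meaning in MQTT topics."""
    if not segment or any(ch in segment for ch in "/+#"):
        raise ValueError(f"invalid topic segment: {segment!r}")
    return segment


def coordination_topic(session_id, *parts):
    """Build a topic under $coordination/{session_id}/."""
    segments = [_check_segment(s) for s in (session_id, *parts)]
    return "/".join([COORDINATION_ROOT, *segments])


def conversation_topic(session_id, channel="main"):
    """Build a conversation channel topic; 'main' is the primary channel."""
    return "/".join([CONVERSATION_ROOT, _check_segment(session_id), _check_segment(channel)])


ack_topic = coordination_topic("collab-task-123", "joined", "worker-1")
main_topic = conversation_topic("collab-task-123")
print(ack_topic)
print(main_topic)
```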

Timeout Handling

If expected agents don't join within the timeout, mqtt_host_conversation returns a failure response that lists the missing agents:

{
  "success": false,
  "message": "Timeout waiting for agents. Missing: ['worker-2']",
  "session_id": "collab-task-123",
  "state": "timeout",
  "joined_agents": ["worker-1", "analyst"],
  "missing_agents": ["worker-2"],
  "ready_to_publish": false
}

The host can then decide whether to:

  • Retry with a longer timeout
  • Proceed with available agents
  • Abort the conversation
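
These three options can be combined into one policy: retry with a longer timeout, then either proceed with a partial group or abort. The sketch below assumes the host tool can be called again for the same session, which this document does not confirm. `host_with_retry` and `fake_host` are hypothetical; `fake_host` stands in for the real tool so the example runs without a broker.

```python
import asyncio


async def host_with_retry(host_fn, arguments, max_attempts=3, backoff=2.0, min_agents=None):
    """Call host_fn, multiplying the timeout by backoff after each failed attempt.

    host_fn is any async callable that takes the tool arguments as keyword
    arguments and returns a response dict shaped like the ones above.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    args = dict(arguments)
    response = None
    for _ in range(max_attempts):
        response = await host_fn(**args)
        if response.get("ready_to_publish"):
            return "ready", response
        args["timeout_seconds"] = args.get("timeout_seconds", 30) * backoff
    # Retries exhausted: proceed if enough agents joined, otherwise abort.
    joined = response.get("joined_agents", [])
    if min_agents is not None and len(joined) >= min_agents:
        return "proceed_partial", response
    return "abort", response


async def fake_host(**kwargs):
    """Stand-in for the real tool: worker-2 only joins when given 60s or more."""
    expected = kwargs["expected_agents"]
    missing = [a for a in expected if a == "worker-2" and kwargs["timeout_seconds"] < 60]
    joined = [a for a in expected if a not in missing]
    return {
        "success": not missing,
        "joined_agents": joined,
        "missing_agents": missing,
        "ready_to_publish": not missing,
    }


arguments = {
    "session_id": "collab-task-123",
    "expected_agents": ["worker-1", "worker-2", "analyst"],
    "timeout_seconds": 30,
}
decision, response = asyncio.run(host_with_retry(fake_host, arguments))
```

Here the first attempt times out at 30 seconds and the second succeeds at 60. Passing `min_agents` lets the host continue with a partial group when the retries run out.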

Best Practices

  1. Always use coordination tools for multi-agent work - Don't use raw mqtt_connect + mqtt_publish when coordinating with other agents

  2. Choose meaningful session IDs - Include context like task-{id}-{timestamp} for debugging

  3. Set appropriate timeouts - Network latency and agent startup time vary; 30 seconds is a safe default

  4. Check the response - Always verify ready_to_publish (host) or ready_to_receive (joiner) before proceeding

  5. Handle failures gracefully - A timeout is not necessarily fatal: the response lists the missing agents, so the host can retry or continue with the agents that did join
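
Practices 2 and 4 are easy to wrap in small helpers. The sketch below is one possible shape, not part of mcmqtt: `make_session_id` and `is_ready` are hypothetical names, and the timestamp format is an assumption.

```python
from datetime import datetime, timezone


def make_session_id(task_id, now=None):
    """Build an ID shaped like task-{id}-{timestamp}, using UTC at second resolution."""
    # Pass a timezone-aware datetime; a naive one is interpreted as local time.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"task-{task_id}-{now.strftime('%Y%m%dT%H%M%SZ')}"


def is_ready(response):
    """True only if the call succeeded and the role-specific ready flag is set."""
    if not response.get("success"):
        return False
    return bool(response.get("ready_to_publish") or response.get("ready_to_receive"))


fixed_time = datetime(2026, 2, 7, 11, 40, 30, tzinfo=timezone.utc)
session_id = make_session_id(42, now=fixed_time)

host_timeout = {"success": False, "state": "timeout", "ready_to_publish": False}
joiner_ok = {"success": True, "ready_to_receive": True}
print(session_id, is_ready(host_timeout), is_ready(joiner_ok))
```

A timestamp in the ID makes it easy to match a session against broker logs when debugging.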