Solves the "ships passing in the night" problem, where agents publish messages before other agents have subscribed, causing dropped messages.
New tools:
- mqtt_host_conversation: Initiating agent hosts and waits for joiners
- mqtt_join_conversation: Joining agents connect and signal ready
- mqtt_conversation_status: Check conversation state
- mqtt_list_conversations: List active conversations
The protocol guarantees no messages are dropped by ensuring all expected agents are subscribed before the host begins publishing.
| title | description |
|---|---|
| Agent-to-Agent Handshake Protocol | Solving the 'ships passing in the night' problem with coordinated agent communication |
The Problem: Dropped Messages
When two AI agents try to communicate via MQTT, a common failure pattern emerges:
- Agent A connects to broker
- Agent A publishes "Hello, Agent B!"
- Agent B connects to broker
- Agent B subscribes to the topic
- Agent B never receives the message (it was published before subscription)
This is the "ships passing in the night" problem - agents miss messages because they publish before the other party has subscribed.
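The failure sequence above can be reproduced without a real broker. The following is a minimal sketch, assuming a toy in-memory broker (the hypothetical `ToyBroker` class) with no retained messages and no persistent sessions; it is not how mcmqtt or any MQTT broker is implemented.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: no retained messages, no persistent sessions."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of subscriber inboxes

    def subscribe(self, topic):
        inbox = []
        self.subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, payload):
        # Delivered only to the inboxes that exist at publish time.
        for inbox in self.subscribers[topic]:
            inbox.append(payload)


broker = ToyBroker()
broker.publish("chat", "Hello, Agent B!")  # Agent A publishes first
inbox_b = broker.subscribe("chat")         # Agent B subscribes too late
broker.publish("chat", "Are you there?")   # Sent after the subscription

print(inbox_b)  # ['Are you there?']
```

The first message is gone for good: nothing was subscribed when it was published, so the toy broker had nowhere to deliver it.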
The Solution: Host/Join Handshake
mcmqtt's coordination protocol prevents dropped messages with a handshake in which the host publishes nothing until every expected agent has subscribed and signalled that it is ready:
HOST                                          JOINER
│                                                  │
│ 1. Spawn/connect to broker                       │
│ 2. Subscribe to $coordination/{session_id}/join  │
│ 3. Publish broker_ready                          │
│─────────────────────────────────────────────────►│
│                                                  │
│                             4. Connect to broker │
│                           5. Subscribe to topics │
│                          6. Publish join request │
│◄─────────────────────────────────────────────────│
│                                                  │
│ 7. Acknowledge join                              │
│─────────────────────────────────────────────────►│
│                                                  │
│ 8. Publish all_ready signal                      │
│─────────────────────────────────────────────────►│
│                                                  │
▼            SAFE TO EXCHANGE MESSAGES!            ▼
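The host's bookkeeping for steps 6 to 8 can be sketched as a small state tracker. This is an illustration under stated assumptions, not mcmqtt's implementation: the `HostHandshake` class is hypothetical, the returned strings only mirror the `joined/{agent_id}` and `ready` topic names from the Topic Structure section, and ignoring uninvited agents is an assumed policy.

```python
class HostHandshake:
    """Tracks which expected agents have joined (steps 6 to 8 of the diagram)."""

    def __init__(self, expected_agents):
        self.expected = set(expected_agents)
        self.joined = set()

    def on_join_request(self, agent_id):
        """Handle one join request; return the signals the host would publish."""
        if agent_id not in self.expected:
            return []  # assumption: agents that were not invited are ignored
        self.joined.add(agent_id)
        signals = [f"joined/{agent_id}"]  # step 7: acknowledge the join
        if self.joined == self.expected:
            signals.append("ready")       # step 8: everyone is subscribed
        return signals

    @property
    def ready_to_publish(self):
        return self.joined == self.expected


host = HostHandshake(["worker-1", "worker-2"])
first = host.on_join_request("worker-1")
second = host.on_join_request("worker-2")

print(first)   # ['joined/worker-1']
print(second)  # ['joined/worker-2', 'ready']
```

The ready signal is emitted only on the request that completes the expected set, which is what lets the host treat it as the point where publishing becomes safe.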
Key Principle
The initiating agent ALWAYS hosts. This eliminates confusion about who spawns the broker.
MCP Tools
mqtt_host_conversation
Use this when you are starting a conversation with other agents.
{
  "tool": "mqtt_host_conversation",
  "arguments": {
    "session_id": "collab-task-123",
    "host_agent_id": "coordinator-agent",
    "expected_agents": ["worker-1", "worker-2", "analyst"],
    "broker_host": "127.0.0.1",
    "broker_port": 0,
    "timeout_seconds": 30
  }
}
Parameters:
| Parameter | Type | Description |
|---|---|---|
| session_id | string | Unique identifier for this conversation |
| host_agent_id | string | Your agent's unique ID |
| expected_agents | array | List of agent IDs that must join |
| broker_host | string | Host to bind broker (default: 127.0.0.1) |
| broker_port | int | Port (0 = auto-assign) |
| timeout_seconds | float | Max wait time for agents to join |
Response (success):
{
  "success": true,
  "message": "Conversation ready! All 3 agents joined.",
  "session_id": "collab-task-123",
  "state": "ready",
  "broker_host": "127.0.0.1",
  "broker_port": 51234,
  "broker_url": "mqtt://127.0.0.1:51234",
  "joined_agents": ["worker-1", "worker-2", "analyst"],
  "conversation_topic": "conversation/collab-task-123/main",
  "ready_to_publish": true
}
mqtt_join_conversation
Use this when another agent invited you to a conversation.
{
  "tool": "mqtt_join_conversation",
  "arguments": {
    "session_id": "collab-task-123",
    "agent_id": "worker-1",
    "broker_host": "127.0.0.1",
    "broker_port": 51234,
    "capabilities": ["data-analysis", "visualization"],
    "timeout_seconds": 30
  }
}
Parameters:
| Parameter | Type | Description |
|---|---|---|
| session_id | string | Session ID from host's invitation |
| agent_id | string | Your unique agent ID |
| broker_host | string | Broker host (from host's invitation) |
| broker_port | int | Broker port (from host's invitation) |
| capabilities | array | Optional list of your capabilities |
| timeout_seconds | float | Max wait for acknowledgement |
Response (success):
{
  "success": true,
  "message": "Successfully joined conversation collab-task-123!",
  "session_id": "collab-task-123",
  "agent_id": "worker-1",
  "broker_host": "127.0.0.1",
  "broker_port": 51234,
  "other_agents": ["coordinator-agent", "worker-2", "analyst"],
  "conversation_topic": "conversation/collab-task-123/main",
  "ready_to_receive": true
}
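Both tools report readiness through a boolean flag, so a caller can gate further work on a single check. A minimal sketch, assuming the response is available as a Python dict; the `is_ready` helper is hypothetical, and the second dict is an illustrative failed response rather than captured output.

```python
def is_ready(response, role):
    """Return True only if the handshake response says it is safe to proceed.

    role is "host" or "joiner"; the flag names come from the documented responses.
    """
    flag = "ready_to_publish" if role == "host" else "ready_to_receive"
    return bool(response.get("success")) and bool(response.get(flag))


join_response = {
    "success": True,
    "session_id": "collab-task-123",
    "agent_id": "worker-1",
    "ready_to_receive": True,
}
failed_response = {"success": False, "state": "timeout", "ready_to_publish": False}

print(is_ready(join_response, "joiner"))  # True
print(is_ready(failed_response, "host"))  # False
```

Using `.get` means a response that lacks the flag entirely is treated as not ready instead of raising a `KeyError`.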
Example: Two-Agent Collaboration
Agent A (Initiator/Host)
# Step 1: Host the conversation
result = await mqtt_host_conversation(
    session_id="data-analysis-job",
    host_agent_id="data-processor",
    expected_agents=["visualizer"],
    timeout_seconds=30
)
if result["ready_to_publish"]:
    # Step 2: Safe to publish - visualizer is definitely subscribed!
    await mqtt_publish(
        topic=result["conversation_topic"],
        payload={"type": "data", "values": [1, 2, 3, 4, 5]}
    )
Agent B (Joiner)
# Step 1: Join using the session ID, broker host, and port shared by Agent A
result = await mqtt_join_conversation(
    session_id="data-analysis-job",
    agent_id="visualizer",
    broker_host="127.0.0.1",
    broker_port=51234  # the port Agent A's broker actually bound to
)
if result["ready_to_receive"]:
    # Step 2: Now receive messages - guaranteed not to miss any!
    messages = await mqtt_get_messages(
        topic=result["conversation_topic"]
    )
Topic Structure
The protocol uses reserved topics under $coordination/:
$coordination/{session_id}/
├── broker_ready # Host publishes broker info (retained)
├── join # Agents publish join requests
├── joined/{agent_id} # Host acknowledges each agent (retained)
├── ready # Host signals all agents ready (retained)
└── heartbeat/{agent_id} # Optional: agent heartbeats
After handshake, conversations use:
conversation/{session_id}/
├── main # Primary conversation channel
├── {channel_name} # Additional named channels
└── ...
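The two topic trees above follow a fixed pattern, so the names can be built with small helpers. This is a sketch only: the function and constant names are hypothetical and are not part of mcmqtt's API.

```python
COORDINATION_PREFIX = "$coordination"
CONVERSATION_PREFIX = "conversation"


def coordination_topic(session_id, *parts):
    """Build a handshake topic such as $coordination/{session_id}/joined/{agent_id}."""
    return "/".join([COORDINATION_PREFIX, session_id, *parts])


def conversation_topic(session_id, channel="main"):
    """Build a post-handshake topic such as conversation/{session_id}/main."""
    return "/".join([CONVERSATION_PREFIX, session_id, channel])


print(coordination_topic("collab-task-123", "joined", "worker-1"))
# $coordination/collab-task-123/joined/worker-1
print(conversation_topic("collab-task-123"))
# conversation/collab-task-123/main
```

Keeping topic construction in one place avoids typos that would put the host and a joiner on different topics.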
Timeout Handling
If expected agents don't join within the timeout:
{
  "success": false,
  "message": "Timeout waiting for agents. Missing: ['worker-2']",
  "session_id": "collab-task-123",
  "state": "timeout",
  "joined_agents": ["worker-1", "analyst"],
  "missing_agents": ["worker-2"],
  "ready_to_publish": false
}
The host can then decide whether to:
- Retry with a longer timeout
- Proceed with available agents
- Abort the conversation
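The first option, retrying with a longer timeout, might look like the sketch below. It assumes the host tool can be called again for the same session after a timeout, which the text above does not confirm; `host_with_retry` is a hypothetical helper and `fake_host` is a stand-in for the real tool call so the example runs on its own.

```python
import asyncio


async def host_with_retry(host_fn, arguments, max_attempts=3, backoff=2.0):
    """Call host_fn until ready_to_publish is true, growing the timeout each attempt."""
    timeout = arguments["timeout_seconds"]
    result = None
    for _ in range(max_attempts):
        result = await host_fn(**{**arguments, "timeout_seconds": timeout})
        if result.get("ready_to_publish"):
            break
        timeout *= backoff  # retry with a longer timeout
    return result


# Stand-in for the real tool call: times out unless given at least 60 seconds.
async def fake_host(session_id, host_agent_id, expected_agents, timeout_seconds):
    if timeout_seconds >= 60:
        return {"success": True, "state": "ready",
                "joined_agents": expected_agents, "ready_to_publish": True}
    return {"success": False, "state": "timeout", "joined_agents": [],
            "missing_agents": expected_agents, "ready_to_publish": False}


result = asyncio.run(host_with_retry(fake_host, {
    "session_id": "collab-task-123",
    "host_agent_id": "coordinator-agent",
    "expected_agents": ["worker-2"],
    "timeout_seconds": 30,
}))
print(result["state"])  # ready
```

If every attempt times out, the helper returns the last failed response, so the caller can still inspect `missing_agents` and choose between proceeding and aborting.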
Best Practices
- Always use coordination tools for multi-agent work: Don't use raw mqtt_connect + mqtt_publish when coordinating with other agents
- Choose meaningful session IDs: Include context like task-{id}-{timestamp} for debugging
- Set appropriate timeouts: Network latency and agent startup time vary; 30 seconds is a safe default
- Check the response: Always verify ready_to_publish (host) or ready_to_receive (joiner) before proceeding
- Handle failures gracefully: Timeout doesn't mean failure; retry logic is your friend
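A session ID in the task-{id}-{timestamp} shape suggested above could be generated as follows. This is a sketch: the `make_session_id` helper is hypothetical, and replacing `/`, `+`, `#`, and whitespace is an added precaution (those characters are level separators or wildcards in MQTT topics, and the session ID ends up inside topic names), not something the protocol is known to require.

```python
import re
import time


def make_session_id(task_id, now=None):
    """Build a session ID in the task-{id}-{timestamp} shape."""
    timestamp = int(time.time() if now is None else now)
    # Replace characters that carry meaning in MQTT topics (/, +, #) and whitespace.
    safe_task = re.sub(r"[/+#\s]+", "-", str(task_id))
    return f"task-{safe_task}-{timestamp}"


print(make_session_id("etl 42", now=1700000000))
# task-etl-42-1700000000
```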