mcbluetooth-esp32/docs/automated-e2e-testing.md
Ryan Malloy 88d006e9c4 Add automated E2E testing documentation and test prompts
- docs/automated-e2e-testing.md: Guide for running headless Claude CLI
  tests with both mcbluetooth and mcbluetooth-esp32 MCP servers
- tests/prompts/test-prompt-v4.md: 71-test suite covering Classic BT,
  BLE GATT, HCI capture, device management
- tests/prompts/test-prompt-v5.md: 76-test suite adding Battery Service
  (0x180F) and bt_ble_battery verification

Test results from v4: 71/71 PASS with 143 HCI packets captured
2026-02-03 11:18:37 -07:00

# Automated E2E Testing with Claude CLI
This document describes how to run fully automated end-to-end Bluetooth tests using the Claude CLI in headless mode. The tests exercise the complete Bluetooth stack across two devices: a Linux host running `mcbluetooth` (BlueZ) and an ESP32 running the `mcbluetooth-esp32` firmware.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                 Claude CLI (headless mode)                  │
│                Orchestrates both MCP servers                │
└─────────────────────────────┬───────────────────────────────┘
            ┌─────────────────┴─────────────────┐
            │                                   │
    ┌───────┴───────┐                 ┌─────────┴─────────┐
    │  mcbluetooth  │                 │ mcbluetooth-esp32 │
    │  MCP Server   │                 │    MCP Server     │
    │ (bt_* tools)  │                 │  (esp32_* tools)  │
    └───────┬───────┘                 └─────────┬─────────┘
            │                                   │
       D-Bus/BlueZ                         Serial/UART
            │                                   │
    ┌───────┴───────┐                 ┌─────────┴─────────┐
    │  Linux Host   │◄── Bluetooth ──►│       ESP32       │
    │    (hci1)     │   (over air)    │   (peripheral)    │
    └───────────────┘                 └───────────────────┘
```
## Prerequisites
### Hardware
- ESP32 dev board connected via USB (typically `/dev/ttyUSB0` or `/dev/ttyUSB4`)
- Linux host with Bluetooth adapter (typically `hci0` or `hci1`)
### Software
- ESP32 flashed with mcbluetooth-esp32 firmware
- Both MCP servers installed and accessible via `uvx`
- Claude CLI installed
### Permissions
For HCI packet capture tests, grant btmon the required capability:
```bash
sudo setcap cap_net_raw+ep /usr/bin/btmon
```
## Test Environment Setup
### 1. Create a test directory
```bash
mkdir -p /tmp/bt-e2e-test
cd /tmp/bt-e2e-test
```
### 2. Create MCP configuration
Create `.mcp.json` with both MCP servers:
```json
{
  "mcpServers": {
    "esp32": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcbluetooth-esp32"],
      "env": {
        "ESP32_SERIAL_PORT": "/dev/ttyUSB4"
      }
    },
    "bluez": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcbluetooth"]
    }
  }
}
```
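When the serial port differs between machines, the same file can be generated from a script. The following is a minimal sketch: the helper names `build_mcp_config` and `write_mcp_config` are hypothetical, and the structure simply mirrors the JSON above.

```python
import json
from pathlib import Path


def build_mcp_config(serial_port: str = "/dev/ttyUSB4") -> dict:
    """Return the two-server MCP configuration as a plain dict."""
    return {
        "mcpServers": {
            "esp32": {
                "type": "stdio",
                "command": "uvx",
                "args": ["mcbluetooth-esp32"],
                "env": {"ESP32_SERIAL_PORT": serial_port},
            },
            "bluez": {
                "type": "stdio",
                "command": "uvx",
                "args": ["mcbluetooth"],
            },
        }
    }


def write_mcp_config(directory: str, serial_port: str = "/dev/ttyUSB4") -> Path:
    """Write .mcp.json into `directory` and return the path that was written."""
    path = Path(directory) / ".mcp.json"
    path.write_text(json.dumps(build_mcp_config(serial_port), indent=2) + "\n")
    return path
```

For example, `write_mcp_config("/tmp/bt-e2e-test", "/dev/ttyUSB0")` would produce the file for a board on `/dev/ttyUSB0`.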
### 3. Initialize git (required for Claude CLI)
```bash
git init
```
## Running Tests
### Basic Command Structure
```bash
claude -p "$(cat test-prompt.md)" \
  --mcp-config .mcp.json \
  --allowedTools "mcp__esp32__*,mcp__bluez__*" \
  --output-format json \
  2>/dev/null | tee results.json | jq -r '.result'
```
**Key flags:**
- `-p`: Print/headless mode (non-interactive)
- `--mcp-config`: Path to MCP server configuration
- `--allowedTools`: Glob patterns for permitted tools (required in headless mode)
- `--output-format json`: Machine-parseable output
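For scripted runs (for example from a CI job), the same invocation can be assembled in Python. This is a sketch under the assumption that `claude` is on `PATH`; the function names `build_claude_command` and `run_suite` are hypothetical, and only the flags listed above are used.

```python
import subprocess


def build_claude_command(prompt_file: str, mcp_config: str = ".mcp.json") -> list[str]:
    """Assemble the argv for a headless run from the flags described above."""
    with open(prompt_file, encoding="utf-8") as fh:
        # Strip trailing newlines, as the shell's $(cat ...) substitution does.
        prompt = fh.read().rstrip("\n")
    return [
        "claude",
        "-p", prompt,
        "--mcp-config", mcp_config,
        "--allowedTools", "mcp__esp32__*,mcp__bluez__*",
        "--output-format", "json",
    ]


def run_suite(prompt_file: str, results_file: str) -> int:
    """Run the suite, save stdout to results_file, and return the exit code."""
    proc = subprocess.run(
        build_claude_command(prompt_file),
        capture_output=True,  # stderr is captured and discarded, like 2>/dev/null
        text=True,
    )
    with open(results_file, "w", encoding="utf-8") as fh:
        fh.write(proc.stdout)
    return proc.returncode
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or backticks.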
### Full Test Suite (76 tests)
The comprehensive test suite covers:
- ESP32 connection and system commands
- BlueZ adapter management
- Classic Bluetooth SSP pairing with auto-accept
- BLE GATT service creation (Environmental Sensing + Battery Service)
- HCI packet capture and analysis
- GATT read/write/notify operations
- Device management (trust, block, alias)
```bash
claude -p "$(cat test-prompt-v5.md)" \
  --mcp-config .mcp.json \
  --allowedTools "mcp__esp32__*,mcp__bluez__*" \
  --output-format json 2>/dev/null | tee results-v5.json
```
### Analyzing Results
Extract the summary:
```bash
jq -r '.result' results-v5.json
```
Check pass/fail statistics:
```bash
jq -r '.result' results-v5.json | grep -E "(PASS|FAIL|Total)"
```
View full metrics:
```bash
jq '{
  duration_ms: .duration_ms,
  num_turns: .num_turns,
  total_cost_usd: .total_cost_usd,
  success: (.is_error == false)
}' results-v5.json
```
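The `grep` and `jq` steps can also be combined into one script. The sketch below assumes the model printed its summary as table rows of the form `| 1 | Connect | PASS | ... |`; because the `result` field is free text, rows in any other shape are not counted. The name `summarize` is hypothetical.

```python
import json
import re

# Matches summary rows such as "| 1 | Connect | PASS | ... |".
ROW = re.compile(r"^\|\s*(\d+)\s*\|[^|]*\|\s*(PASS|FAIL)\s*\|", re.MULTILINE)


def summarize(results_json: str) -> dict:
    """Return pass/fail counts and run metrics from a results JSON string."""
    data = json.loads(results_json)
    outcomes = [m.group(2) for m in ROW.finditer(data.get("result", ""))]
    return {
        "passed": outcomes.count("PASS"),
        "failed": outcomes.count("FAIL"),
        "minutes": round(data.get("duration_ms", 0) / 60000, 1),
        "num_turns": data.get("num_turns"),
        "success": data.get("is_error") is False,
    }
```

A caller could read `results-v5.json` as text, pass it to `summarize`, and fail the job when `failed` is non-zero or `success` is false.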
## Test Phases
The test suite is organized into phases that must run sequentially:
| Phase | Tests | Coverage |
|-------|-------|----------|
| 1. ESP32 Connection | 1-4 | connect, ping, get_info, status |
| 2. BlueZ Adapter | 5-8 | list_adapters, adapter_info, pairable, discoverable |
| 3. Classic BT + SSP | 9-24 | enable, configure, SSP mode, scan, pair, device management |
| 4. Classic Cleanup | 25-29 | disable, events, clear_events |
| 5. BLE GATT Setup | 30-42 | Battery Service, Environmental Sensing, advertising |
| 6. HCI Capture + Discovery | 43-51 | capture_start, BLE scan, connect, services, characteristics |
| 7. Analyze Capture | 52-55 | capture_stop, parse, analyze, read_raw |
| 8. GATT Write + Notify | 56-63 | write, subscribe, notify, unsubscribe |
| 9. BLE Cleanup | 64-68 | stop advertising, clear GATT, disable BLE |
| 10. Adapter Management | 69-73 | set_alias, restore, disable discoverable |
| 11. Final Cleanup | 74-76 | ESP32 reset, disconnect, final check |
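When a failure is reported by test number, it helps to map the number back to its phase. The following sketch copies the ranges from the table above; `PHASES`, `phase_for`, and `ranges_are_contiguous` are hypothetical names, not part of either MCP server.

```python
# (phase name, first test, last test), copied from the table above.
PHASES = [
    ("ESP32 Connection", 1, 4),
    ("BlueZ Adapter", 5, 8),
    ("Classic BT + SSP", 9, 24),
    ("Classic Cleanup", 25, 29),
    ("BLE GATT Setup", 30, 42),
    ("HCI Capture + Discovery", 43, 51),
    ("Analyze Capture", 52, 55),
    ("GATT Write + Notify", 56, 63),
    ("BLE Cleanup", 64, 68),
    ("Adapter Management", 69, 73),
    ("Final Cleanup", 74, 76),
]


def phase_for(test_number: int) -> str:
    """Return the name of the phase that contains the given test number."""
    for name, first, last in PHASES:
        if first <= test_number <= last:
            return name
    raise ValueError(f"test {test_number} is outside 1-{PHASES[-1][2]}")


def ranges_are_contiguous() -> bool:
    """True when each phase starts right after the previous one ends."""
    expected = 1
    for _, first, last in PHASES:
        if first != expected or last < first:
            return False
        expected = last + 1
    return True
```

The contiguity check is a cheap guard against gaps or overlaps when the table is edited for a new prompt version.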
## SSP Pairing: The auto_accept Flag
Numeric Comparison SSP requires **both sides** to confirm the passkey. In headless mode, this creates a deadlock:
1. Linux calls `bt_pair()`, which blocks while waiting for the ESP32 to confirm
2. The ESP32 never receives the confirm command, because the LLM is still waiting on `bt_pair()` and cannot issue another tool call
**Solution:** The ESP32 firmware supports `auto_accept` mode:
```
esp32_set_ssp_mode(mode="numeric_comparison", auto_accept=true)
```
This makes the ESP32 automatically confirm SSP pairings, breaking the deadlock.
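The deadlock and its fix can be illustrated with a toy model. This is not firmware or mcbluetooth code: `Esp32Model` and `pair` are hypothetical stand-ins, and the polling loop only imitates a blocking call with a timeout.

```python
class Esp32Model:
    """Toy stand-in for the ESP32 side of Numeric Comparison pairing."""

    def __init__(self, auto_accept: bool) -> None:
        self.auto_accept = auto_accept
        self.confirmed = False

    def on_pairing_request(self) -> None:
        # With auto_accept the device confirms on its own. Without it, the
        # device waits for a confirm command that the blocked caller never sends.
        if self.auto_accept:
            self.confirmed = True


def pair(peer: Esp32Model, max_polls: int = 5) -> str:
    """Model of a blocking pair call: poll for the peer's confirmation."""
    peer.on_pairing_request()
    for _ in range(max_polls):
        if peer.confirmed:
            return "paired"
    return "timeout"
```

In this model, `pair(Esp32Model(auto_accept=True))` completes, while `pair(Esp32Model(auto_accept=False))` runs out of polls because nothing else can run to confirm the request.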
## Battery Service Test
The test suite creates a standard Battery Service (UUID 0x180F) on the ESP32:
1. Add Battery Service as primary GATT service
2. Add Battery Level characteristic (UUID 0x2A19) with read property
3. Set the value to "4b" (hex 0x4B = 75 decimal, i.e. 75%)
4. After BLE connection, call `bt_ble_battery` on Linux
5. Verify it returns 75
This tests the dedicated `bt_ble_battery` tool in mcbluetooth, which reads from the standard Battery Level characteristic.
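The Battery Level characteristic holds a single unsigned byte from 0 to 100, so the hex string is the percentage written as two hex digits. A small sketch of the conversion, with hypothetical helper names:

```python
def battery_level_to_hex(percent: int) -> str:
    """Encode a battery percentage as a one-byte hex string."""
    if not 0 <= percent <= 100:
        raise ValueError("battery level must be between 0 and 100")
    return f"{percent:02x}"


def hex_to_battery_level(value: str) -> int:
    """Decode a one-byte hex string back into a percentage."""
    raw = bytes.fromhex(value)
    if len(raw) != 1:
        raise ValueError("Battery Level is a single byte")
    return raw[0]
```

Here `battery_level_to_hex(75)` gives `"4b"`, the value used in step 3, and `hex_to_battery_level("4b")` gives back 75.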
## HCI Packet Capture
Tests 43-55 exercise the btsnoop capture functionality:
```
bt_capture_start(adapter="hci1", output_file="/tmp/ble-gatt-capture.btsnoop")
# ... BLE operations ...
bt_capture_stop(capture_id="...")
bt_capture_parse(filepath="...", max_packets=50)
bt_capture_analyze(filepath="...")
bt_capture_read_raw(filepath="...", count=20)
```
Typical captures include 100-150 packets covering:
- HCI commands (LE scanning, connection)
- ACL data (GATT operations)
- HCI events (connection complete, encryption)
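To check the packet count independently of the `bt_capture_*` tools, the capture file can be walked directly. The sketch below assumes the file follows the standard btsnoop layout (a 16-byte file header, then records with a 24-byte big-endian header followed by the packet bytes). It only counts records and does not interpret flags or payloads. `count_btsnoop_packets` is a hypothetical helper.

```python
import struct

BTSNOOP_MAGIC = b"btsnoop\x00"
FILE_HEADER = struct.Struct(">8sII")     # magic, version, datalink type
RECORD_HEADER = struct.Struct(">IIIIq")  # orig len, incl len, flags, drops, timestamp


def count_btsnoop_packets(data: bytes) -> int:
    """Count complete packet records in the raw bytes of a btsnoop capture."""
    if len(data) < FILE_HEADER.size:
        raise ValueError("file too short for a btsnoop header")
    magic, _version, _datalink = FILE_HEADER.unpack_from(data, 0)
    if magic != BTSNOOP_MAGIC:
        raise ValueError("not a btsnoop file")

    offset, count = FILE_HEADER.size, 0
    while offset + RECORD_HEADER.size <= len(data):
        _orig, included, _flags, _drops, _ts = RECORD_HEADER.unpack_from(data, offset)
        offset += RECORD_HEADER.size + included
        if offset > len(data):
            break  # final record is truncated; do not count it
        count += 1
    return count
```

Reading `/tmp/ble-gatt-capture.btsnoop` with `Path.read_bytes()` and passing the result to this function gives a count to compare against the typical range above.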
## Test Prompt Format
Test prompts follow a structured format:
```markdown
# Test Suite Title
## Phase N: Phase Name (Tests X-Y)
N. **Test Name**: Call `tool_name` with params — expected result
## Summary
After all tests, print a DETAILED summary table:
| # | Test | Result | Notes |
|---|------|--------|-------|
| 1 | Connect | PASS/FAIL | ... |
```
## Troubleshooting
### Serial port busy
```
Error: could not open port /dev/ttyUSB4
```
Check for other processes using the port:
```bash
lsof /dev/ttyUSB4
```
### btmon permission denied
```
Error: Failed to open HCI raw socket
```
Grant capability:
```bash
sudo setcap cap_net_raw+ep /usr/bin/btmon
```
### ESP32 not responding
Power cycle the ESP32 and check the firmware is flashed:
```bash
# Monitor serial output
screen /dev/ttyUSB4 115200
```
Press the reset button; the ESP32 should print a boot event as JSON.
### Pairing timeout
Ensure `auto_accept=true` is set for SSP numeric comparison mode before initiating pairing from Linux.
## Example Results
A successful v5 run produces:
```json
{
"type": "result",
"subtype": "success",
"is_error": false,
"duration_ms": 320000,
"num_turns": 88,
"result": "All 76 tests passed..."
}
```
Key metrics from successful runs:
- Duration: ~5-6 minutes
- API turns: 80-90
- HCI packets captured: 100-150
- Cost: ~$1.50-1.70 USD