skywalker-1/site/src/content/docs/i2c/stop-corruption-bug.mdx
Ryan Malloy bbdcb243dc Normalize line endings to LF across entire repository
Apply .gitattributes normalization to convert all CRLF line
endings inherited from Windows-origin source files to Unix LF.
175 files, zero content changes.
2026-02-20 10:55:50 -07:00


---
title: I2C STOP Corruption Bug
description: Root cause analysis of the spurious I2C STOP condition that corrupted the FX2 controller state during boot.
---
import { Steps, Aside, Badge } from '@astrojs/starlight/components';
During development of the custom firmware v3.01.0, the BOOT_8PSK (0x89) command caused the FX2 to hang for over 10 seconds, making the USB device completely unresponsive. The root cause was traced to a single line of code: a spurious I2C STOP condition issued when no transaction was active.
<Aside type="danger" title="Critical Hardware Bug">
Sending `I2CS |= bmSTOP` when no I2C transaction is active (no prior START, bus idle) corrupts the FX2 I2C controller's internal state machine. The bmSTOP bit may not self-clear, and subsequent START conditions fail to detect ACK from slaves. The Cypress TRM does not explicitly warn against this.
</Aside>
## The Problem
The boot function originally included a "bus reset" step before any I2C communication:
```c title="Broken Code"
I2CS |= bmSTOP;
i2c_wait_stop();
```
This pattern appears in various FX2 code examples and seems reasonable: send a STOP to ensure the I2C bus is in a known idle state before starting fresh. On the FX2's I2C controller, however, issuing a STOP with no transaction in progress is incorrect.
## Root Cause Analysis
The root cause was discovered through a series of incremental debug modes added to the BOOT_8PSK handler. Each mode executes a subset of the full boot sequence, isolating which step introduces the failure.
### Debug Mode Results
| wValue | Action | Result | Key Observation |
|--------|--------|--------|-----------------|
| 0x80 | No-op: return status only | <Badge text="Works" variant="success" /> | Baseline |
| 0x81 | GPIO + power + delays (no I2C) | <Badge text="Works" variant="success" /> | Power sequencing is correct |
| 0x82 | GPIO + power + bmSTOP + I2C probe | <Badge text="Fails" variant="danger" /> | bmSTOP corrupts I2C |
| 0x83 | GPIO + power + bmSTOP + probe + init | <Badge text="Fails" variant="danger" /> | Same root cause |
| 0x84 | I2C probe only (chip already powered) | <Badge text="Works" variant="success" /> | BCM4500 is alive |
| 0x85 | GPIO + power + probe (**no bmSTOP**) | <Badge text="Works" variant="success" /> | Confirms bmSTOP is the cause |
### Three Key Observations
1. **Mode 0x82 fails but mode 0x85 succeeds.** These modes are identical except that 0x82 issues `I2CS |= bmSTOP` before the probe and 0x85 does not. The bmSTOP is the only difference.
2. **Mode 0x84 succeeds immediately after 0x82 fails.** Mode 0x84 performs a plain I2C combined read with no GPIO manipulation or bus reset. If called after a failed 0x82, it succeeds. This proves the BCM4500 was alive and responding -- the FX2 I2C controller was in a bad state, not the bus or the slave.
3. **Raw I2C reads via command 0xB5 succeed after 0x82 fails.** Command 0xB5 uses the same `i2c_combined_read` function. Running it from the host after a failed 0x82 returns valid data from the BCM4500.
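The isolation logic behind the table can be sketched as a set of step flags per debug mode. The sketch below is illustrative only: the `STEP_*` names, `debug_mode_steps`, and `debug_mode_diff` are hypothetical and do not come from the firmware, and the step sets are transcribed from the table above. It compiles on a host PC and only computes which steps differ between two modes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step flags, one bit per boot step named in the table. */
enum {
    STEP_GPIO_POWER = 1u << 0,  /* GPIO setup, power-on, delays */
    STEP_BM_STOP    = 1u << 1,  /* the "bus reset" I2CS |= bmSTOP */
    STEP_I2C_PROBE  = 1u << 2,  /* I2C probe of the BCM4500 */
    STEP_REG_INIT   = 1u << 3   /* register init blocks */
};

/* Steps executed by each debug mode, as listed in the results table. */
static uint8_t debug_mode_steps(uint16_t wValue)
{
    switch (wValue) {
    case 0x80: return 0;
    case 0x81: return STEP_GPIO_POWER;
    case 0x82: return STEP_GPIO_POWER | STEP_BM_STOP | STEP_I2C_PROBE;
    case 0x83: return STEP_GPIO_POWER | STEP_BM_STOP | STEP_I2C_PROBE
                      | STEP_REG_INIT;
    case 0x84: return STEP_I2C_PROBE;
    case 0x85: return STEP_GPIO_POWER | STEP_I2C_PROBE;
    default:   return 0;
    }
}

/* Steps present in exactly one of the two modes. */
static uint8_t debug_mode_diff(uint16_t a, uint16_t b)
{
    return (uint8_t)(debug_mode_steps(a) ^ debug_mode_steps(b));
}

static void debug_mode_example(void)
{
    /* The failing 0x82 and the working 0x85 differ only by the STOP. */
    assert(debug_mode_diff(0x82, 0x85) == STEP_BM_STOP);
    /* 0x83 adds register init on top of 0x82. */
    assert(debug_mode_diff(0x83, 0x82) == STEP_REG_INIT);
}
```

Comparing a failing mode against a working one this way leaves a single set bit, which is the same conclusion the first observation draws by hand.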
## What Happens Inside the FX2
The FX2's I2C master controller is a hardware peripheral accessed through the I2CS, I2DAT, and I2CTL registers, which are mapped into the 0xE6xx external data space rather than the 8051 SFR space. The controller implements an I2C state machine in silicon. Writing bmSTOP to I2CS instructs the hardware to generate a STOP condition (SDA rising while SCL is high).
When no I2C transaction is active -- no prior START has been issued, and the bus is idle -- writing bmSTOP puts the controller into an inconsistent internal state:
- The bmSTOP bit may not clear properly (it is supposed to self-clear when the STOP condition completes)
- Subsequent START conditions fail to generate proper clock sequences
- ACK detection from slaves becomes unreliable
The Cypress TRM describes STOP as a step that follows a completed read or write transaction. It is not documented as a standalone bus-reset mechanism.
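One way to make that rule hard to violate is to track in software whether a START has been issued, and to refuse a STOP otherwise. The following is a minimal host-side sketch, not the firmware's code: `sim_i2cs` stands in for the real I2CS register, and `txn_active`, `i2c_begin`, and `i2c_end` are hypothetical names. The bit positions for `bmSTART` (bit 7) and `bmSTOP` (bit 6) follow the FX2 I2CS layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define bmSTART 0x80u  /* I2CS bit 7 */
#define bmSTOP  0x40u  /* I2CS bit 6 */

/* Stand-in for the I2CS register so the sketch runs on a host. */
static volatile uint8_t sim_i2cs;

/* Hypothetical software flag: true between a START and its STOP. */
static bool txn_active;

static void i2c_begin(void)
{
    sim_i2cs |= bmSTART;
    txn_active = true;
}

/* Requests a STOP only if a transaction is open.
 * Returns true if the STOP bit was actually written. */
static bool i2c_end(void)
{
    if (!txn_active)
        return false;  /* idle bus: a STOP here is the bug described above */
    sim_i2cs |= bmSTOP;
    txn_active = false;
    return true;
}

static void stop_guard_example(void)
{
    sim_i2cs = 0;
    txn_active = false;

    /* No prior START: the guard refuses and the register is untouched. */
    assert(!i2c_end());
    assert((sim_i2cs & bmSTOP) == 0);

    /* After a START, the STOP is allowed through. */
    i2c_begin();
    assert(i2c_end());
    assert((sim_i2cs & bmSTOP) != 0);
}
```

On real hardware the flag would also need clearing on any error path that abandons a transaction. That detail is omitted here.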
## The Fix
The fix is a single deletion. Remove the spurious STOP from the boot sequence:
```c title="Before (Broken)"
/* "Reset" I2C bus */
I2CS |= bmSTOP;
i2c_wait_stop();
```
```c title="After (Correct)"
/* NOTE: Do NOT send I2CS bmSTOP here. Sending STOP when no
 * transaction is active corrupts the FX2 I2C controller state,
 * causing subsequent START+ACK detection to fail. The I2C bus
 * will be in a clean state when we reach the probe step --
 * any prior transaction ended with STOP. */
```
The correct approach is to simply proceed with a new START condition. If the bus is idle (after power-on or after the previous transaction completed normally), the START succeeds and the controller enters its normal operating state. The hardware handles bus arbitration automatically.
## Corrected Boot Sequence
```c title="bcm4500_boot() -- Corrected"
static BOOL bcm4500_boot(void) {
    boot_stage = 1;
    cancel_i2c_trans = FALSE;
    /* P3.7, P3.6, P3.5 HIGH (idle state for control lines) */
    IOD |= 0xE0;
    /* Assert BCM4500 hardware RESET (P0.5 LOW) */
    OEA |= PIN_BCM_RESET;
    IOA &= ~PIN_BCM_RESET;
    /* No I2CS bmSTOP here -- see note above */
    /* Power on: P0.1 HIGH (enable), P0.2 LOW (disable off) */
    OEA |= (PIN_PWR_EN | PIN_PWR_DIS);
    IOA = (IOA & ~PIN_PWR_DIS) | PIN_PWR_EN;
    boot_stage = 2;
    delay(30); /* power settle */
    IOA |= PIN_BCM_RESET; /* release reset */
    delay(50); /* BCM4500 POR + mask ROM boot */
    boot_stage = 3;
    /* I2C probe -- if this fails, the chip didn't respond */
    if (!bcm_direct_read(BCM_REG_STATUS, &i2c_rd[0]))
        return FALSE;
    /* ... register init blocks follow ... */
}
```
## Boot Results After Fix
| Metric | Value |
|--------|-------|
| Boot time | ~90 ms total |
| config_status | 0x03 (STARTED + FW_LOADED) |
| boot_stage | 0xFF (COMPLETE) |
| Direct registers 0xA2-0xA8 | All return 0x02 (powered, not locked) |
| Signal lock | 0x00 (no lock -- dish not aimed) |
| USB responsiveness | No hang; fully responsive throughout |
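When a boot does stall, the last value written to `boot_stage` indicates which step was in progress. If that value is made available to host-side tooling, a small decoder like the one below can make it readable. It is a sketch under assumptions: `boot_stage_name` is a hypothetical helper, and the descriptions for stages 1 to 3 are inferred from where the corrected boot sequence assigns them. Any stage between 3 and 0xFF belongs to the register init blocks that the excerpt elides, so it is reported as unknown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a boot_stage value to a short description (hypothetical helper). */
static const char *boot_stage_name(uint8_t stage)
{
    switch (stage) {
    case 1:    return "GPIO setup, reset asserted, power enable";
    case 2:    return "power settle and reset release";
    case 3:    return "I2C probe";
    case 0xFF: return "complete";
    default:   return "unknown (not covered by the excerpt)";
    }
}

static void boot_stage_example(void)
{
    /* A stall with boot_stage == 3 points at the I2C probe. */
    assert(strcmp(boot_stage_name(3), "I2C probe") == 0);
    /* The results table reports 0xFF after a successful boot. */
    assert(strcmp(boot_stage_name(0xFF), "complete") == 0);
}
```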
## Test Scripts
The investigation was driven by a series of test scripts in the `tools/` directory:
| Script | Purpose |
|--------|---------|
| `test_boot_debug.py` | Sends debug modes 0x80--0x83 sequentially |
| `test_i2c_debug.py` | Powers on via 0x81, runs bus scans, tests probe timing |
| `test_i2c_isolate.py` | Tests whether re-reset or insufficient delay causes failure |
| `test_i2c_pinpoint.py` | The definitive test: compares modes 0x84, 0x85, and 0x82 |
## Timeout Protection
Even with the bmSTOP fix, timeout protection on all I2C operations is essential. The FX2's I2C controller has no hardware timeout -- if a slave holds SCL low (clock stretching) or a fault prevents bmDONE from asserting, the firmware spins forever.
The custom firmware replaces all fx2lib I2C functions with timeout-protected wrappers:
```c title="Timeout-Protected I2C Waits"
#define I2C_TIMEOUT 6000
static BOOL i2c_wait_done(void) {
    WORD timeout = I2C_TIMEOUT;
    while (!(I2CS & bmDONE)) {
        if (--timeout == 0) return FALSE;
    }
    return TRUE;
}
static BOOL i2c_wait_stop(void) {
    WORD timeout = I2C_TIMEOUT;
    while (I2CS & bmSTOP) {
        if (--timeout == 0) return FALSE;
    }
    return TRUE;
}
```
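The bounded-wait pattern can be exercised off-target by substituting a plain variable for the register. The sketch below assumes a hypothetical `sim_i2cs` stand-in and a `sim_wait_done` variant that also reports how many polls it used. Neither exists in the firmware. `bmDONE` is bit 0 of I2CS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define I2C_TIMEOUT 6000u
#define bmDONE      0x01u  /* I2CS bit 0 */

/* Stand-in for the I2CS register so the loop runs on a host. */
static volatile uint8_t sim_i2cs;

/* Same loop shape as i2c_wait_done(), plus a poll counter for inspection. */
static bool sim_wait_done(uint16_t *polls_used)
{
    uint16_t timeout = I2C_TIMEOUT;
    while (!(sim_i2cs & bmDONE)) {
        if (--timeout == 0) {
            *polls_used = I2C_TIMEOUT;
            return false;  /* gave up instead of spinning forever */
        }
    }
    *polls_used = (uint16_t)(I2C_TIMEOUT - timeout);
    return true;
}

static void bounded_wait_example(void)
{
    uint16_t polls = 0;

    /* DONE never asserts: the wait ends after exactly I2C_TIMEOUT polls. */
    sim_i2cs = 0;
    assert(!sim_wait_done(&polls));
    assert(polls == I2C_TIMEOUT);

    /* DONE already set: the wait returns immediately. */
    sim_i2cs = bmDONE;
    assert(sim_wait_done(&polls));
    assert(polls == 0);
}
```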
A WORD counter of 6000 at 48 MHz gives approximately 5--10 ms per wait, providing over 200x margin above the 22.5 us required for a single byte transfer at 400 kHz.
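The margin figure can be checked with a few lines of arithmetic. This sketch assumes one byte occupies nine SCL periods (eight data bits plus the ACK bit); `i2c_byte_time_us` and `timeout_margin` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for one byte: 8 data bits + 1 ACK bit, in microseconds. */
static double i2c_byte_time_us(uint32_t scl_hz)
{
    return 9.0 * 1e6 / (double)scl_hz;
}

/* How many byte times fit into a given timeout. */
static double timeout_margin(double timeout_ms, uint32_t scl_hz)
{
    return (timeout_ms * 1000.0) / i2c_byte_time_us(scl_hz);
}

static void timeout_margin_example(void)
{
    /* 9 bits at 400 kHz is 22.5 us. */
    assert(i2c_byte_time_us(400000u) == 22.5);
    /* Even the 5 ms lower bound exceeds 200 byte times. */
    assert(timeout_margin(5.0, 400000u) > 200.0);
}
```

At the 5 ms lower bound this works out to roughly 222 byte times, and roughly 444 at 10 ms, which is consistent with the "over 200x" figure.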
<Aside type="tip" title="FX2 Recovery">
If the firmware hangs due to I2C issues, the device can be recovered without a physical unplug. The FX2's CPUCS register (0xE600) is accessible via the boot ROM's `bRequest=0xA0` handler, which is serviced by the chip itself and keeps responding even when the firmware is stuck in a loop. See [Boot Sequence](/usb/boot-sequence/) for the recovery procedure.
</Aside>