skywalker-1/docs/boot-debug-findings.md
Ryan Malloy d9f51548e0 Fix BCM4500 boot: spurious I2C STOP corrupted FX2 controller
Removed I2CS bmSTOP "bus reset" from bcm4500_boot() and debug modes.
Sending STOP with no active transaction puts the FX2 I2C controller
into an inconsistent state where subsequent START+ACK detection fails.

Root cause identified through incremental debug modes (wValue 0x80-0x85)
on live hardware: mode 0x82 (with bmSTOP) fails, mode 0x85 (identical
but without bmSTOP) succeeds. Raw I2C reads confirm BCM4500 is alive
the entire time -- only the controller state is corrupted.

BCM4500 now boots successfully in ~90ms. Three I2C devices found on
bus: 0x08 (BCM4500), 0x10 (tuner/LNB), 0x51 (EEPROM).

Also in this commit:
- Timeout-protected I2C functions replacing fx2lib bare while loops
- I2C bus scan and debug mode infrastructure
- Kernel driver blacklist for dvb_usb_gp8psk
- Test tools for incremental boot debugging
- Technical findings documented in docs/boot-debug-findings.md
2026-02-12 10:34:15 -07:00

# BOOT_8PSK Debugging Findings
Technical reference for the BCM4500 demodulator boot sequence on the Genpix SkyWalker-1 (Cypress FX2 CY7C68013A + Broadcom BCM4500), firmware v3.01.0. Documents the root cause analysis of a firmware hang during I2C initialization and the fixes applied.

- **Hardware:** Genpix SkyWalker-1 USB 2.0 DVB-S receiver
- **MCU:** Cypress CY7C68013A (FX2LP), 8051 core at 48MHz
- **Demodulator:** Broadcom BCM4500
- **Firmware:** Custom v3.01.0 (SDCC + fx2lib)
- **I2C bus speed:** 400kHz

---
## The Problem
Custom firmware v3.01.0 implements vendor command `BOOT_8PSK` (bRequest=0x89, wValue=1), which powers on the BCM4500 demodulator and initializes it via I2C. When first tested, this command caused the FX2 firmware to hang for over 10 seconds, making the USB device completely unresponsive -- no vendor command would return, and the host-side USB stack would report timeout errors.
The initial suspicion was infinite I2C loops. The fx2lib I2C library uses bare `while` loops that poll hardware status bits with no timeout:
```c
// fx2lib/lib/i2c.c -- original code
while ( !(I2CS & bmDONE) && !cancel_i2c_trans);
```
The `cancel_i2c_trans` variable is intended as an external abort mechanism, but nothing in the firmware sets it during normal operation. If the I2C controller never asserts `bmDONE` (for example, because a slave is holding SCL low), the firmware spins indefinitely in this loop.
Adding I2C timeout protection (described below) eliminated the infinite-hang symptom, but the boot sequence still failed: the BCM4500 probe read returned NACK, and all three register initialization blocks failed.
## Root Cause: Spurious I2C STOP Condition
The boot function originally included a so-called "I2C bus reset" step before any I2C communication:
```c
I2CS |= bmSTOP;
i2c_wait_stop();
```
This pattern appears in various FX2 code examples and seems reasonable on its face: send a STOP condition so the I2C bus is in a known idle state before starting fresh. On the FX2's I2C controller, however, issuing STOP with no transaction in progress is exactly what breaks the subsequent probe read.
### Incremental Debug Modes
The root cause was discovered through a series of incremental debug modes added to the `BOOT_8PSK` vendor command handler. Each mode executes a subset of the full boot sequence, isolating which step introduces the failure:
| wValue | Action | Result |
|--------|--------|--------|
| `0x80` | No-op: return `config_status` and `boot_stage` only | Works |
| `0x81` | GPIO + power + delays only (no I2C at all) | Works |
| `0x82` | GPIO + power + `bmSTOP` + I2C probe read | **Fails** |
| `0x83` | GPIO + power + `bmSTOP` + probe + init block 0 | **Fails** (same root cause) |
| `0x84` | `bcm_direct_read` only (no GPIO, chip already powered) | Works |
| `0x85` | GPIO + power + reset, **no** `bmSTOP`, then probe | Works |
Three observations clinch the diagnosis:
1. **Mode 0x82 fails but mode 0x85 succeeds.** These two modes are identical except that 0x82 issues `I2CS |= bmSTOP` before the probe read and 0x85 does not. The `bmSTOP` is the only difference, and it is the only thing that breaks I2C.
2. **Mode 0x84 succeeds immediately after 0x82 fails.** Mode 0x84 calls `bcm_direct_read` with no GPIO manipulation or bus reset -- just a plain I2C combined read. If called after a failed 0x82, it succeeds. This proves two things: the BCM4500 is alive and responding on I2C, and the `i2c_combined_read` function itself is correct. The failure in 0x82 is not a timing or power issue.
3. **Raw I2C reads via vendor command 0xB5 succeed after 0x82 fails.** Command 0xB5 uses the same `i2c_combined_read` function as `bcm_direct_read`. Running it from the host side after a failed 0x82 returns valid data from the BCM4500. This confirms the chip was alive the whole time -- the FX2's I2C controller was in a bad state, not the bus or the slave.
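The isolation logic behind the table can be sketched as a set difference over the steps each mode runs. This is a minimal host-compilable illustration, not the firmware's handler: the `STEP_*` flags and both function names are hypothetical, and the step sets are transcribed from the table above.

```c
#include <stdint.h>

/* Hypothetical step flags -- illustrative names, not from the firmware. */
enum {
    STEP_GPIO_POWER = 1u << 0,  /* GPIO setup, power-on, reset, delays */
    STEP_BMSTOP     = 1u << 1,  /* I2CS |= bmSTOP "bus reset" */
    STEP_PROBE      = 1u << 2,  /* I2C probe read */
    STEP_INIT_BLK0  = 1u << 3   /* write init block 0 */
};

/* Map a debug wValue to the boot steps it runs (per the table above). */
uint8_t debug_mode_steps(uint8_t wvalue)
{
    switch (wvalue) {
    case 0x80: return 0;
    case 0x81: return STEP_GPIO_POWER;
    case 0x82: return STEP_GPIO_POWER | STEP_BMSTOP | STEP_PROBE;
    case 0x83: return STEP_GPIO_POWER | STEP_BMSTOP | STEP_PROBE | STEP_INIT_BLK0;
    case 0x84: return STEP_PROBE;
    case 0x85: return STEP_GPIO_POWER | STEP_PROBE;
    default:   return 0;
    }
}

/* Steps that the failing mode runs but the passing mode does not. */
uint8_t steps_only_in(uint8_t failing_mode, uint8_t passing_mode)
{
    return (uint8_t)(debug_mode_steps(failing_mode) &
                     ~debug_mode_steps(passing_mode));
}
```

Comparing failing mode 0x82 against passing mode 0x85 leaves only `STEP_BMSTOP`, which is the same conclusion as observation 1.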
The test scripts that drove this investigation are in the `tools/` directory:
- `test_boot_debug.py` -- sends debug modes 0x80 through 0x83 sequentially
- `test_i2c_debug.py` -- powers on via 0x81, runs bus scans, tests probe timing
- `test_i2c_isolate.py` -- tests whether re-reset or insufficient delay causes failure
- `test_i2c_pinpoint.py` -- the definitive test: compares 0x84, 0x85, and 0x82
### What Happens Inside the FX2 I2C Controller
The FX2's I2C master controller is a hardware peripheral accessed through the `I2CS`, `I2DAT`, and `I2CTL` registers (memory-mapped in XDATA at `0xE678`-`0xE67A` rather than in 8051 SFR space). The controller implements an I2C state machine in silicon. Writing `bmSTOP` to `I2CS` instructs the hardware to generate a STOP condition (SDA rising while SCL is high).
When no I2C transaction is active -- no prior START has been issued, and the bus is idle -- writing `bmSTOP` puts the controller into an inconsistent internal state. The `bmSTOP` bit may not clear properly (it is supposed to self-clear when the STOP condition completes on the bus), and subsequent START conditions fail to generate proper clock sequences or detect ACK from slaves.
The Cypress TRM (EZ-USB Technical Reference Manual) does not explicitly warn against this, but the I2C chapter describes STOP as a step that follows a completed read or write transaction. It is not documented as a standalone bus-reset mechanism.
The correct way to ensure a clean I2C bus state on the FX2 is to simply proceed with a new START condition. If the bus is idle (which it will be after power-on or after the previous transaction completed normally), the START succeeds and the controller enters its normal operating state. The hardware handles bus arbitration automatically on START.
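For code paths that still need to issue STOP (for example, error cleanup), one defensive option is to track whether a START is outstanding and skip STOP otherwise. The sketch below is an assumption-laden illustration, not part of the firmware described here: `i2c_txn_active`, `i2c_issue_start()`, and `i2c_stop_if_active()` are hypothetical names, and `I2CS` is a plain variable standing in for the real register so the code compiles off-target.

```c
#include <stdint.h>

#define bmSTART 0x80  /* I2CS bit 7 */
#define bmSTOP  0x40  /* I2CS bit 6 */

/* Host-side stand-in; on the FX2 this is the real I2CS register
 * provided by the fx2lib headers. */
volatile uint8_t I2CS;

/* Hypothetical bookkeeping flag: nonzero once START has been issued. */
uint8_t i2c_txn_active;

void i2c_issue_start(void)
{
    I2CS |= bmSTART;
    i2c_txn_active = 1;
}

/* Issue STOP only to close a transaction this code opened.
 * Returns 1 if STOP was written, 0 if it was skipped. */
uint8_t i2c_stop_if_active(void)
{
    if (!i2c_txn_active)
        return 0;               /* idle bus: leave the controller alone */
    I2CS |= bmSTOP;
    i2c_txn_active = 0;
    return 1;
}
```

The actual fix in this commit is simpler (the STOP was deleted outright); a guard like this only matters if a standalone STOP path is reintroduced later.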
## The Fix
The fix is a single deletion. Remove the spurious STOP from the boot sequence:
```c
/* BEFORE (broken): */
I2CS |= bmSTOP;
i2c_wait_stop();
/* AFTER (correct): */
/* NOTE: Do NOT send I2CS bmSTOP here. Sending STOP when no transaction
 * is active corrupts the FX2 I2C controller state, causing subsequent
 * START+ACK detection to fail. The I2C bus will be in a clean state
 * when we reach the probe step -- any prior transaction ended with STOP. */
```
The corrected `bcm4500_boot()` function proceeds directly from GPIO/power setup to the I2C probe read without any bus-reset step:
```c
static BOOL bcm4500_boot(void) {
    boot_stage = 1;
    cancel_i2c_trans = FALSE;

    /* P3.7, P3.6, P3.5 HIGH (idle state for control lines) */
    IOD |= 0xE0;

    /* Assert BCM4500 hardware RESET (P0.5 LOW) */
    OEA |= PIN_BCM_RESET;
    IOA &= ~PIN_BCM_RESET;

    /* No I2CS bmSTOP here -- see note above */

    /* Power on: P0.1 HIGH (enable), P0.2 LOW (disable off) */
    OEA |= (PIN_PWR_EN | PIN_PWR_DIS);
    IOA = (IOA & ~PIN_PWR_DIS) | PIN_PWR_EN;

    boot_stage = 2;
    delay(30);              /* power settle */
    IOA |= PIN_BCM_RESET;   /* release reset */
    delay(50);              /* BCM4500 POR + mask ROM boot */

    boot_stage = 3;
    /* I2C probe -- if this fails, the chip didn't come out of reset */
    if (!bcm_direct_read(BCM_REG_STATUS, &i2c_rd[0]))
        return FALSE;

    /* ... register init blocks follow ... */
}
```
## I2C Timeout Protection
Even with the `bmSTOP` fix, timeout protection on all I2C operations is essential. The FX2's I2C controller has no hardware timeout -- if a slave device holds SCL low (clock stretching), or if an electrical fault prevents `bmDONE` from asserting, the firmware will spin forever in a polling loop.
### The Problem with fx2lib
The fx2lib `i2c_write()` and `i2c_read()` functions poll `bmDONE` and `bmSTOP` with loops like:
```c
while ( !(I2CS & bmDONE) && !cancel_i2c_trans);
```
The `cancel_i2c_trans` flag is declared as `volatile __xdata BOOL` and is set to `FALSE` at the start of each transaction. The library documentation says firmware can set it to `TRUE` from an interrupt to abort a stuck transaction. In practice, nothing in the firmware sets it, so these loops are effectively:
```c
while (!(I2CS & bmDONE)); // infinite if bmDONE never asserts
```
### Timeout-Protected Replacements
The custom firmware replaces all fx2lib I2C functions with timeout-protected wrappers:
```c
#define I2C_TIMEOUT 6000

/* Poll bmDONE; give up after I2C_TIMEOUT iterations. */
static BOOL i2c_wait_done(void) {
    WORD timeout = I2C_TIMEOUT;
    while (!(I2CS & bmDONE)) {
        if (--timeout == 0)
            return FALSE;
    }
    return TRUE;
}

/* Poll until bmSTOP self-clears; give up after I2C_TIMEOUT iterations. */
static BOOL i2c_wait_stop(void) {
    WORD timeout = I2C_TIMEOUT;
    while (I2CS & bmSTOP) {
        if (--timeout == 0)
            return FALSE;
    }
    return TRUE;
}
```
A `WORD` counter of 6000, decremented in a tight SDCC-compiled loop at 48MHz (4 clocks per 8051 machine cycle, so 12 million machine cycles per second), gives approximately 5-10ms per wait, i.e. roughly 10-20 machine cycles per loop iteration. At 400kHz I2C, a single byte transfer (9 clock pulses at 2.5 microseconds each) takes 22.5 microseconds, so this timeout provides over 200x margin for normal operations while still bounding the worst case.
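The arithmetic behind these figures can be checked with a few integer helpers. This is a worked calculation only. The function names are illustrative, and the 10-20 machine cycles per iteration used in the usage note is inferred from the stated 5-10ms range, not measured.

```c
#include <stdint.h>

/* Time for one I2C byte (8 data bits + ACK = 9 SCL pulses), in ns. */
uint32_t i2c_byte_time_ns(uint32_t scl_hz)
{
    return (uint32_t)(9ull * 1000000000ull / scl_hz);
}

/* Number of whole byte-times that fit inside a timeout. */
uint32_t timeout_margin(uint32_t timeout_us, uint32_t scl_hz)
{
    return (uint32_t)((uint64_t)timeout_us * 1000u / i2c_byte_time_ns(scl_hz));
}

/* Wall-clock length of a polling loop on a 48MHz FX2:
 * 48MHz / 4 clocks per machine cycle = 12 machine cycles per us. */
uint32_t loop_timeout_us(uint32_t iterations, uint32_t cycles_per_iter)
{
    return iterations * cycles_per_iter / 12u;
}
```

With these, `i2c_byte_time_ns(400000)` is 22500 (22.5 microseconds), `loop_timeout_us(6000, 10)` is 5000 and `loop_timeout_us(6000, 20)` is 10000, and `timeout_margin(5000, 400000)` is 222 byte-times even at the short end of the range.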
All BCM4500 I2C operations -- `i2c_combined_read`, `i2c_write_timeout`, `i2c_write_multi_timeout` -- use these timeout-protected waits and return `FALSE` on timeout, allowing the caller to report failure rather than hanging the firmware.
## Kernel Driver Race Condition
The `dvb_usb_gp8psk` kernel module auto-loads via udev when VID:PID `09C0:0203` appears on the USB bus. This happens every time the FX2 re-enumerates after firmware load. The kernel driver races with the test tools and sends its own `BOOT_8PSK` command (along with other initialization), which interferes with debugging.
Symptoms of this race condition:
- Test scripts report "resource busy" or "entity not found" errors
- The BCM4500 enters an unexpected state because the kernel driver partially initialized it
- The kernel driver detaches from the device mid-test
The fix is to blacklist the module:
```
# /etc/modprobe.d/blacklist-gp8psk.conf
blacklist dvb_usb_gp8psk
blacklist gp8psk_fe
```
After creating this file, run `sudo modprobe -r dvb_usb_gp8psk gp8psk_fe` to unload any currently-loaded instances. The blacklist prevents udev from auto-loading the module on device insertion, giving test tools exclusive access.
## I2C Bus Scan Results
Vendor command `0xB4` performs a full 7-bit I2C bus scan by attempting a START + address + WRITE to every address from 0x01 to 0x77 and checking for ACK. Three devices were found:
| Address | Identity |
|---------|----------|
| `0x08` | BCM4500 demodulator. Status register `0xA2` returns valid data. This is the primary device for all demodulator operations. |
| `0x10` | Likely the tuner or LNB controller. The SkyWalker-1 uses a separate tuner IC (accessed through the BCM4500 in normal operation, but also directly addressable on the shared I2C bus). |
| `0x51` | Likely a configuration EEPROM. Many DVB-S receivers store tuner calibration data or device serial numbers in a small I2C EEPROM at addresses in the 0x50-0x57 range. |
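The BCM4500's 7-bit I2C address of `0x08` corresponds to 8-bit wire addresses of `0x10` (write) and `0x11` (read). This should not be confused with the second device in the table: its `0x10` is a 7-bit address, which appears on the wire as `0x20` (write) and `0x21` (read).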
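The address conversion and the scan loop can be sketched as follows. This is a host-compilable illustration rather than the firmware's `0xB4` handler: the `i2c_probe_fn` callback type and `demo_probe()` are hypothetical stand-ins for the real START + address + ACK check, with `demo_probe()` simply acknowledging the three addresses reported above.

```c
#include <stddef.h>
#include <stdint.h>

/* 7-bit address -> 8-bit wire address (R/W flag in bit 0). */
uint8_t i2c_write_addr(uint8_t addr7) { return (uint8_t)(addr7 << 1); }
uint8_t i2c_read_addr(uint8_t addr7)  { return (uint8_t)((addr7 << 1) | 1u); }

/* Hypothetical probe callback: nonzero if the address ACKs. */
typedef int (*i2c_probe_fn)(uint8_t addr7);

/* Probe 0x01..0x77 and record responders; returns how many were stored. */
size_t i2c_scan(i2c_probe_fn probe, uint8_t *found, size_t max_found)
{
    size_t n = 0;
    uint8_t addr;

    for (addr = 0x01; addr <= 0x77; addr++) {
        if (probe(addr) && n < max_found)
            found[n++] = addr;
    }
    return n;
}

/* Stand-in probe that ACKs the three addresses from the table above. */
int demo_probe(uint8_t addr7)
{
    return addr7 == 0x08 || addr7 == 0x10 || addr7 == 0x51;
}
```

On real hardware the probe would need the timeout-protected waits described earlier, since a scan touches addresses where nothing is guaranteed to respond.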
## BCM4500 Boot Results After Fix
With the `bmSTOP` removed, the full boot sequence completes reliably:
- **Boot time:** ~90ms total (30ms power settle + 50ms post-reset delay + ~10ms I2C init)
- **config_status:** `0x03` (STARTED | FW_LOADED)
- **boot_stage:** `0xFF` (COMPLETE)
- **Direct registers 0xA2-0xA8:** All return `0x02` (powered, not locked -- expected without a satellite signal)
- **Signal lock:** `0x00` (no lock -- dish not aimed at satellite)
- **Signal strength:** All zeros (same reason)
- **USB responsiveness:** No hang. The firmware remains fully responsive to vendor commands throughout boot and afterward.
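A host tool or self-test could turn the first three values into a single pass/fail check. The sketch below is a minimal example under stated assumptions: `0x03` is known from the results above, but which bit is `BM_STARTED` and which is `BM_FW_LOADED` is assumed here, and `BOOT_STAGE_COMPLETE` and `boot_succeeded()` are illustrative names.

```c
#include <stdint.h>

#define BM_STARTED          0x01  /* assumed bit position */
#define BM_FW_LOADED        0x02  /* assumed bit position */
#define BOOT_STAGE_COMPLETE 0xFF  /* boot_stage value reported after a full boot */

/* Nonzero only if both status bits are set and every boot stage finished. */
int boot_succeeded(uint8_t config_status, uint8_t boot_stage)
{
    const uint8_t need = BM_STARTED | BM_FW_LOADED;

    return (config_status & need) == need &&
           boot_stage == BOOT_STAGE_COMPLETE;
}
```

Because the check requires both bits together, the result does not depend on which of the two assumed bit assignments is correct.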
## Firmware v3.01.0 Boot Sequence (Corrected)
The complete boot sequence as implemented in `bcm4500_boot()`:
1. **Assert BCM4500 RESET** -- Drive P0.5 LOW. This holds the BCM4500's digital logic in reset while power is applied.
2. **Power on** -- Set P0.1 HIGH (power enable), P0.2 LOW (power disable off). The SkyWalker-1 has complementary power control pins.
3. **delay(30ms)** -- Allow the power supply to settle and reach regulation. The stock firmware uses the same delay.
4. **Release RESET** -- Drive P0.5 HIGH. The BCM4500 begins its internal power-on reset (POR) and mask ROM boot sequence.
5. **delay(50ms)** -- Wait for the BCM4500's POR and internal initialization to complete. The chip needs time for its internal oscillator to stabilize and mask ROM to execute.
6. **I2C probe** -- Read direct register `0xA2` (status) to verify the chip is alive and responding on I2C. If this fails, the boot aborts.
7. **Write init block 0** -- 7 bytes to BCM4500 indirect page 0, starting at register `0x06`. Written via the `0xA6`/`0xA7`/`0xA8` indirect register protocol. Data: `{0x06, 0x0b, 0x17, 0x38, 0x9f, 0xd9, 0x80}`.
8. **Write init block 1** -- 8 bytes to page 0, starting at register `0x07`. Data: `{0x07, 0x09, 0x39, 0x4f, 0x00, 0x65, 0xb7, 0x10}`.
9. **Write init block 2** -- 3 bytes to page 0, starting at register `0x0F`. Data: `{0x0f, 0x0c, 0x09}`.
10. **Set config_status** -- OR in `BM_STARTED | BM_FW_LOADED` (`0x03`). Subsequent vendor commands (tuning, signal strength readout, etc.) check this flag before operating.
The three initialization blocks were extracted from disassembly of the stock v2.06 firmware's `FUN_CODE_0ddd` routine, which performs the same indirect register writes.
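Steps 7-9 lend themselves to a table-driven loop. The sketch below holds the three blocks exactly as listed above and walks them with a caller-supplied writer. It deliberately leaves out the `0xA6`/`0xA7`/`0xA8` indirect-write sequence, which is not spelled out here; the `bcm_block_writer` callback, the struct, and `demo_write_block()` are hypothetical scaffolding for off-target testing.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;
    uint8_t len;
} bcm_init_block;

/* Block contents as listed in steps 7-9. */
static const uint8_t blk0[] = {0x06, 0x0b, 0x17, 0x38, 0x9f, 0xd9, 0x80};
static const uint8_t blk1[] = {0x07, 0x09, 0x39, 0x4f, 0x00, 0x65, 0xb7, 0x10};
static const uint8_t blk2[] = {0x0f, 0x0c, 0x09};

const bcm_init_block bcm_init_blocks[] = {
    { blk0, sizeof blk0 },
    { blk1, sizeof blk1 },
    { blk2, sizeof blk2 },
};
#define BCM_INIT_BLOCK_COUNT \
    (sizeof bcm_init_blocks / sizeof bcm_init_blocks[0])

/* Hypothetical writer: performs one indirect block write, nonzero on success. */
typedef int (*bcm_block_writer)(const uint8_t *data, uint8_t len);

/* Write blocks in order; returns how many succeeded (stops at first failure). */
size_t bcm_write_init_blocks(bcm_block_writer write_block)
{
    size_t i;

    for (i = 0; i < BCM_INIT_BLOCK_COUNT; i++) {
        if (!write_block(bcm_init_blocks[i].data, bcm_init_blocks[i].len))
            break;
    }
    return i;
}

/* Stand-in writer for off-target testing: counts bytes, always succeeds. */
size_t demo_bytes_written;
int demo_write_block(const uint8_t *data, uint8_t len)
{
    (void)data;
    demo_bytes_written += len;
    return 1;
}
```

Returning the count of successful blocks lets the caller tell which block failed, which could feed a `boot_stage`-style progress indicator.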
## FX2 Hardware Recovery Note
The FX2's CPUCS register at address `0xE600` controls the 8051 CPU's run/halt state. It is accessible via the standard vendor request bRequest=0xA0 (RAM read/write) even when the user firmware is completely hung in an infinite loop.
This works because bRequest=0xA0 (the "Firmware Load" request in the TRM) is handled by the FX2's USB core in hardware, not by 8051 firmware, so it is serviced even while the 8051 is stuck in a loop. Writing `0x01` to CPUCS holds the CPU in reset, new firmware can then be loaded into RAM, and writing `0x00` releases the CPU to run again.
This means `fw_load.py` can reload firmware over a hung device without requiring a physical USB unplug/replug or power cycle. For iterative firmware development, this is significant -- a failed boot attempt that hangs the firmware can be recovered from the host side in seconds:
```bash
sudo python3 tools/fw_load.py load firmware/build/skywalker1.ihx --wait 3
```
The load sequence halts the CPU (CPUCS=0x01), writes new code into RAM, then restarts the CPU (CPUCS=0x00). The device re-enumerates with the new firmware.
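At the USB level, each CPUCS write is a control transfer whose SETUP packet carries the target address in wValue. The sketch below builds that 8-byte packet for inspection. It assumes nothing about `fw_load.py`'s internals, and the struct and function names are illustrative. The single data byte (`0x01` to halt, `0x00` to run) travels in the data stage and is not part of the packet.

```c
#include <stdint.h>

#define FX2_REQ_FIRMWARE_LOAD 0xA0    /* bRequest handled by the FX2 itself */
#define FX2_CPUCS_ADDR        0xE600  /* CPUCS register address */

typedef struct {
    uint8_t bytes[8];
} usb_setup_packet;

/* SETUP packet for a host-to-device 0xA0 write of `length` bytes at `addr`.
 * Multi-byte fields are little-endian, as in every USB SETUP packet. */
usb_setup_packet fx2_ram_write_setup(uint16_t addr, uint16_t length)
{
    usb_setup_packet p;

    p.bytes[0] = 0x40;                      /* bmRequestType: OUT, vendor, device */
    p.bytes[1] = FX2_REQ_FIRMWARE_LOAD;     /* bRequest */
    p.bytes[2] = (uint8_t)(addr & 0xFF);    /* wValue low: address */
    p.bytes[3] = (uint8_t)(addr >> 8);      /* wValue high */
    p.bytes[4] = 0x00;                      /* wIndex low */
    p.bytes[5] = 0x00;                      /* wIndex high */
    p.bytes[6] = (uint8_t)(length & 0xFF);  /* wLength low */
    p.bytes[7] = (uint8_t)(length >> 8);    /* wLength high */
    return p;
}
```

For a one-byte CPUCS write, `fx2_ram_write_setup(FX2_CPUCS_ADDR, 1)` yields the bytes `40 A0 00 E6 00 00 01 00`.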