omni-pca/docs/JOURNEY.md
Ryan Malloy 7b4052624c Docs: extend JOURNEY through the HA + harness + demo arc; add CHANGELOG
docs/JOURNEY.md — replaced the placeholder 'What's next' section with
seven new chronological entries covering everything that happened after
the panel-search comedy:

  - HA rebuild Phase A: poll-vs-push decision, pure-function helpers
    extraction, 61 unit tests with no HA imports
  - HA Phase B: the six new entity platforms, the Omni state-byte
    overload, security-mode-to-alarm-state mapping, the scene-platform
    skip decision
  - HA Phase C: services + diagnostics + repairs flow
  - 'wait, did we mock enough?' — catching the missing Thermostat
    (6) and Button (3) RequestProperties handlers BEFORE the HA
    harness ever touched the mock
  - HA test harness rough patches: requires-python conflict, pytest_socket
    fight, the CONF_ENTRY_ID-doesn't-exist-in-HA find, teardown hang
    fixed by converting configured_panel into a generator
  - Docker dev stack: mounting only src/ to dodge the read-only-venv
    problem with uv
  - Automated onboarding + screenshots: the auth_code OAuth dance, the
    template-endpoint device-id trick, playwright auto-injection of
    hassTokens, the discovery-during-onboarding nice surprise

Plus appended five new entries to 'Things worth remembering':
  - Pure functions are the cheapest thing in test suites
  - Mocking the entire protocol counterpart catches whole categories
  - pytest_socket + real network can coexist
  - The 'build without a real device' loop is unreasonably effective
  - (existing entries kept verbatim)

Final length: ~6800 words, 27 dated sections plus the lessons list.

CHANGELOG.md — new file. Single 2026.5.10 entry under Keep-a-Changelog-
ish format, broken into seven sections matching the project layers:
Protocol layer (RE findings), Library, Home Assistant integration,
Tests, Developer tooling, Documentation, Known gaps. Cites the source
line numbers for the two non-public protocol quirks. Lists every
public module + every entity platform. Linked to git tag template at
the bottom (release not pushed yet).

Tests still 351 + 1 skip. No code changed.
2026-05-10 16:29:41 -06:00


# JOURNEY
Raw chronological notes from a few days reverse-engineering HAI's PC Access
3.17, then writing a Python library and a Home Assistant integration to
talk to the panel directly. Dated. Append-only-ish.

---
## 2026-05-10 morning — the pile of binaries
Started with a directory called `PC Access/` that had clearly been zipped
up off a Mac and handed around. The giveaway was `._*` files next to every
real file:
```
-rw------- 1 kdm kdm 120 Aug 15 2016 ._Newtonsoft.Json.dll
-rw------- 1 kdm kdm 484352 Aug 15 2016 Newtonsoft.Json.dll
```
That's AppleDouble cruft: macOS extended attributes shimmed into companion
files when an HFS+ volume gets archived to a non-Apple filesystem. 120 bytes
of resource fork garbage per real file. Useless. Touched everything from
the PC Access install date (Mar 2018) all the way back to a 2006 firmware
updater. Whoever extracted this had been carrying it across Macs for years.

What we actually had:
| File | Size | What it is |
|------|-----:|-----|
| `PCA3U_EN.exe` | 5.4 MB | The PC Access GUI, a .NET assembly (v3.17.0.843, 2018-01-02) |
| `PCA1106W.exe` | 3.3 MB | Older native C++ version from 2008 |
| `f_update.exe` | 437 KB | Native firmware updater (2006) |
| `OT7FileUploaderLib.dll` | 16 KB | OmniTouch 7 firmware uploader |
| `Our House.pca` | 144 KB | A panel config file. High entropy. Not ours. |
| `PCA01.CFG` | 318 B | App settings. Also encrypted. |
| `Serial Number.txt` | 20 B | A 20-char license key |

`Our House.pca` was the interesting one. Entropy 7.994 bits per byte:
either compressed, encrypted, or both. No magic bytes. No structure
visible in the first 256 bytes. It also had someone else's account name
embedded in the metadata: this panel had been bought used and shipped
with the previous owner's config still on it. Held that thought.

`file PCA3U_EN.exe` came back with `Mono/.Net assembly`. That was the
single biggest piece of luck in the whole project: a .NET assembly means
ilspycmd will give us back readable C# in seconds. Beats staring at IDA
listings of Borland C++ runtime stubs all afternoon, which is what
`PCA1106W.exe` would have made us do.
## 2026-05-10 — decompile and skim
Ran ilspycmd 10.0.1.8346 over `PCA3U_EN.exe`. 898 typedefs. They cleanly
split into two namespaces:
- `HAI_Shared` — the domain model, the wire protocol, the crypto, all of
it reusable across HAI's product line (Omni, Lumina, HMS).
- `PCAccess3` — just UI. Forms, controls, window positions.
That's the prize: `HAI_Shared` is essentially a free protocol
implementation library, written by people who actually know how the panel
works, sitting there in C# waiting to be read.
First skim of `HAI_Shared`:
- `clsOmniLinkPacket` — outer transport packet. 4-byte header
(`[seq_hi][seq_lo][type][reserved=0]`) + payload. Sequence number is
big-endian. There are 12 packet types: NewSession, AckNewSession,
RequestSecureSession, AckSecureSession, two flavors of
SessionTerminated, the `OmniLinkMessage` (encrypted, v1) and
`OmniLink2Message` (encrypted, v2) wrappers, plus their unencrypted
twins.
- `clsOmniLinkMessage` — inner application message.
`[StartChar][MessageLength][...payload, payload[0]=opcode...][CRC_lo][CRC_hi]`.
CRC is CRC-16/MODBUS with poly `0xA001`. Standard.
- `clsAES` — the panel's symmetric crypto. AES-128, ECB,
`PaddingMode.Zeros`, key reused as IV (which is fine in ECB but a code
smell that hints at someone copy-pasting from a textbook).
- `enuOmniLink2MessageType` — 83 v2 opcodes. Login, Logout,
RequestSystemInformation, RequestExtendedStatus, Command, ZigBee
pass-through, firmware upload, etc.
- `clsCapOMNI_PRO_II`, `clsCapLUMINA`, `clsCapHMS950e`, … — per-model
capability classes carrying constants like `numZones=176`,
`numUnits=511`. Real domain model, not a config file.
Wrote those down in `findings.md` and pushed on.
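
To make the framing notes above concrete, here is a minimal Python sketch of the 4-byte outer header and the `0xA001` CRC. The function names are ours, not the library's, and the initial value defaults to the standard MODBUS `0xFFFF` as an assumption; if the decompiled code starts from a different value, only the `init` argument changes. Which bytes of the inner message the CRC covers is deliberately left out, because the notes above don't say.

```python
import struct


def crc16_a001(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with reflected polynomial 0xA001.

    init=0xFFFF gives CRC-16/MODBUS; init=0 gives CRC-16/ARC.
    """
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def pack_header(seq: int, packet_type: int) -> bytes:
    """Outer header: [seq_hi][seq_lo][type][reserved=0], big-endian seq."""
    return struct.pack(">HBB", seq & 0xFFFF, packet_type, 0)


def crc_trailer(data: bytes, init: int = 0xFFFF) -> bytes:
    """The two trailing CRC bytes in wire order: low byte first."""
    return struct.pack("<H", crc16_a001(data, init))
```

The standard check string `b"123456789"` is a quick way to confirm which CRC variant a given implementation really is.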
## 2026-05-10 — the cipher that wasn't AES
Then we hit the file format. The `.pca` and `.CFG` blobs *look* like
AES-CBC ciphertext. They aren't. From `clsPcaCryptFileStream`:
```csharp
private byte oldRandom(byte max) {
    RandomSeed = RandomSeed * 134775813 + 1;
    return (byte)((RandomSeed >> 16) % max);
}
// per byte: ciphertext = plaintext ^ oldRandom(255) // mod 255, not 256
```
That multiplier — `134775813` = `0x08088405` — is the Borland Delphi /
Turbo Pascal `Random()` LCG. So someone wrote this thing in Delphi
originally, ported it to C#, and kept the exact same PRNG so existing
.pca files would still decrypt. The mod-255 (not 256) stays in too,
which means the keystream byte is in `[0..254]`, never `0xFF`. It
doesn't lose information — it just shifts the output distribution.
Quirky but not broken.
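
A rough Python equivalent of that keystream, as a sketch only. It assumes `RandomSeed` is an unsigned 32-bit value that wraps on overflow and that the file key is used directly as the initial seed; the names `keystream` and `xor_crypt` are ours.

```python
from typing import Iterator

DELPHI_MULTIPLIER = 134775813  # 0x08088405, the Delphi Random() LCG multiplier


def keystream(seed: int) -> Iterator[int]:
    """Yield keystream bytes in [0..254] (mod 255, never 0xFF)."""
    state = seed & 0xFFFFFFFF
    while True:
        # Assumption: 32-bit unsigned wraparound, as C# uint arithmetic does.
        state = (state * DELPHI_MULTIPLIER + 1) & 0xFFFFFFFF
        yield (state >> 16) % 255


def xor_crypt(data: bytes, key: int) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, keystream(key)))
```

Because the cipher is a plain XOR stream, running `xor_crypt` twice with the same key returns the original bytes.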
Two hardcoded 32-bit keys live in `clsPcaCfg`:
```csharp
private readonly uint keyPC01 = 338847091u; // 0x14326573, for PCA01.CFG
public readonly uint keyExport = 391549495u; // for exported .pca files
```
And a third path: `SetSecurityStamp(string S)` derives a per-installation
key from a stamp string:
```csharp
uint num = 305419896u; // 0x12345678 — developer Easter egg as init value
foreach (char c in S)
    num = ((num ^ c) << 7) ^ c;
Key = num;
```
`0x12345678` as an init constant is the giveaway: someone was bored at
the keyboard the day they wrote this. It's the kind of thing you grep
for. (The actual hash function, `((k ^ c) << 7) ^ c`, is fine — not
cryptographic, but fine for "let me derive a per-install key from a
serial number.")
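
The same stamp derivation in Python, as a sketch. The function name is hypothetical, and the explicit 32-bit mask is our stand-in for C#'s unchecked `uint` arithmetic.

```python
def stamp_to_key(stamp: str) -> int:
    """Derive a 32-bit key from a stamp string, mirroring SetSecurityStamp."""
    key = 0x12345678  # same init constant as the C# (305419896)
    for ch in stamp:
        c = ord(ch)
        # Mask to 32 bits: the C# shift silently drops the overflowed bits.
        key = (((key ^ c) << 7) ^ c) & 0xFFFFFFFF
    return key
```

An empty stamp leaves the key at the init constant, which is a handy sanity check when porting.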
## 2026-05-10 — the wrong-key-looks-right problem
Wrote a Python decryptor in maybe an hour: a generator that yields
keystream bytes, an XOR over the file. Easy.
Then we hit a subtle thing. The first script auto-tried the two known
keys and picked the one whose plaintext "looked more printable". It
picked `keyExport`, ran the parser, and got nonsense — but a *plausible*
kind of nonsense: short non-empty strings, non-zero counter values,
generally the texture of real binary data.
Turns out **printable-character ratio is a terrible heuristic for binary
file plaintext.** Random noise is, on average, slightly more "printable"
than a real binary file padded with zeros and length-prefixed strings —
because random noise has a uniform distribution and a real file has long
runs of `0x00` (which falls outside the 32–127 printable range).
Replaced it with something concrete and stupid:
```python
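def score(pt: bytes) -> int:
    """Non-zero only if the plaintext starts with a valid String8 tag."""
    if not pt:
        return 0
    n = pt[0]  # String8 length prefix
    if not (1 <= n <= 64):
        return 0
    tag = pt[1:1 + n]
    # The whole tag must be present and printable ASCII, e.g. b"CFG05".
    if len(tag) == n and all(32 <= b < 127 for b in tag):
        return 100 + n
    return 0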
```
The first byte is a String8 length, and the next `n` bytes should be the
ASCII version tag like `CFG05` or `PCA03`. If it parses cleanly, the key
is right; if not, it isn't. Robust because it's not statistical.
`PCA01.CFG` decrypted with `keyPC01`. First bytes:
```
00000000 05 43 46 47 30 35 17 41 ... .CFG05.A
```
`CFG05`. Format version 5. Walked the rest of the schema (modem strings,
port number, key field, password) and pulled out the prize:
```
pca_key = 0xC1A280B2 (3,248,652,466)
password = "PASSWORD" # factory default, never changed
```
So the per-installation `.pca` key was sitting inside `PCA01.CFG` the
whole time, encrypted with a hardcoded key that's right there in the
binary. The `keyExport` path is only for files that were exported for
sharing, which is *not* what `Our House.pca` was — it was the live
in-place config.
Decrypted `Our House.pca` with `0xC1A280B2`. First bytes:
```
00000000 05 50 43 41 30 33 ... .PCA03
```
`PCA03`. File format v3. Right key.
## 2026-05-10 — the 2191-byte header parses byte-perfect
Read `clsHAC.ReadFileHeader` to figure out the layout:
```
String8 version_tag "PCA03"
String8(30) AccountName
String16(120) AccountAddress
String8(20) AccountPhone
String8(4) AccountCode
String16(2000) AccountRemarks
byte Model
byte MajorVersion
byte MinorVersion
sbyte Revision
```
One thing about `ReadString8(out S, byte L)`: it always consumes
`1 + L` bytes regardless of the declared string length. So the strings
are fixed-width slots with a length prefix, not variable-length.
Total header size: 2191 bytes.
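
The 2191 figure can be reproduced from the layout above. This sketch assumes String8 slots carry a 1-byte length prefix, String16 slots a 2-byte prefix, and that the version tag sits in a 5-character slot; those widths are our reading of the layout rather than something quoted from the C#.

```python
# (field name, length-prefix bytes, fixed slot width in bytes)
HEADER_FIELDS = [
    ("version_tag", 1, 5),        # assumption: 5-char slot holding "PCA03"
    ("AccountName", 1, 30),
    ("AccountAddress", 2, 120),
    ("AccountPhone", 1, 20),
    ("AccountCode", 1, 4),
    ("AccountRemarks", 2, 2000),
]
FIXED_TAIL = 4  # Model, MajorVersion, MinorVersion, Revision: one byte each


def header_size() -> int:
    """Each string consumes prefix + slot width, whatever its real length."""
    return sum(prefix + width for _, prefix, width in HEADER_FIELDS) + FIXED_TAIL


if __name__ == "__main__":
    print(header_size())
```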
Then we found the validation block at `clsHAC.cs:7943`:
```csharp
if (num == 2191) { /* header read OK */ }
```
If your byte counter doesn't equal 2191 after parsing the header, you
got it wrong. Ours landed on exactly 2191. That was the moment we knew
the parser was correct: not by inspection of the output, but by hitting
an exact magic number that the original code was checking against.
Decoded header:
- Model byte = `0x10` = `enuModel.OMNI_PRO_II`
- Firmware: 2.12 r1
- AccountName / Address / Phone — the previous owner's PII
- 8 user codes, all still factory default `12345678`
That last one stung. The panel had probably been sitting on someone's
wall for a decade with `12345678` as the master code. (Not our panel,
yet — but our panel was about to inherit it.) Plaintext stays in
`extracted/Our_House.pca.plain` and that path stays in `.gitignore`.
All future notes redact PII.
## 2026-05-10 — walking the body
Header was 2191 bytes; the file is 144 KB. Plenty more to parse before
we'd hit the network connection block where the AES key for live-panel
talk is stored.
The body layout (from `clsHAC.ReadFromFile`):
```
ByteArray SetupData.data (3840 bytes for OMNI_PRO_II)
bool slRequireCodeForSecurity
bool slPasswordOnRestore
UInt16 (discarded)
UInt16 EventLog.Count
UInt32 (discarded)
ZoneNames, UnitNames, ButtonNames, CodeNames, ThermostatNames,
AreaNames, MessageNames
ZoneVoices, UnitVoices, ButtonVoices, CodeVoices, ThermostatVoices,
AreaVoices, MessageVoices
Programs
EventLog
# v >= 2:
if Ethernet feature:
String8(120) Connection.NetworkAddress
String8(5) port-string
String8(32) ControllerKey-as-hex <- 32 hex chars = 16-byte AES key
...
```
The Names blocks were straightforward: each is `max_slots * (1 + name_len)`
bytes. For Zones that's `176 * 16 = 2816` bytes. Adds up cleanly.
Then we hit the Voices blocks and the parser desynced.
## 2026-05-10 — the latent bug in PC Access itself
Each "Voice" block lets the panel speak the name of an object. Six
phrases per object (`numVoicePhrases = 6`). The C# reads them like this:
```csharp
byte[] B = new byte[CAP.numVoicePhrases]; // 6 bytes
for (int i = 1; i <= GetFileMaxX(); i++) {
    num = (i > Count)
        ? num + FS.ReadByteArray(out B, B.Length) // skip path: 6 bytes
        : num + _Items[i-1].Voice.Read(FS); // structured path
}
```
The "structured path" calls `clsVoiceWordArray.Read`, which branches on
whether the panel has the `LargeVocabulary` feature:
- LargeVocabulary present → 6 phrases × **2 bytes** (UInt16) = **12 bytes**
- LargeVocabulary absent → 6 phrases × 1 byte = 6 bytes
OMNI_PRO_II *has* LargeVocabulary. So the structured path reads 12 bytes
per slot. But the **skip path** in the loop above always reads 6 bytes,
no matter what. There's no `if (LargeVocabulary) B = new byte[12];`.
If `Count == GetFileMaxX()` (every slot is filled), this never matters —
the skip path is never taken. For every block on our panel except one,
that's true. But Units has `Count = 511` and `GetFileMaxX = 512`, so
exactly one slot takes the skip path, reads 6 bytes when it should have
read 12, and the next 6 bytes — which are actually the start of the
*next* block — get treated as the tail of the current slot. The parser
walks 6 bytes off the rails and never recovers.
The C# code in the wild gets away with this because `Count >= Max` for
basically all real panels in deployment. But it's a real bug — it would
bite if a model ever shipped with LargeVocabulary AND had Buttons or
Messages with `Count < Max`. We patched our parser; the original is
still wrong.
Found it by hex-dumping the file, locating the panel IP address
(`192.168.1.9`) at byte offset `0xe2d8`, and back-solving the diff
between where we expected to land and where the IP actually was. The
gap was exactly 6684 bytes, which is `1114 * 6`: six bytes short for each
of the 1114 voice slots read at half the right size. Math checked out.
Off by N.
## 2026-05-10 — the prize
After the Voices, the body has Programs (1500 × 14 B), EventLog (250 ×
9 B), and then — for a v3 file with the Ethernet feature — the
Connection block:
```
String8(120) Connection.NetworkAddress
String8(5) port-string
String8(32) ControllerKey-as-hex
```
For our panel:
- IP: `192.168.1.9`
- Port: `4369`
- ControllerKey: 16 bytes of AES-128 key, read from the Connection block
that starts at file offset `0xe2d8`
Total bytes to that point: `2191 + 3840 + 10 + 15407 + 13374 + 21000 + 2250 = 58072 = 0xe2d8`.
Exactly the offset where the IP appears in the hex dump. Done.
That key plus the right handshake = direct talk to the panel.
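
The offset sum is easy to re-check mechanically. The section labels below are ours; the byte counts are the ones quoted above.

```python
# Byte counts of everything that precedes the Connection block.
SECTIONS = {
    "header": 2191,
    "setup_data": 3840,
    "flags_and_counters": 1 + 1 + 2 + 2 + 4,  # two bools, two UInt16, one UInt32
    "names": 15407,
    "voices": 13374,
    "programs": 1500 * 14,
    "event_log": 250 * 9,
}

connection_offset = sum(SECTIONS.values())

if __name__ == "__main__":
    print(connection_offset, hex(connection_offset))
```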
## 2026-05-10 — the two non-public quirks
Now we needed to read `clsOmniLinkConnection.cs`. It's 2109 lines of
state machine for the secure-session handshake, the keepalive timer, the
TCP framing, and the encryption. We expected a textbook AES session: send
client-hello, get server-hello, derive key from PIN somehow, encrypt
everything from then on.
What we found instead were two surprises that no public Omni-Link
write-up we'd seen mentions. Both of them look like quirks. Both of them
will reject your client with `ControllerSessionTerminated` if you skip
them.
### Quirk 1 — the session key is not the ControllerKey
You'd expect the AES session key to be the ControllerKey verbatim. It
isn't. From `clsOmniLinkConnection.cs:1886-1892`:
```csharp
SessionKey = new byte[16];
ControllerKey.CopyTo(SessionKey, 0);
for (int j = 0; j < 5; j++)
{
    SessionKey[11 + j] = (byte)(ControllerKey[11 + j] ^ SessionID[j]);
}
AES = new clsAES(SessionKey);
```
The first 11 bytes of the session key are the ControllerKey verbatim.
The last 5 bytes are the ControllerKey XORed with a 5-byte `SessionID`
nonce that the controller sent in `ControllerAckNewSession`. That's
the entire key derivation. No PBKDF2, no HKDF, no PIN, no salt. Just
five bytes of XOR.
The same five-byte block appears twice in the source — once for UDP
(line 1423) and once for TCP (line 1886). Identical.
The implication for someone writing a client is: if you encrypt your
`ClientRequestSecureSession` with the raw ControllerKey, the panel
decrypts it to garbage and disconnects you. You have to wait for the
nonce, mix it in, *then* encrypt.
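
A minimal Python sketch of that derivation. The function name is ours; the logic follows the C# loop quoted above.

```python
def derive_session_key(controller_key: bytes, session_id: bytes) -> bytes:
    """SessionKey = CK[0:11] || (CK[11:16] XOR SessionID)."""
    if len(controller_key) != 16:
        raise ValueError("ControllerKey must be 16 bytes")
    if len(session_id) != 5:
        raise ValueError("SessionID must be 5 bytes")
    key = bytearray(controller_key)
    for j in range(5):
        key[11 + j] ^= session_id[j]
    return bytes(key)
```

An all-zero SessionID leaves the key unchanged, which makes a convenient test vector.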
### Quirk 2 — per-block XOR pre-whitening before AES
This one is the real headline. Before AES-encrypting any payload block,
the first two bytes of every 16-byte block get XORed with the packet's
sequence number. Same XOR mask, every block of the packet. From
`clsOmniLinkConnection.cs:396-401`:
```csharp
for (num = 0; num < PKT.Data.Length; num += 16)
{
    PKT.Data[num] = (byte)(PKT.Data[num] ^ ((PKT.SequenceNumber & 0xFF00) >> 8));
    PKT.Data[num + 1] = (byte)(PKT.Data[num + 1] ^ (PKT.SequenceNumber & 0xFF));
}
PKT.Data = AES.Encrypt(PKT.Data);
```
And then the inverse on receive (`:413-417`):
```csharp
PKT.Data = AES.Decrypt(PKT.Data);
for (int i = 0; i < PKT.Data.Length; i += 16)
{
    PKT.Data[i] = (byte)(PKT.Data[i] ^ ((PKT.SequenceNumber & 0xFF00) >> 8));
    PKT.Data[i + 1] = (byte)(PKT.Data[i + 1] ^ (PKT.SequenceNumber & 0xFF));
}
```
So the on-the-wire encryption is "AES-128-ECB of (payload XOR-prewhitened
with the seq number, two bytes per block)". A naive Omni-Link client that
just AES-ECB-encrypts the raw payload will produce ciphertext the panel
won't accept.
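
The whitening step on its own, sketched in Python. AES is left out because the standard library has none; on send you would whiten and then AES-ECB-encrypt, on receive decrypt and then whiten again. The function name is ours, and the sketch assumes the payload has already been padded to a multiple of 16 bytes.

```python
def xor_whiten(data: bytes, seq: int) -> bytes:
    """XOR the first two bytes of every 16-byte block with the sequence number.

    XOR is its own inverse, so the same call undoes the whitening.
    """
    if len(data) % 16 != 0:
        raise ValueError("payload must be padded to a multiple of 16 bytes")
    out = bytearray(data)
    hi, lo = (seq >> 8) & 0xFF, seq & 0xFF
    for offset in range(0, len(out), 16):
        out[offset] ^= hi
        out[offset + 1] ^= lo
    return bytes(out)
```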
It feels weak — an attacker with a known-plaintext for one block can
recover the seq XOR mask trivially, and from there the whitening is
unprotected. But it's the protocol. The panel won't talk to you without
it.
We think the original intent might have been something like nonce-mixing
(use the seq as a per-packet salt to defeat ECB block-repetition
attacks), and the implementation got cargo-culted from one block to all
blocks of the packet. Doesn't matter. Implement it. Move on.
A bonus surprise: **there is no separate `Login` step on TCP.** The C#
defines `clsOL2MsgLogin` (v2 Login, opcode 42) but never instantiates
it on the TCP path. Possessing the right ControllerKey *is* the
authentication. The login opcode appears to be a serial-only artifact
from before the Ethernet module existed. The v1 serial path *does*
construct `clsOLMsgLogin` with the user's PIN; the v2 TCP path goes
straight from `ControllerAckSecureSession` to `RequestSystemInformation`.
We documented all of this in `notes/handshake.md` while it was fresh.
## 2026-05-10 around noon — first commit
```
9a02418 Initial scaffold + protocol primitives
```
uv project, ruff, pytest, mypy strict, MIT, README, gitignore explicitly
protecting any `.pca` or panel keys. Date-versioned (CalVer): `2026.5.10`.
The library lives in `src/omni_pca/`:
- `crypto.py` — AES-128-ECB plus the per-block XOR seq pre-whitening and
the `SessionKey = CK[0:11] || (CK[11:16] XOR SessionID)` derivation
- `opcodes.py` — all 12 packet types, all 104 v1 opcodes, all 83 v2
opcodes, all transcribed by hand from the decompiled enums
- `packet.py` — outer `Packet` with `encode()`/`decode()`
- `message.py` — inner `Message` with CRC-16/MODBUS
- `pca_file.py` — Borland LCG cipher, `PcaReader`, parsers for both
`.pca` and `.CFG`
49 tests passed, ruff clean. The protocol unit tests use canned bytes
extracted from the C# source; they don't need a panel to run.
## 2026-05-10 1pm — mock panel as ground truth
Second commit:
```
1901d6e Async client + mock panel + e2e roundtrip
```
The async client (`OmniConnection`, `OmniClient`) runs the four-step
secure-session handshake, frames TCP correctly (read first 16-byte block,
decrypt, learn `MessageLength`, read the rest), keeps a per-direction
monotonic sequence number that wraps `0xFFFF → 1` (skipping 0 because the
controller uses 0 for unsolicited packets), and dispatches solicited
replies to a Future while shoving unsolicited packets into a queue.
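
The sequence-number rule is small enough to show whole. This is a sketch with a hypothetical function name, not the client's actual code.

```python
def next_sequence(seq: int) -> int:
    """Advance a 16-bit sequence number, wrapping 0xFFFF -> 1.

    0 is skipped because the controller uses it for unsolicited packets.
    """
    return 1 if seq >= 0xFFFF else seq + 1
```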
That's all well and good, but how do we test it without a panel? The
panel was at `192.168.1.9` last we knew, and we had no idea if its
network module was even on. Building a real Omni controller emulator
in Python turned out to be the right answer.
`mock_panel.py` is a TCP server that:
- accepts `ClientRequestNewSession`, generates a 5-byte SessionID,
sends back `ControllerAckNewSession` with the version bytes `00 01`
prepended
- derives the same SessionKey the client did (using the same XOR-mix)
- decrypts the `ClientRequestSecureSession`, validates that the 5-byte
echo matches the SessionID it just sent, sends back the symmetric
`ControllerAckSecureSession` (re-encrypting the same SessionID)
- handles `RequestSystemInformation`, `RequestSystemStatus`,
`RequestProperties` (Zone/Unit/Area, both absolute index and rel=1
iteration with EOD termination), and Naks anything else
It's a thin emulator but it's a *complete* protocol counterpart. Six
end-to-end tests connect a real `OmniClient` over a real TCP socket to
a real `MockPanel` and exchange real frames. They prove the handshake,
the AES, the XOR whitening, and the sequence numbering all agree —
because if any one of them is wrong, decryption produces garbage and
the connection drops.
That ground-truth check was load-bearing. It meant we could iterate on
the client all afternoon without worrying that some bug in our
encryption was being masked by a bug in our framing.
## 2026-05-10 ~1:10pm — the HA scaffold
Third commit:
```
2e43936 HA custom_component scaffold (binary_sensor for zones)
```
Drop-in Home Assistant integration at `custom_components/omni_pca/`:
manifest, config_flow with auth + reauth, coordinator with reconnect
logic, binary_sensor for each named zone with `device_class` derived
from `zone_type` (OPENING, MOTION, SMOKE, etc.). 12 unit tests for
`parse_controller_key()` because that's the one piece of pure logic
worth pinning down hard.
Status of the HA component itself wasn't validated against a running
Home Assistant — that comes next. But the HACS manifest is there, so
once we trust it we can drop it in.
## 2026-05-10 2pm — fleshing out the model surface
Fourth commit:
```
08974e2 Models: 16 status/properties dataclasses + enums + temp converters
```
The Omni protocol has a wide object surface — Zones, Units, Areas,
Thermostats, Buttons, Programs, Codes, Messages, Aux Sensors, Audio
Zones, Audio Sources, User Settings — and each has both a "properties"
record (configured, mostly static) and a "status" record (live state).
Wrote frozen-slots dataclasses for all of them, with `.parse(payload)`
classmethods that decode the byte layouts straight from the C# field
definitions. Added IntEnums for the dispatch tags (`ObjectType`,
`SecurityMode`, `HvacMode`, `FanMode`, `HoldMode`, `ThermostatKind`,
`ZoneType`, `UserSettingKind`).
One small surprise from `clsText.cs`: the temperature encoding the
panel uses is *linear*, not the non-linear thermistor scale we'd
guessed it might be. `C = raw / 2 - 40`. Easy.
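
As a sketch, the linear encoding and its inverse look like this. The function names are hypothetical, the Fahrenheit helper is just the usual conversion layered on top, and clamping to the panel's valid raw range is left out.

```python
def raw_to_celsius(raw: int) -> float:
    """Panel raw byte to degrees Celsius: C = raw / 2 - 40."""
    return raw / 2 - 40


def celsius_to_raw(celsius: float) -> int:
    """Inverse of raw_to_celsius, rounded to the nearest half-degree step."""
    return round((celsius + 40) * 2)


def raw_to_fahrenheit(raw: int) -> float:
    """Convenience wrapper: convert via Celsius."""
    return raw_to_celsius(raw) * 9 / 5 + 32
```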
42 new tests. 139 total.
## 2026-05-10 ~2:15pm — commands and events
Fifth commit:
```
68cf44a Library v1.0 phase B: command opcodes + typed system events
```
`commands.py` — the `Command` IntEnum, sourced from `enuUnitCommand.cs`
which is the canonical "all commands" enum despite the misleading name
(it covers HVAC, security, scene, button, message commands too — not
just units). One naming weirdness: `enuUnitCommand.UserSetting` (104) is
actually EXECUTE_PROGRAM. Renamed for clarity in our enum and left the
original C# alias documented inline so anyone cross-referencing won't
get confused.
`OmniClient` got 18 new methods: `execute_command`,
`execute_security_command`, `acknowledge_alerts`, `get_object_status`,
`get_extended_status`, plus convenience wrappers (`turn_unit_on`,
`set_unit_level`, `bypass_zone`, `set_thermostat_heat_setpoint_raw`,
…). All the command methods raise `CommandFailedError` on Nak.
`events.py` — the `SystemEvents` (opcode 55) decoder. The panel pushes
batches of these unsolicited; each batch contains multiple events of
different types (zone state changes, unit state changes, arming
changes, alarm activated, AC lost, battery low, phone line dead, X10
codes received, …). 28 dispatch tags, 26 typed event subclasses, an
`UnknownEvent` catch-all for opcode values we don't know yet, and an
`EventStream` helper that flattens batches across messages.
55 new tests. 194 total.
## 2026-05-10 ~2:30pm — stateful mock and the full v1.0 surface
Sixth commit:
```
c26db62 Library v1.0 phase C: stateful mock + e2e for the new surface
```
The mock got real state. `MockUnitState`, `MockAreaState`, `MockZoneState`,
`MockThermostatState`, plus a `user_codes` table for security validation.
All the new opcodes wired through:
- `Command` (20) → Ack with state mutation, dispatching UNIT_ON, UNIT_OFF,
UNIT_LEVEL, BYPASS_ZONE, RESTORE_ZONE, SET_THERMOSTAT_HEAT, etc.
- `ExecuteSecurityCommand` (74) → Ack on a valid code, Nak on invalid
- `RequestStatus` (34) → `Status` (35) for the four object kinds with
hard-coded record sizes per `clsOL2MsgStatus.cs:13-27`
- `RequestExtendedStatus` (58) → `ExtendedStatus` (59) with the
`object_length` prefix and the richer per-type fields
- `AcknowledgeAlerts` (60) → Ack
- And synthesized `SystemEvents` (55) pushed with `seq=0` whenever state
changes, so the e2e tests can subscribe to events through the real
client API and watch them roundtrip cleanly through `events.parse_events()`
9 new e2e tests — arm/disarm with code validation, unit on/off/level,
zone bypass/restore, thermostat setpoint, push events for arming and
unit changes, acknowledge_alerts. 203 total passing, 2 skipped (the
HA harness and a `.pca` fixture we don't ship).
The library has the v1.0 surface: read, command, status, extended status,
events. All exercised by an in-process emulator that speaks the same
protocol as the real panel.
## 2026-05-10 afternoon — trying to find the real panel
Now the part that didn't go well.
The `.pca` file said the panel lived at `192.168.1.9:4369`. Tried to
connect: nothing. TCP SYN, no SYN-ACK. Pinged: silent. nmap'd the
subnet to make sure we were on the right network:
- `192.168.1.7`, `.8`, `.11` — open ports including SSH with banner
`SSH-2.0-dropbear_2018.76`. Three OmniTouch 7 touchscreens. They're
the wall-mounted controllers; they live on the same LAN as the panel,
speak Omni-Link II to the panel themselves, and run a stripped Linux
with dropbear for the firmware updater. Confirmed by the SSH banner
date (2018) lining up with the OmniTouch 7 firmware era.
- `.6` — likely the panel itself, but no open ports, no response.
- `.9` — also dark. The 2018 IP either changed or the network module
was disabled at some point.
So the panel is sitting there, doing its job (the touchscreens clearly
work — they're on the network), but its Ethernet/Omni-Link II module is
either turned off in the panel's setup menu or the network bridge
hardware is bad. We have the ControllerKey, we have the right port, we
have a fully-tested client and a mock panel that proves the client
works end-to-end — but we can't prove it against the real thing yet.
We have, in other words, built the world's most thoroughly-tested
unused integration. There is something quietly funny about that.
The fix is physical: walk over to the panel, find the menu that
enables the Ethernet module, save, reboot. Then the live validation
becomes a five-minute test. Until then, the mock is the best we have,
and the mock is a faithful enough emulator that we trust it.
## 2026-05-10 evening — HA rebuild Phase A
The first HA scaffold (a placeholder `binary_sensor` for zones, written
before the library was complete) needed to come down and get rebuilt on
the v1.0 surface. The interesting design choice: how should the
coordinator pull state?
Option A: re-poll everything every N seconds.
Option B: rely on the panel's unsolicited push messages and only poll
as a backstop.
We picked B. The Omni panel is genuinely chatty — when a zone trips,
when an area arms, when AC fails, when a unit toggles, the panel pushes
a `SystemEvents` packet within a few hundred ms. Our `OmniConnection`
already decodes those into typed `SystemEvent` objects via an async
iterator (`client.events()`). The coordinator now runs a long-lived
background task consuming that iterator and patches the relevant slice
of state in-place, then calls `async_set_updated_data()` so HA reacts
immediately. The 30-second poll is a safety net for state we missed.
The piece that took longer than expected was extracting pure functions
from the entity-class soup so we could unit-test without HA installed
in the venv. We ended up with `helpers.py`: zone-type → device-class
mapping, latched-vs-current-condition logic per zone family, name
prettifier (`FRONT_DOOR` → `Front Door`). 61 unit tests for `helpers.py`
alone, all running without importing `homeassistant.*`. Sounds excessive
until you remember that pure-function tests are the only ones that run
in <100ms; you don't want to wait 15 seconds for HA to boot just to
verify that zone-type 32 (FIRE) maps to `BinarySensorDeviceClass.SMOKE`.
## 2026-05-10 evening — HA Phase B (the entity build-out)
Seven platforms in one pass: `alarm_control_panel` (per area, with code
validation), `light` (per unit, dimmable), `switch` (per zone for
bypass control), `climate` (per thermostat, full HVAC modes),
`sensor` (analog zones + thermostat readings + panel telemetry),
`button` (per panel macro), `event` (one per panel relaying typed
push events as HA event_types).
The mapping work was repetitive but mostly mechanical. The interesting
bits:
- The Omni unit "state" byte is overloaded: 0=off, 1=on (relay),
100..200=brightness percent (state - 100), plus weird ranges for
scene levels (2..13) and ramping codes (17..25). Encoded as a pair
of pure helpers (`omni_state_to_ha_brightness` /
`ha_brightness_to_omni_percent`) so the conversion is unit-tested.
- Omni's `SecurityMode` enum has *both* steady-state values (Off=0,
Day=1, Away=3, …) *and* arming-in-progress values (ArmingDay=9,
ArmingAway=11, …). The HA `AlarmControlPanelState` mapping needs
to bucket the 9..14 range into HA's `arming` state regardless of
destination. Plus alarm_active overrides everything to `triggered`,
and entry-timer running means `pending`, exit-timer means `arming`.
All of this lives in one pure `security_mode_to_alarm_state()`
function so it's unit-testable end to end.
- The HA `event` platform is newer than I'd realised. It exposes
push events as a single entity per integration with `event_types`
and `event_data`. Automations key on `platform: event` filtering
by `event_type`. We surface 12 event-type strings:
`zone_state_changed`, `unit_state_changed`, `arming_changed`,
`alarm_activated`, `alarm_cleared`, `ac_lost`, `ac_restored`,
`battery_low`, `battery_restored`, `user_macro_button`,
`phone_line_dead`, `phone_line_restored`, plus an `unknown`
catch-all for the 14 less common SystemEvent subclasses.
Skipped the `scene` platform entirely. Omni "scenes" are actually
just user-named button macros; the underlying call is the same
`execute_button` that the `button` platform already exposes. Adding
a parallel scene wrapper would just double-count entities. Documented
the choice in the integration README.
## 2026-05-10 evening — HA Phase C (services + diagnostics)
Seven services, all routed through a `services.py` module that's
idempotently registered on first config-entry setup and unloaded on
the last config-entry teardown:
```
omni_pca.bypass_zone
omni_pca.restore_zone
omni_pca.execute_program
omni_pca.show_message
omni_pca.clear_message
omni_pca.acknowledge_alerts
omni_pca.send_command (raw escape hatch)
```
Each takes an `entry_id` field with HA's `config_entry` selector so
the UI gives users a panel picker. `services.yaml` declares the
schema; `services.py` enforces it via `voluptuous`.
Diagnostics endpoint dumps a redacted snapshot for bug reports:
`controller_key` redacted via `async_redact_data`; zone/unit/area
names hashed with sha256 so structure is visible without leaking
PII; counts per object type; last event class; last update success
timestamp. Useful one day, useless until then, but it's three lines
and HA users expect it.
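A rough sketch of the redaction shape, using only the standard library. The function names and dict layout are assumptions, and the `REDACTED` placeholder stands in for whatever `async_redact_data` writes; the real endpoint also reports the last event class and last update timestamp, omitted here.

```python
import hashlib

REDACTED = "**REDACTED**"  # stand-in for the marker HA's helper substitutes
TO_REDACT = {"controller_key"}


def hash_name(name: str) -> str:
    """Replace a user-chosen name with its SHA-256 hex digest."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def build_snapshot(config: dict, names_by_type: dict[str, list[str]]) -> dict:
    """Build a bug-report snapshot that shows structure but not secrets."""
    return {
        "config": {k: REDACTED if k in TO_REDACT else v for k, v in config.items()},
        "counts": {kind: len(names) for kind, names in names_by_type.items()},
        "names": {
            kind: [hash_name(n) for n in names]
            for kind, names in names_by_type.items()
        },
    }
```

Hashing rather than dropping the names keeps duplicates and ordering visible, which is usually what a bug report needs.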
## 2026-05-10 evening — "wait, did we mock the panel enough?"
The thinking-out-loud moment that caught a real bug. The HA test
harness was about to be set up; before doing that, the question was:
does the mock actually answer every opcode the HA coordinator calls?
Mapped HA-side calls to mock-side handlers. Most matched. But the
HA coordinator walks `RequestProperties` for object types Thermostat
(6) and Button (3), and the mock's `_reply_properties` only knew
about Zone/Unit/Area. Both would have returned `Nak`, the coordinator
would have moved on, and HA would have discovered zero thermostats
and zero buttons no matter how `MockState` was seeded.
Added the two handlers (each ~30 lines: build the per-object
Properties body matching the wire format documented in
`models.ThermostatProperties.parse` / `models.ButtonProperties.parse`),
plus two e2e tests that drive the walk with `OmniClient` and assert
the parses come out clean. Caught it before HA ever touched the mock.
This is the kind of bug that *would* have shown up the first time
you tried the integration: zero climate entities, zero button
entities, no error message because the panel just said "no, I have
no thermostats here". You'd spend an hour staring at it.
Mock-the-whole-protocol pays for itself the first time it catches
one of these.
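The "did we mock enough?" check boils down to a set difference. A minimal sketch, assuming the coordinator walks the five object types named in this entry; the constant and function names are hypothetical.

```python
# Object types the coordinator is assumed to walk with RequestProperties.
COORDINATOR_WALKS = {"Zone", "Unit", "Area", "Thermostat", "Button"}

# Object types the mock's _reply_properties handled before the fix.
MOCK_HANDLES_BEFORE = {"Zone", "Unit", "Area"}


def missing_handlers(walked: set[str], handled: set[str]) -> list[str]:
    """Return the object types the client asks for but the mock cannot answer."""
    return sorted(walked - handled)
```

Run as a unit test, a non-empty result fails loudly instead of surfacing later as silently missing entities.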
## 2026-05-10 evening — HA test harness, the rough patches
`pytest-homeassistant-custom-component` is the standard HA dev test
harness. Each release pins to a specific HA version (we got harness
`2026.5.1`, paired with HA `2026.5.x`) and provides fixtures to spin
up HA in-process per test. Sounds simple. Three rough patches:
1. **`requires-python` conflict.** Our library targets `>=3.12`. HA
`2026.5+` requires `>=3.14.2`. uv resolves dependency groups
against the project's `requires-python` and refused to install
the test harness because it couldn't find a Python version
satisfying both. Bumped the project to `>=3.14.2`. That's fine for
HA users (HA already needs 3.14); library users on older Python can
pin to a previous omni-pca version.
2. **`pytest_socket` blocks our e2e tests.** The HA harness installs
`pytest_socket` globally to keep HA unit tests hermetic. That
broke our existing 17 e2e tests that legitimately need to talk
to a localhost MockPanel over a real TCP socket. Fix: a top-level
`tests/conftest.py` autouse fixture requesting the
harness's `socket_enabled` fixture, which re-enables sockets by
default. HA-side tests can opt back into the strict policy if
they want.
3. **`CONF_ENTRY_ID` doesn't exist in HA.** Our `services.py` was
importing `CONF_ENTRY_ID` from `homeassistant.const`. The harness
import-test caught it: HA exports the constant as
`ATTR_CONFIG_ENTRY_ID`, not `CONF_ENTRY_ID`. Without the harness,
this would have crashed on first install in a real HA. Worth the
harness already.
Then teardown started hanging. Each test passed (5-15 seconds for HA
boot + entity discovery + assertions) but the harness's
`verify_cleanup` timed out waiting for the coordinator's background
event-listener task to finish. The coordinator's `async_shutdown()`
cancels it cleanly but the harness was tearing the test down without
calling unload first. Fix: convert the `configured_panel` fixture into
a generator and call `hass.config_entries.async_unload()` in the
teardown branch. With that, all 12 HA-side tests run in 0.74 seconds
total (each one boots HA, runs config flow, asserts, unloads).
Final score: 351 tests pass, 1 skipped (the gitignored `.pca`
fixture), ruff clean across `src/ tests/ custom_components/`.
## 2026-05-10 late evening — docker dev stack
Wanted a one-command setup so the integration could be browsed
manually and screenshotted for the README. `docker-compose.yml` with
two services: real HA `2026.5` from upstream + a sidecar running
the mock panel.
The interesting wrinkle: the mock panel container needs to import
`omni_pca`. Mounting the project read-only and running `uv` inside
the container failed because uv tried to recreate the host's
`.venv` and the mount was read-only. Fix: mount only `src/` and
`run_mock_panel.py`, set `PYTHONPATH=/tmp/mock/src`, install just
`cryptography` via `uv pip install --system`, run the script
directly. No package install, no venv, just a Python interpreter
with the right import path.
## 2026-05-10 late evening — automated HA onboarding + screenshots
`dev/screenshot.py` does the entire flow:
1. POST `/api/onboarding/users` to create the demo user (returns
`auth_code`)
2. POST `/auth/token` with `grant_type=authorization_code` to get
the access token (HA doesn't support password grant)
3. On subsequent runs: log in via `/auth/login_flow` (cleaner than
re-using a saved token; the token expires in 30 minutes anyway)
4. POST `/api/config/config_entries/flow` to start the omni_pca
config flow, then post the user-input dict to complete it
5. Cache the panel's device_id by calling HA's template endpoint
(`{{ device_id('sensor.omni_pro_ii_panel_model') }}`) which is
a delightfully clean way to ask HA "what's the device id for this
entity?"
6. Launch headless chromium via the `playwright` Python package,
inject `localStorage.hassTokens` so it skips the login screen,
navigate to six deep-linked pages and screenshot each
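Steps 2, 5, and 6 above can be sketched as three small builders using only the standard library. This is not the code in `dev/screenshot.py`: the function names are hypothetical, and the `hassTokens` field names reflect my understanding of what the HA frontend stores, so treat them as assumptions.

```python
import json
import time
import urllib.parse
import urllib.request


def token_request(base_url: str, client_id: str, auth_code: str) -> urllib.request.Request:
    """Step 2: exchange the onboarding auth_code for tokens (form-encoded)."""
    form = urllib.parse.urlencode(
        {"grant_type": "authorization_code", "code": auth_code, "client_id": client_id}
    )
    return urllib.request.Request(f"{base_url}/auth/token", data=form.encode(), method="POST")


def device_id_request(base_url: str, access_token: str, entity_id: str) -> urllib.request.Request:
    """Step 5: ask the template endpoint which device owns an entity."""
    body = json.dumps({"template": "{{ device_id('" + entity_id + "') }}"})
    return urllib.request.Request(
        f"{base_url}/api/template",
        data=body.encode(),
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
        method="POST",
    )


def hass_tokens_script(base_url: str, client_id: str, tokens: dict) -> str:
    """Step 6: JavaScript that seeds localStorage.hassTokens before page load."""
    stored = {
        "hassUrl": base_url,
        "clientId": client_id,
        "access_token": tokens["access_token"],
        "refresh_token": tokens["refresh_token"],
        "token_type": "Bearer",
        "expires_in": tokens["expires_in"],
        "expires": int(time.time() * 1000) + tokens["expires_in"] * 1000,
    }
    # The inner dumps builds the JSON; the outer dumps quotes it as a JS string.
    return f"localStorage.setItem('hassTokens', {json.dumps(json.dumps(stored))});"
```

The requests would be sent with `urllib.request.urlopen`, and the script string could be handed to Playwright's `add_init_script` so it runs before the frontend checks for a session.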
The whole script is ~250 lines and produces six PNGs. The
`04-panel-device.png` is the headline shot: HA's device page for
"Omni Pro II / by HAI / Leviton / Firmware: 2.12r1" with all the
Controls (lights, buttons, areas, thermostats), Activity panel,
Diagnostics download. Every entity from the mock visible in real HA
UI in the right shape.
A nice side-effect: HA's onboarding wizard has a "We found compatible
devices!" step that scans the network for known integrations. Our
manifest got picked up: "HAI/Leviton Omni Panel" appeared in that
list during onboarding even though we hadn't done anything explicit
to register it for discovery. The integration name and `iot_class`
in `manifest.json` were enough.
## What's left for future sessions
The panel's network module is still off. When it comes back online,
the moment of truth is one TCP connect to `192.168.1.6:4369` (or
wherever it lives now) and one `RequestSystemInformation`. If the
reply is `Omni Pro II / 2.12 r1`, the entire stack (file decryption,
key extraction, key derivation, XOR pre-whitening, AES, framing,
sequencing) was right end to end. The mock says yes. We'll find out.
Other backlog items:
- `Programs` discovery (no `RequestProperties` opcode for Programs;
current implementation returns an empty dict, so it needs a real
protocol path or a separate `RequestProgramData` style call)
- HACS submission once we've validated against the live panel
- Maybe publish `omni-pca` to PyPI so the HA `manifest.json`
requirements line works without a wheel install
---
## Things worth remembering
**The "wrong key looks plausible" problem is real and recurring.**
Statistical heuristics (entropy, printable ratio, frequency analysis)
are great for telling random noise from English; they're terrible for
telling random noise from binary file plaintext. When a file format
has a known header magic, parse-the-magic beats every heuristic.
**Magic numbers in source code are gifts.** `0x12345678` as an init
value, `134775813` as an LCG multiplier, `2191` as a header length:
each one is a hard checkpoint that tells you, on first try, whether
the next four hours are going to be productive or not.
**A complete protocol counterpart is worth more than ten times its
LOC in confidence.** The mock panel was maybe 400 lines of code and
it eliminated an entire category of "is the client wrong or am I
holding it wrong" questions. Every test that connects a real client
to it through real TCP is a test that the entire stack (handshake,
encryption, framing, sequencing) agrees with itself.
**Quirk #2 (the per-block XOR pre-whitening) is the kind of thing
nobody finds without doing the work.** It's not in `jomnilinkII`,
not in `pyomnilink`, not in the public Omni-Link II writeups we
checked. The decompiled C# was unambiguous and twice-redundant
(once for encrypt, once for decrypt). Without those exact six lines
of source, an OSS client that did everything else right would still
get `ControllerSessionTerminated` on the first encrypted message,
with no useful diagnostic.
**The latent LargeVocabulary bug in PC Access is harmless but
symptomatic.** It's a copy-paste mistake: the skip path uses a
buffer sized for the no-LargeVocabulary case while the structured
path uses the LargeVocabulary size. Every panel in deployment
satisfies `Count >= Max` for the affected blocks, so the bug never
fires. But it would, on a model that doesn't, and PC Access would
silently mis-parse its own config file. The kind of bug that lives
in shipping code for a decade because nobody runs the unhappy path.
**Pure functions are the cheapest thing in test suites.** The HA
custom_component grew six entity platforms before it had any HA
test harness installed. Every translation between Omni's wire
encoding and HA's UI encoding lives in `helpers.py` as a pure
function with no HA imports. 61 unit tests for those alone, all
running in <100ms. When the harness arrived, the only thing left
to test was the wiring itself, and the wiring tests run in 0.74
seconds for the entire 12-test HA-side suite because the pure
parts already had coverage.
**Mocking the entire protocol counterpart, not just the surface,
catches whole categories of bugs.** When the mock and the client
were both being grown, a "did we mock enough?" check caught two
missing `RequestProperties` handlers (Thermostat and Button). HA
would have discovered zero of either type silently. With the
real-world panel offline, mock-the-protocol is the only way to
trust the stack. But even with the panel available, it's the
only way to trust changes without rebooting hardware between every
edit.
**`pytest_socket` and "real network in tests" can coexist.** HA's
test harness disables sockets globally to keep core unit tests
hermetic. Our integration tests need real TCP to talk to the
in-process MockPanel. The fix is one autouse fixture that requests
the harness's `socket_enabled` fixture; takes ten seconds, lets
both worlds work without modification.
**The "build the integration without a real device" loop is
unreasonably effective.** With the docker dev stack, the full
flow is `make dev-up`, click through HA onboarding (or run
`screenshot.py` to do it via REST), see your entities. Make a
code change, `docker compose restart homeassistant`, refresh the
browser, see the change. Repeat. The panel itself becomes optional
for ~95% of the development. The other 5% is the live-validation
lap when the panel comes back online.