informix-db/docs/DECISION_LOG.md
Ryan Malloy 1a149074d4 Phase 0: populate PROTOCOL_NOTES and JDBC_NOTES from clean-room JDBC reading
Decompiled ifxjdbc.jar (4.50.JC10, build 146, 2023-03-07) with CFR 0.152
into build/jdbc-src/. The decompiled tree is gitignored — it's a
clean-room understanding reference, not shipped code.

Findings landed in two artifacts:

JDBC_NOTES.md — the reverse-lookup index:
- JAR identity (SHA256, manifest, line counts)
- Package layout (com.informix.{asf,jdbc,lang} are the load-bearing
  packages; org.bson and the JDBC API surface get ignored)
- Class index mapping each wire-protocol concern to the responsible
  Java class. Highlights:
  - com.informix.asf.Connection (the wire transport / login PDU)
  - com.informix.asf.IfxData{Input,Output}Stream (framing primitives)
  - com.informix.jdbc.IfxMessageTypes (140+ message-tag constants)
  - com.informix.lang.JavaToIfxType / IfxToJavaType (codecs)
  - com.informix.jdbc.IfxSqli / IfxSqliConnect (the SQLI state machine)
- Auth landscape: plain-password is inline in the binary login PDU;
  PAM is a server-initiated post-login challenge/response; CSM is
  removed from this driver (literally throws an error if you try)

PROTOCOL_NOTES.md — the byte-level wire-format reference:
- Endianness: big-endian, network byte order (confirmed from
  JavaToIfxInt source)
- Width table: SmallInt 2B, Int 4B, BigInt 8B, plus the legacy 10-byte
  LongInt that we skip for MVP
- 16-bit alignment requirement for variable-length payloads — every
  string/decimal/datetime is 0-padded if odd-length, missing this
  desynchronizes the parser
- Login PDU structure decoded byte-by-byte from encodeAscBinary():
  SLheader (6 bytes) + PFheader with markers 100/101/104/106/107/
  108/116/127, capability bitfield, env vars, process info, app name
- Disconnection: bare [short SQ_EXIT=56] both directions, no header
- Post-login messages have NO header — protocol is stream-oriented:
  [short tag][payload][short tag][payload]...
- Message-type tag table categorized by purpose
- Open questions list and cross-check matrix tracking what's
  JDBC-derived vs PCAP-confirmed

DECISION_LOG.md additions:
- ifxjdbc.jar 4.50.JC10 selected as JDBC reference; CFR 0.152 as decompiler
- CSM is officially dead — never plan for it
- Plain-password auth is single-round-trip (no challenge/response)
- Wire-framing primitives locked in for _protocol.py
- Container credentials: user=informix, password=in4mix, on port 9088,
  TLS off

Phase 0 exit gate: criteria #1 (login layout), #2 (message-type tags),
#3 (SELECT 1 hypothesis) are derived from JDBC. PCAP capture (task #7)
and cross-reference (task #2) remaining to corroborate.
2026-05-02 16:00:30 -06:00

171 lines
9.7 KiB
Markdown

# Decision Log
Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions append; old ones are *amended* (with date) rather than overwritten.
Format: every decision has a date, a status (`active` / `superseded` / `revisited`), the chosen path, the discarded alternatives, and the *why*.
---
## 2026-05-02 — Project goal & off-ramp
**Status**: active
**Decision**: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.
**Off-ramp** (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python — e.g., mandatory undocumented crypto in the handshake — narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do **not** fall back to JPype/JDBC; that defeats the project's purpose.
**Why**: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.
---
## 2026-05-02 — Package name
**Status**: active
**Decision**: `informix-db`
**Discarded**: `informixdb-pure` (longer), `ifxsqli` (less discoverable), `pyifx` (obscure)
**PyPI availability**: confirmed available 2026-05-02 (HTTP 404 on `/pypi/informix-db/json`). The legacy `informixdb` is taken (HTTP 200), `informix` is also free (404) but too generic.
**Why**: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.
---
## 2026-05-02 — License
**Status**: active
**Decision**: MIT
**Discarded**: Apache-2.0 (more defensive but less common in Python ecosystem), BSD-3-Clause
**Why**: Simplest, most permissive, ecosystem-standard for Python libraries.
---
## 2026-05-02 — Sync first; async deferred
**Status**: active
**Decision**: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate `informix_db.aio` subpackage following asyncpg's I/O-agnostic-protocol pattern.
**Why**: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.
---
## 2026-05-02 — Test target
**Status**: active
**Decision**: `icr.io/informix/informix-developer-database` (the Developer Edition image, now maintained by HCL Software since the 2017 IBM→HCL transfer of Informix), port 9088 (native SQLI).
**Pinned digest** (captured 2026-05-02 from `docker pull`):
`sha256:8202d69ba5674df4b13140d5121dd11b7b26b28dc60119b7e8f87e533e538ba1`
**On-disk footprint**: 2.23 GB unpacked / 665 MB compressed.
**Default credentials** (from container startup logs, accept-license run):
- OS/DB user: `informix`
- Password: `in4mix`
- HQ admin password: `Passw0rd` (don't need this)
- DBA user/password: empty
- DBSERVERNAME: defaults to `informix` (same as the user)
- TLS_CONNECTIONS: OFF (plain auth on port 9088)
- Always-present databases: `sysmaster`, `sysuser` (built during init)
**Container startup**: `docker run -d --name ifx --privileged -p 9088:9088 -e LICENSE=accept -e SIZE=small icr.io/informix/informix-developer-database@sha256:8202d69b...`
**Why**: Free, official, no license click-through, supports plain-password auth out of the box. The digest is locked from Phase 0 onward — `:latest` is the canonical source of flaky integration suites in DB-driver projects, so all `docker-compose.yml` files reference the digest, never the tag.
---
## 2026-05-02 — Phase 0 is a gate, not a step
**Status**: active
**Decision**: No library code is written until `PROTOCOL_NOTES.md` meets all four exit criteria:
1. Login byte layout documented end-to-end
2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
3. `SELECT 1` round-trip fully labeled
4. JDBC source and packet capture corroborate on login + execute paths
If exit criteria can't be met within bounded effort, invoke the off-ramp.
**Why**: Most greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (`PROTOCOL_NOTES.md`) instead of a half-built driver.
---
## 2026-05-02 — Phase 1 architecture decisions (locked at start of Phase 1)
> These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.
- **`paramstyle = "numeric"`** (`:1`, `:2`, …). Matches Informix ESQL/C convention.
- **Python ≥ 3.10**. Gives us `match`, modern type hints, `tomllib`.
- **`autocommit` defaults to off**. PEP 249 implicit semantics; opt-in via `connect(autocommit=True)`.
- **Author**: Ryan Malloy `<ryan@supported.systems>` (per global pyproject.toml convention).
- **Versioning**: CalVer `YYYY.MM.DD` (`2026.05.02` initial); same-day fixes use PEP 440 post-release `2026.05.02.1`, `.2`, etc.
---
## 2026-05-02 — DATE pulled forward to MVP
**Status**: active
**Decision**: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.
**Discarded**: leaving DATE in the "medium" / Phase 6 bucket.
**Why**: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.
DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+ — their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.
---
## 2026-05-02 — `CLAUDE.md` excluded from git and sdist
**Status**: active
**Decision**: `.gitignore` excludes `CLAUDE.md`. Once `pyproject.toml` exists, `[tool.hatch.build.targets.sdist].exclude` will also list `CLAUDE.md`.
**Why**: `CLAUDE.md` contains the user's email and operator-private context. Per global convention, only commit `CLAUDE.md` to private repos. This project is destined for PyPI / public Git.
---
## 2026-05-02 — JDBC reference: `ifxjdbc.jar` 4.50.JC10
**Status**: active
**Decision**: Use the user-provided `ifxjdbc.jar` from `/home/rpm/bingham/rtmt/lib/` as the JDBC reference, working copy at `build/ifxjdbc.jar`.
**JAR identity**: `Implementation-Version: 4.50.10-SNAPSHOT`, build 146, dated 2023-03-07. Printable version string: `4.50.JC10`. SHA256 `dc5622cb4e95678d15836b684b6ef1783d37bc0cdd2725208577fc300df4e5f1`.
**Discarded**: Maven Central `com.ibm.informix:jdbc:4.50.4.1` (not downloaded — the local copy is newer).
**Why**: A newer reference is strictly better — the wire protocol is backwards-compatible, so anything `4.50.JC10` knows how to send/receive will be accepted by older servers. Avoids the Maven download.
---
## 2026-05-02 — Decompiler: CFR 0.152
**Status**: active
**Decision**: Use CFR 0.152 (https://github.com/leibnitz27/cfr) as the JDBC decompiler. Cached at `build/tools/cfr.jar`.
**Discarded**: Procyon, Fernflower, Ghidra (Ghidra MCP port pool was exhausted; CFR alone proved sufficient).
**Why**: CFR produces the most readable Java for modern bytecode, ships as a single fat JAR, has no install step. Decompiles 478 .java files in seconds.
---
## 2026-05-02 — Confirmed: CSM is dead in modern Informix
**Status**: active
**Decision**: Do NOT plan for CSM (Communications Support Module) support. Ever.
**Evidence**: `com.informix.asf.Connection.getOptProperties()` (decompiled) literally throws: `"CSM Encryption is no longer supported"` if `SECURITY` or `CSM` opt-prop is set.
**Why**: This used to be the supplied-encryption-plugin layer. IBM removed it; modern Informix uses TLS/SSL exclusively. Removes CSM from every phase plan.
---
## 2026-05-02 — Wire framing primitives confirmed (from JDBC)
**Status**: active (pending PCAP corroboration)
**Decision**: Adopt these wire-framing primitives in `_protocol.py` from day one:
- All multi-byte integers are **big-endian** (network byte order)
- SmallInt = 2 bytes, Int = 4 bytes, BigInt = 8 bytes, Real = 4 bytes IEEE 754, Double = 8 bytes IEEE 754
- Variable-length payloads (string, decimal, datetime, interval, BLOB): `[short length][bytes][optional 0x00 pad if length is odd]`**the 16-bit alignment requirement is mandatory; missing it desynchronizes the parser**
- Strings emitted as `[short len+1][bytes][0x00 nul terminator]` (the +1 is the trailing nul)
- Post-login messages have NO header: each is `[short messageType][payload]` and the next message begins immediately after the previous one's payload ends
- Login PDU has its own SLheader (6 bytes) + PFheader structure
**Source**: `com.informix.lang.JavaToIfxType` (encoders), `com.informix.asf.IfxDataInputStream`/`IfxDataOutputStream` (framing), `com.informix.asf.Connection` (login PDU). Documented byte-by-byte in `PROTOCOL_NOTES.md`.
---
## 2026-05-02 — Plain-password auth: no challenge-response round trip
**Status**: active
**Decision**: For MVP, treat plain-password auth as a single round trip: client sends one binary login PDU containing the password inline; server replies with one PDU containing version + capabilities or an error block.
**Why**: `Connection.encodeAscBinary()` writes the password as a length-prefixed string within the login PDU body. There is no separate auth phase, no salt, no hashing, no `SQ_CHALLENGE`/`SQ_RESPONSE` exchange. Those constants (129/130) are reserved for PAM and other interactive auth methods, used AFTER the binary login PDU when the server initiates them.
---
## (template — copy below this line for new entries)
```
## YYYY-MM-DD — <one-line decision title>
**Status**: active | superseded | revisited
**Decision**: <chosen path>
**Discarded**: <alternatives, briefly>
**Why**: <rationale>
```