informix-db/docs/DECISION_LOG.md
Ryan Malloy 34ad04a872 Phase 2.x: VARCHAR row decoding works — three byte-level fixes
Three findings, each caught by a different debugging technique,
documented in DECISION_LOG.md:

1. CURNAME+NFETCH PDU: trailing reserved field is SHORT not INT.
   Caught by byte-diffing our 44-byte PDU against JDBC's 42-byte
   reference under socat. The server tolerated the longer version
   for INT-only SELECTs (silently consuming extra zeros) but
   rejected it for VARCHAR queries. Lesson: server tolerance varies
   by query type — always match JDBC byte-for-byte.

2. SQ_TUPLE payload pads to even byte alignment. An 11-byte
   "syscolumns" VARCHAR payload had a trailing 0x00 between it and
   the next SQ_TUPLE tag. JDBC's IfxRowColumn.readTuple consumes
   this pad silently; we weren't, so any odd-length variable-width
   row desynced the parser.

3. VARCHAR/NCHAR/NVCHAR in tuple data use a SINGLE-byte length
   prefix (max 255 chars — IDS VARCHAR's hard limit). NOT a 2-byte
   short as I'd initially assumed. CHAR is fixed-width per
   encoded_length. LVARCHAR uses a 4-byte int prefix for >255 byte
   values.

Module changes:
  src/informix_db/_resultset.py — _LENGTH_PREFIXED_SHORT_TYPES set,
    branched VARCHAR/NCHAR/NVCHAR (1-byte prefix) vs CHAR (fixed)
    vs LVARCHAR (4-byte prefix); even-byte alignment pad consumed
    after each SQ_TUPLE payload.
  src/informix_db/cursors.py — CURNAME+NFETCH and standalone NFETCH
    PDUs now write_short(0) for the reserved trailing field.

Tests: 40 unit + 18 integration (3 new VARCHAR tests) = 58 total,
all green, ruff clean. New tests cover:
  - VARCHAR single-column SELECT
  - Odd-length VARCHAR row (regression for the pad-byte bug)
  - Mixed INT + VARCHAR + FLOAT three-column SELECT

Sample output:
  SELECT FIRST 5 tabname FROM systables → ('systables',),
    ('syscolumns',), ('sysindices',), ('systabauth',), ('syscolauth',)
  SELECT FIRST 3 tabname, tabid, nrows → ('systables', 1, 276.0), ...

VARCHAR was the last known gap from the Phase 2 commit. Phase 2
now reads INT, BIGINT, REAL, FLOAT, CHAR, VARCHAR end-to-end. Phase
6+ types (DATETIME, INTERVAL, DECIMAL, BLOBs) remain.
2026-05-04 07:55:13 -06:00

Decision Log

Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions append; old ones are amended (with date) rather than overwritten.

Format: every decision has a date, a status (active / superseded / revisited), the chosen path, the discarded alternatives, and the why.


2026-05-02 — Project goal & off-ramp

Status: active
Decision: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.
Off-ramp (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python (e.g., mandatory undocumented crypto in the handshake), narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do not fall back to JPype/JDBC; that defeats the project's purpose.
Why: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.


2026-05-02 — Package name

Status: active
Decision: informix-db
Discarded: informixdb-pure (longer), ifxsqli (less discoverable), pyifx (obscure)
PyPI availability: confirmed available 2026-05-02 (HTTP 404 on /pypi/informix-db/json). The legacy informixdb is taken (HTTP 200); informix is also free (404) but too generic.
Why: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.


2026-05-02 — License

Status: active
Decision: MIT
Discarded: Apache-2.0 (more defensive but less common in the Python ecosystem), BSD-3-Clause
Why: Simplest, most permissive, ecosystem-standard for Python libraries.


2026-05-02 — Sync first; async deferred

Status: active
Decision: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate informix_db.aio subpackage following asyncpg's I/O-agnostic-protocol pattern.
Why: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.
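One way to picture the I/O-agnostic split: the protocol layer only consumes bytes, and a thin transport (blocking socket now, asyncio later) feeds it. The sketch below is illustrative only. `FrameBuffer` and its methods are hypothetical names, not the project's actual API, and the big-endian signed 2-byte read is an assumption borrowed from the framing notes later in this log.

```python
import struct
from typing import Optional


class FrameBuffer:
    """Hypothetical sans-I/O buffer: accumulates bytes, never touches a socket."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> None:
        # Called by whichever transport owns the socket (sync or async).
        self._buf += data

    def try_read_short(self) -> Optional[int]:
        # Return a big-endian signed 16-bit value, or None if more bytes are needed.
        if len(self._buf) < 2:
            return None
        (value,) = struct.unpack_from(">h", self._buf, 0)
        del self._buf[:2]
        return value


buf = FrameBuffer()
buf.feed(b"\x00")              # a partial read delivers only one byte
first = buf.try_read_short()   # not enough data yet
buf.feed(b"\x2a")              # the rest arrives
second = buf.try_read_short()  # now a full short is available
```

Because the buffer never blocks, the same parsing code can sit behind `socket.recv` today and an asyncio stream later; only the caller of `feed` changes.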


2026-05-02 — Test target

Status: active
Decision: icr.io/informix/informix-developer-database (the Developer Edition image, maintained by HCL Software since the 2017 IBM→HCL transfer of Informix), port 9088 (native SQLI).
Pinned digest (captured 2026-05-02 from docker pull): sha256:8202d69ba5674df4b13140d5121dd11b7b26b28dc60119b7e8f87e533e538ba1
On-disk footprint: 2.23 GB unpacked / 665 MB compressed.
Default credentials (from container startup logs, accept-license run):

  • OS/DB user: informix
  • Password: in4mix
  • HQ admin password: Passw0rd (don't need this)
  • DBA user/password: empty
  • DBSERVERNAME: defaults to informix (same as the user)
  • TLS_CONNECTIONS: OFF (plain auth on port 9088)
  • Always-present databases: sysmaster, sysuser (built during init)

Container startup: docker run -d --name ifx --privileged -p 9088:9088 -e LICENSE=accept -e SIZE=small icr.io/informix/informix-developer-database@sha256:8202d69b...
Why: Free, official, no license click-through, supports plain-password auth out of the box. The digest is locked from Phase 0 onward. A floating :latest tag is a common source of flaky integration suites in DB-driver projects, so all docker-compose.yml files reference the digest, never the tag.

2026-05-02 — Phase 0 is a gate, not a step

Status: active
Decision: No library code is written until PROTOCOL_NOTES.md meets all four exit criteria:

  1. Login byte layout documented end-to-end
  2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
  3. SELECT 1 round-trip fully labeled
  4. JDBC source and packet capture corroborate on login + execute paths

If exit criteria can't be met within bounded effort, invoke the off-ramp.

Why: Most greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (PROTOCOL_NOTES.md) instead of a half-built driver.


2026-05-02 — Phase 1 architecture decisions (locked at start of Phase 1)

These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.

  • paramstyle = "numeric" (:1, :2, …). Matches Informix ESQL/C convention.
  • Python ≥ 3.10. Gives us match and modern type hints. (tomllib only joined the standard library in 3.11, so it cannot be assumed on a 3.10 floor.)
  • autocommit defaults to off. PEP 249 implicit semantics; opt-in via connect(autocommit=True).
  • Author: Ryan Malloy <ryan@supported.systems> (per global pyproject.toml convention).
  • Versioning: CalVer YYYY.MM.DD (2026.05.02 initial); same-day fixes append a fourth release segment (2026.05.02.1, .2, etc.), which PEP 440 sorts after the base version.

2026-05-02 — DATE pulled forward to MVP

Status: active
Decision: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.
Discarded: leaving DATE in the "medium" / Phase 6 bucket.
Why: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.
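As a rough illustration of how small the DATE decoder is, here is a minimal sketch. It assumes the day count is a signed big-endian 4-byte integer (consistent with the framing primitives recorded in this log, but the signedness is an assumption), and `decode_date` / `encode_date` are hypothetical helper names rather than the module's real functions.

```python
import struct
from datetime import date, timedelta

# Day 0 of the Informix DATE type, per the entry above.
INFORMIX_EPOCH = date(1899, 12, 31)


def decode_date(raw: bytes) -> date:
    # Assumption: signed big-endian 32-bit day offset from the epoch.
    (days,) = struct.unpack(">i", raw)
    return INFORMIX_EPOCH + timedelta(days=days)


def encode_date(value: date) -> bytes:
    # Inverse of decode_date, handy for round-trip tests.
    return struct.pack(">i", (value - INFORMIX_EPOCH).days)


epoch = decode_date(b"\x00\x00\x00\x00")
next_day = decode_date(b"\x00\x00\x00\x01")
round_trip = decode_date(encode_date(date(2026, 5, 2)))
```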

DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+ — their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.


2026-05-02 — CLAUDE.md excluded from git and sdist

Status: active
Decision: .gitignore excludes CLAUDE.md. Once pyproject.toml exists, [tool.hatch.build.targets.sdist].exclude will also list CLAUDE.md.
Why: CLAUDE.md contains the user's email and operator-private context. Per global convention, only commit CLAUDE.md to private repos. This project is destined for PyPI / public Git.


2026-05-02 — JDBC reference: ifxjdbc.jar 4.50.JC10

Status: active
Decision: Use the user-provided ifxjdbc.jar from /home/rpm/bingham/rtmt/lib/ as the JDBC reference, working copy at build/ifxjdbc.jar.
JAR identity: Implementation-Version: 4.50.10-SNAPSHOT, build 146, dated 2023-03-07. Printable version string: 4.50.JC10. SHA256 dc5622cb4e95678d15836b684b6ef1783d37bc0cdd2725208577fc300df4e5f1.
Discarded: Maven Central com.ibm.informix:jdbc:4.50.4.1 (not downloaded; the local copy is newer).
Why: A newer reference is strictly better: the wire protocol is backwards-compatible, so anything 4.50.JC10 knows how to send/receive will be accepted by older servers. Avoids the Maven download.


2026-05-02 — Decompiler: CFR 0.152

Status: active
Decision: Use CFR 0.152 (https://github.com/leibnitz27/cfr) as the JDBC decompiler. Cached at build/tools/cfr.jar.
Discarded: Procyon, Fernflower, Ghidra (Ghidra MCP port pool was exhausted; CFR alone proved sufficient).
Why: CFR produces the most readable Java for modern bytecode, ships as a single fat JAR, has no install step. Decompiles 478 .java files in seconds.


2026-05-02 — Confirmed: CSM is dead in modern Informix

Status: active
Decision: Do NOT plan for CSM (Communications Support Module) support. Ever.
Evidence: com.informix.asf.Connection.getOptProperties() (decompiled) literally throws "CSM Encryption is no longer supported" if the SECURITY or CSM opt-prop is set.
Why: This used to be the supplied-encryption-plugin layer. IBM removed it; modern Informix uses TLS/SSL exclusively. Removes CSM from every phase plan.


2026-05-02 — Wire framing primitives confirmed (from JDBC)

Status: active (pending PCAP corroboration)
Decision: Adopt these wire-framing primitives in _protocol.py from day one:

  • All multi-byte integers are big-endian (network byte order)
  • SmallInt = 2 bytes, Int = 4 bytes, BigInt = 8 bytes, Real = 4 bytes IEEE 754, Double = 8 bytes IEEE 754
  • Variable-length payloads (string, decimal, datetime, interval, BLOB): [short length][bytes][optional 0x00 pad if length is odd]. The 16-bit alignment requirement is mandatory; missing it desynchronizes the parser
  • Strings emitted as [short len+1][bytes][0x00 nul terminator] (the +1 is the trailing nul)
  • Post-login messages have NO header: each is [short messageType][payload] and the next message begins immediately after the previous one's payload ends
  • Login PDU has its own SLheader (6 bytes) + PFheader structure

Source: com.informix.lang.JavaToIfxType (encoders), com.informix.asf.IfxDataInputStream/IfxDataOutputStream (framing), com.informix.asf.Connection (login PDU). Documented byte-by-byte in PROTOCOL_NOTES.md.
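The string and variable-length framings listed above can be sketched with the standard `struct` module. This is a hedged illustration, not the contents of _protocol.py: the function names are hypothetical, the length is treated as a signed big-endian short (an assumption, matching Java's `short`), and the ASCII default encoding is a placeholder.

```python
import struct


def encode_string(text: str, encoding: str = "ascii") -> bytes:
    # [short len+1][bytes][0x00]: the length counts the trailing nul.
    data = text.encode(encoding)
    return struct.pack(">h", len(data) + 1) + data + b"\x00"


def encode_var_payload(data: bytes) -> bytes:
    # [short length][bytes][0x00 pad only when length is odd]
    pad = b"\x00" if len(data) % 2 else b""
    return struct.pack(">h", len(data)) + data + pad


def decode_var_payload(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    # Returns (payload, offset of the next field), skipping the pad byte
    # that follows an odd-length payload.
    (length,) = struct.unpack_from(">h", buf, offset)
    start = offset + 2
    end = start + length
    return buf[start:end], end + (length % 2)


odd = encode_var_payload(b"abc")   # 3 bytes, so one pad byte is appended
even = encode_var_payload(b"ab")   # 2 bytes, so no pad
payload, next_offset = decode_var_payload(odd + even)
```

Forgetting the `length % 2` term in the decoder leaves `next_offset` one byte short after every odd-length value, which is the desynchronization the list warns about.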

2026-05-02 — Plain-password auth: no challenge-response round trip

Status: active
Decision: For MVP, treat plain-password auth as a single round trip: client sends one binary login PDU containing the password inline; server replies with one PDU containing version + capabilities or an error block.
Why: Connection.encodeAscBinary() writes the password as a length-prefixed string within the login PDU body. There is no separate auth phase, no salt, no hashing, no SQ_CHALLENGE/SQ_RESPONSE exchange. Those constants (129/130) are reserved for PAM and other interactive auth methods, used AFTER the binary login PDU when the server initiates them.


2026-05-02 — Capability ints: corrected after PDU diff caught misread

Status: active (corrects an earlier same-day entry)
Decision: Send Cap_1 = 0x0000013c, Cap_2 = 0, Cap_3 = 0 in the binary login PDU. These are the values IBM's JDBC driver sends; the server echoes them back identically.
Why this is a correction: An earlier read of the wire bytes (before we wrote the byte-for-byte PDU diff) decoded the capability section as Cap_1=1, Cap_2=0x3c000000, Cap_3=0. That was a misalignment: the 0x3c byte interpreted as Cap_2's high byte was actually Cap_1's low byte. Real layout: a single int 0x0000013c = (capability_class << 8) | PF_PROT_SQLI_0600 (60 = 0x3c).
How we caught it: tests/test_pdu_match.py captures our generated PDU via a monkey-patched socket and asserts byte-for-byte equality against docs/CAPTURES/01-connect-only.socat.log for offsets 2..280 (the structural prefix). The connection still worked with the wrong values because the dev image is permissive, but the PDU was structurally non-identical. Server-accepts ≠ structurally-correct.
Methodology takeaway: For wire-protocol implementations, always diff against the reference vendor's PDU bytes, not just "it connected." Permissive servers mask real bugs.
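The misread is easy to reproduce in a few lines. The sketch below packs the three capability ints as recorded above, then decodes them one byte early. It assumes the byte preceding Cap_1 on the wire is 0x00 (not confirmed by this log), takes capability_class = 1 from 0x0000013c >> 8, and uses a hypothetical `first_mismatch` helper to stand in for the byte-diff step.

```python
import struct

PF_PROT_SQLI_0600 = 60   # 0x3c, from the entry above
CAPABILITY_CLASS = 1     # inferred: 0x0000013c >> 8

cap_1 = (CAPABILITY_CLASS << 8) | PF_PROT_SQLI_0600
wire = struct.pack(">iii", cap_1, 0, 0)    # Cap_1, Cap_2, Cap_3 as sent

correct = struct.unpack(">iii", wire)

# Assumption: a 0x00 byte sits just before Cap_1. Starting the read one
# byte early shifts 0x3c into the high byte of the second int.
misread = struct.unpack_from(">ii", b"\x00" + wire, 0)


def first_mismatch(ours: bytes, reference: bytes) -> int:
    """Offset of the first differing byte, or -1 if the buffers are identical."""
    for i, (a, b) in enumerate(zip(ours, reference)):
        if a != b:
            return i
    if len(ours) != len(reference):
        return min(len(ours), len(reference))
    return -1


# Packing the misread values produces bytes that differ from the reference.
wrong_wire = struct.pack(">iii", 1, 0x3C000000, 0)
diff_at = first_mismatch(wrong_wire, wire)
```

A diff helper like this reports an offset even when the server accepts both variants, which is the point of the methodology takeaway.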


2026-05-04 — VARCHAR row decoding: three byte-level discoveries

Status: active
Decision: parse_tuple_payload now handles VARCHAR/NCHAR/NVCHAR with a single-byte length prefix; SQ_TUPLE payloads are padded to even byte alignment; the trailing reserved field in CURNAME+NFETCH is a SHORT, not an INT.
Why this is three findings: each one was caught by a different debugging technique:

  1. CURNAME+NFETCH PDU off by 2 bytes: my reserved trailing field was write_int(0) (4 bytes); JDBC's reference is write_short(0) (2 bytes). Caught by capturing both PDUs under socat and byte-diffing — our 44-byte vs JDBC's 42-byte. The server happened to accept the longer version for INT-only SELECTs (silently treating the extra zeros as padding) but rejected it for VARCHAR queries. Lesson: server tolerance varies by query type — always match JDBC byte-for-byte.

  2. SQ_TUPLE payload pads to even alignment: when size is odd, an extra 0x00 byte follows the payload before the next tag. Found in docs/CAPTURES/15-py-varchar-fixed.socat.log — an 11-byte "syscolumns" VARCHAR payload had a trailing 0x00 that JDBC's IfxRowColumn.readTuple consumes silently. We weren't doing this, so the parser desynced for any odd-length variable-width row. Even-byte alignment is a wire-protocol-wide invariant — every variable-length payload pads.

  3. VARCHAR in tuple uses 1-byte length prefix, NOT 2: per the on-wire encoding (verified empirically in capture 15), VARCHAR values in row data are [byte length][bytes] — single-byte prefix, max 255 chars. NCHAR and NVCHAR follow the same pattern. (CHAR is fixed-width per encoded_length, no length prefix at all.) LVARCHAR uses a 4-byte int prefix for values >255 bytes.
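Findings 2 and 3 can be illustrated together with the "syscolumns" case from the capture. This is a minimal sketch, not the real parse_tuple_payload: `consume_payload` and `decode_varchar` are hypothetical names, the payload size is assumed to have been read from the tuple header already (its layout is not described in this entry), and ASCII decoding is a placeholder.

```python
def consume_payload(buf: bytes, offset: int, size: int) -> tuple[bytes, int]:
    # Slice out `size` payload bytes, then skip one pad byte when size is odd
    # so the next tag starts on an even boundary.
    payload = buf[offset:offset + size]
    next_offset = offset + size + (size & 1)
    return payload, next_offset


def decode_varchar(payload: bytes, offset: int = 0,
                   encoding: str = "ascii") -> tuple[str, int]:
    # VARCHAR in row data: [1-byte length][bytes], so at most 255 bytes.
    length = payload[offset]
    start = offset + 1
    end = start + length
    return payload[start:end].decode(encoding), end


# 1 length byte + 10 characters = 11-byte payload, followed by the 0x00 pad.
# b"NEXT" is a stand-in for whatever follows (the next tag in a real stream).
stream = bytes([10]) + b"syscolumns" + b"\x00" + b"NEXT"

payload, next_offset = consume_payload(stream, 0, 11)
value, consumed = decode_varchar(payload)
remainder = stream[next_offset:]
```

Without the `size & 1` term, `remainder` would begin with the pad byte and every later tag would be read one byte off.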

How to apply: when adding new variable-width type decoders, capture a tuple under socat first to see the exact framing. Don't infer it from the column descriptor's encoded_length, which is the MAX storage size, not the wire format. The two can differ widely: a VARCHAR column with encoded_length=128 still travels as a 1-byte length prefix followed only by the bytes actually present.


(template — copy below this line for new entries)

## YYYY-MM-DD — <one-line decision title>

**Status**: active | superseded | revisited
**Decision**: <chosen path>
**Discarded**: <alternatives, briefly>
**Why**: <rationale>