Ryan Malloy 888b8079d3 Phase 6.e: INTERVAL parameter encoding
Implements encoders for datetime.timedelta → INTERVAL DAY(9) TO FRACTION(5)
and IntervalYM → INTERVAL YEAR(9) TO MONTH. Both follow the 2-byte-length-
prefixed BCD wire format established in Phase 6.c (DECIMAL/DATETIME).

The default qualifier choice is generous: DAY(9) covers any timedelta,
YEAR(9) handles ±1B years. JDBC defaults to smaller widths (DAY(2)/YEAR(4))
trading safety for compactness — we make the opposite trade.

FRACTION(5) is the Informix precision ceiling — sub-10us intervals can't
round-trip cleanly. Same limitation JDBC has.

Six integration tests, all green on first run against live Informix —
the synthetic round-trip in the test framework caught every framing bug
locally, before integration tests even started. This is the dividend from
owning both decoder and encoder.

Total: 53 unit + 88 integration = 141 tests.

Type matrix update: INTERVAL now has both decode + encode. Only BLOB/CLOB
and BYTE/TEXT remain among the common types.
2026-05-04 12:30:48 -06:00


# Decision Log
Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions append; old ones are *amended* (with date) rather than overwritten.
Format: every decision has a date, a status (`active` / `superseded` / `revisited`), the chosen path, the discarded alternatives, and the *why*.
---
## 2026-05-02 — Project goal & off-ramp
**Status**: active
**Decision**: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.
**Off-ramp** (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python — e.g., mandatory undocumented crypto in the handshake — narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do **not** fall back to JPype/JDBC; that defeats the project's purpose.
**Why**: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.
---
## 2026-05-02 — Package name
**Status**: active
**Decision**: `informix-db`
**Discarded**: `informixdb-pure` (longer), `ifxsqli` (less discoverable), `pyifx` (obscure)
**PyPI availability**: confirmed available 2026-05-02 (HTTP 404 on `/pypi/informix-db/json`). The legacy `informixdb` is taken (HTTP 200), `informix` is also free (404) but too generic.
**Why**: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.
---
## 2026-05-02 — License
**Status**: active
**Decision**: MIT
**Discarded**: Apache-2.0 (more defensive but less common in Python ecosystem), BSD-3-Clause
**Why**: Simplest, most permissive, ecosystem-standard for Python libraries.
---
## 2026-05-02 — Sync first; async deferred
**Status**: active
**Decision**: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate `informix_db.aio` subpackage following asyncpg's I/O-agnostic-protocol pattern.
**Why**: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.
---
## 2026-05-02 — Test target
**Status**: active
**Decision**: `icr.io/informix/informix-developer-database` (the Developer Edition image, now maintained by HCL Software since the 2017 IBM→HCL transfer of Informix), port 9088 (native SQLI).
**Pinned digest** (captured 2026-05-02 from `docker pull`):
`sha256:8202d69ba5674df4b13140d5121dd11b7b26b28dc60119b7e8f87e533e538ba1`
**On-disk footprint**: 2.23 GB unpacked / 665 MB compressed.
**Default credentials** (from container startup logs, accept-license run):
- OS/DB user: `informix`
- Password: `in4mix`
- HQ admin password: `Passw0rd` (don't need this)
- DBA user/password: empty
- DBSERVERNAME: defaults to `informix` (same as the user)
- TLS_CONNECTIONS: OFF (plain auth on port 9088)
- Always-present databases: `sysmaster`, `sysuser` (built during init)
**Container startup**: `docker run -d --name ifx --privileged -p 9088:9088 -e LICENSE=accept -e SIZE=small icr.io/informix/informix-developer-database@sha256:8202d69b...`
**Why**: Free, official, no license click-through, supports plain-password auth out of the box. The digest is locked from Phase 0 onward — `:latest` is the canonical source of flaky integration suites in DB-driver projects, so all `docker-compose.yml` files reference the digest, never the tag.
---
## 2026-05-02 — Phase 0 is a gate, not a step
**Status**: active
**Decision**: No library code is written until `PROTOCOL_NOTES.md` meets all four exit criteria:
1. Login byte layout documented end-to-end
2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
3. `SELECT 1` round-trip fully labeled
4. JDBC source and packet capture corroborate on login + execute paths
If exit criteria can't be met within bounded effort, invoke the off-ramp.
**Why**: Most greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (`PROTOCOL_NOTES.md`) instead of a half-built driver.
---
## 2026-05-02 — Phase 1 architecture decisions (locked at start of Phase 1)
> These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.
- **`paramstyle = "numeric"`** (`:1`, `:2`, …). Matches Informix ESQL/C convention.
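- **Python ≥ 3.10**. Gives us `match` and modern type hints (PEP 604 `X | Y` unions). `tomllib` only arrived in 3.11, so TOML parsing on 3.10 needs the `tomli` backport.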
- **`autocommit` defaults to off**. PEP 249 implicit semantics; opt-in via `connect(autocommit=True)`.
- **Author**: Ryan Malloy `<ryan@supported.systems>` (per global pyproject.toml convention).
- **Versioning**: CalVer `YYYY.MM.DD` (`2026.05.02` initial); same-day fixes use PEP 440 post-release `2026.05.02.1`, `.2`, etc.
---
## 2026-05-02 — DATE pulled forward to MVP
**Status**: active
**Decision**: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.
**Discarded**: leaving DATE in the "medium" / Phase 6 bucket.
**Why**: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.
DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+ — their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.
---
## 2026-05-02 — `CLAUDE.md` excluded from git and sdist
**Status**: active
**Decision**: `.gitignore` excludes `CLAUDE.md`. Once `pyproject.toml` exists, `[tool.hatch.build.targets.sdist].exclude` will also list `CLAUDE.md`.
**Why**: `CLAUDE.md` contains the user's email and operator-private context. Per global convention, only commit `CLAUDE.md` to private repos. This project is destined for PyPI / public Git.
---
## 2026-05-02 — JDBC reference: `ifxjdbc.jar` 4.50.JC10
**Status**: active
**Decision**: Use the user-provided `ifxjdbc.jar` from `/home/rpm/bingham/rtmt/lib/` as the JDBC reference, working copy at `build/ifxjdbc.jar`.
**JAR identity**: `Implementation-Version: 4.50.10-SNAPSHOT`, build 146, dated 2023-03-07. Printable version string: `4.50.JC10`. SHA256 `dc5622cb4e95678d15836b684b6ef1783d37bc0cdd2725208577fc300df4e5f1`.
**Discarded**: Maven Central `com.ibm.informix:jdbc:4.50.4.1` (not downloaded — the local copy is newer).
**Why**: A newer reference is the safer choice. Assuming the wire protocol stays backwards-compatible, anything `4.50.JC10` knows how to send/receive should also be accepted by older servers. It also avoids the Maven download.
---
## 2026-05-02 — Decompiler: CFR 0.152
**Status**: active
**Decision**: Use CFR 0.152 (https://github.com/leibnitz27/cfr) as the JDBC decompiler. Cached at `build/tools/cfr.jar`.
**Discarded**: Procyon, Fernflower, Ghidra (Ghidra MCP port pool was exhausted; CFR alone proved sufficient).
**Why**: CFR produces the most readable Java for modern bytecode, ships as a single fat JAR, has no install step. Decompiles 478 .java files in seconds.
---
## 2026-05-02 — Confirmed: CSM is dead in modern Informix
**Status**: active
**Decision**: Do NOT plan for CSM (Communications Support Module) support. Ever.
**Evidence**: `com.informix.asf.Connection.getOptProperties()` (decompiled) literally throws: `"CSM Encryption is no longer supported"` if `SECURITY` or `CSM` opt-prop is set.
**Why**: This used to be the supplied-encryption-plugin layer. IBM removed it; modern Informix uses TLS/SSL exclusively. Removes CSM from every phase plan.
---
## 2026-05-02 — Wire framing primitives confirmed (from JDBC)
**Status**: active (pending PCAP corroboration)
**Decision**: Adopt these wire-framing primitives in `_protocol.py` from day one:
- All multi-byte integers are **big-endian** (network byte order)
- SmallInt = 2 bytes, Int = 4 bytes, BigInt = 8 bytes, Real = 4 bytes IEEE 754, Double = 8 bytes IEEE 754
- Variable-length payloads (string, decimal, datetime, interval, BLOB): `[short length][bytes][optional 0x00 pad if length is odd]`**the 16-bit alignment requirement is mandatory; missing it desynchronizes the parser**
- Strings emitted as `[short len+1][bytes][0x00 nul terminator]` (the +1 is the trailing nul)
- Post-login messages have NO header: each is `[short messageType][payload]` and the next message begins immediately after the previous one's payload ends
- Login PDU has its own SLheader (6 bytes) + PFheader structure
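The variable-length framing rule above fits in a few lines of Python. This is a minimal illustration that assumes only the big-endian `[short length][bytes][pad]` layout listed here. `write_padded`, `read_padded` and `write_string` are hypothetical names, not the actual `_protocol.py` API. The string helper adds no alignment pad after the nul terminator, because these notes don't say whether one follows.

```python
import struct


def write_padded(payload: bytes) -> bytes:
    """Frame a variable-length payload as [short length][bytes][0x00 if odd]."""
    out = struct.pack(">h", len(payload)) + payload
    if len(payload) % 2:  # keep the stream 16-bit aligned
        out += b"\x00"
    return out


def read_padded(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Read one padded payload; return (payload, offset of the next field)."""
    (length,) = struct.unpack_from(">h", buf, offset)
    start = offset + 2
    payload = buf[start:start + length]
    # Skipping the pad byte is what keeps the parser in sync.
    return payload, start + length + (length % 2)


def write_string(text: str, encoding: str = "ascii") -> bytes:
    """String form from the notes: [short len+1][bytes][0x00 terminator]."""
    data = text.encode(encoding)
    return struct.pack(">h", len(data) + 1) + data + b"\x00"
```

Reading the second payload from the offset returned by the first works only because the pad byte is skipped. Drop `+ (length % 2)` and every later field shifts by one byte.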
**Source**: `com.informix.lang.JavaToIfxType` (encoders), `com.informix.asf.IfxDataInputStream`/`IfxDataOutputStream` (framing), `com.informix.asf.Connection` (login PDU). Documented byte-by-byte in `PROTOCOL_NOTES.md`.
---
## 2026-05-02 — Plain-password auth: no challenge-response round trip
**Status**: active
**Decision**: For MVP, treat plain-password auth as a single round trip: client sends one binary login PDU containing the password inline; server replies with one PDU containing version + capabilities or an error block.
**Why**: `Connection.encodeAscBinary()` writes the password as a length-prefixed string within the login PDU body. There is no separate auth phase, no salt, no hashing, no `SQ_CHALLENGE`/`SQ_RESPONSE` exchange. Those constants (129/130) are reserved for PAM and other interactive auth methods, used AFTER the binary login PDU when the server initiates them.
---
## 2026-05-02 — Capability ints: corrected after PDU diff caught misread
**Status**: active (corrects an earlier same-day entry)
**Decision**: Send `Cap_1 = 0x0000013c, Cap_2 = 0, Cap_3 = 0` in the binary login PDU. These are the values IBM's JDBC driver sends; the server echoes them back identically.
**Why this is a correction**: An earlier read of the wire bytes (before we wrote the byte-for-byte PDU diff) decoded the capability section as `Cap_1=1, Cap_2=0x3c000000, Cap_3=0`. That was a misalignment — the `0x3c` byte interpreted as `Cap_2`'s high byte was actually `Cap_1`'s low byte. Real layout: a single int `0x0000013c` = `(capability_class << 8) | PF_PROT_SQLI_0600 (60 = 0x3c)`.
**How we caught it**: `tests/test_pdu_match.py` — captures our generated PDU via a monkey-patched socket and asserts byte-for-byte equality against `docs/CAPTURES/01-connect-only.socat.log` for offsets 2..280 (the structural prefix). The connection still worked with the wrong values because the dev image is permissive, but the PDU was structurally non-identical. **Server-accepts ≠ structurally-correct.**
**Methodology takeaway**: For wire-protocol implementations, always diff against the reference vendor's PDU bytes, not just "it connected." Permissive servers mask real bugs.
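The capability misread is easy to reproduce with `struct`. The buffer below is hypothetical: it assumes a single 0x00 byte sits immediately before Cap_1, since the real preceding field isn't recorded in this entry. Unpacking one byte early yields exactly the bogus `Cap_1=1, Cap_2=0x3c000000` values.

```python
import struct

PF_PROT_SQLI_0600 = 0x3C
capability_class = 1
cap_1 = (capability_class << 8) | PF_PROT_SQLI_0600  # 0x0000013c

# Hypothetical buffer: one leading 0x00 byte, then Cap_1, Cap_2, Cap_3.
buf = b"\x00" + struct.pack(">III", cap_1, 0, 0)

correct = struct.unpack_from(">III", buf, 1)  # aligned read
misread = struct.unpack_from(">III", buf, 0)  # one byte early: Cap_1's low byte leaks into Cap_2
```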
---
## 2026-05-04 — VARCHAR row decoding: three byte-level discoveries
**Status**: active
**Decision**: ``parse_tuple_payload`` now handles VARCHAR/NCHAR/NVCHAR with a single-byte length prefix; SQ_TUPLE payloads are padded to even byte alignment; the trailing reserved field in CURNAME+NFETCH is a SHORT not an INT.
**Why this is three findings**: each one was caught by a different debugging technique:
1. **CURNAME+NFETCH PDU off by 2 bytes**: my reserved trailing field was `write_int(0)` (4 bytes); JDBC's reference is `write_short(0)` (2 bytes). Caught by capturing both PDUs under socat and byte-diffing: ours was 44 bytes, JDBC's 42. The server happened to accept the longer version for INT-only SELECTs (silently treating the extra zeros as padding) but rejected it for VARCHAR queries. Lesson: **server tolerance varies by query type; always match JDBC byte-for-byte**.
2. **SQ_TUPLE payload pads to even alignment**: when `size` is odd, an extra 0x00 byte follows the payload before the next tag. Found in `docs/CAPTURES/15-py-varchar-fixed.socat.log` — an 11-byte "syscolumns" VARCHAR payload had a trailing `0x00` that JDBC's `IfxRowColumn.readTuple` consumes silently. We weren't doing this, so the parser desynced for any odd-length variable-width row. **Even-byte alignment is a wire-protocol-wide invariant — every variable-length payload pads.**
3. **VARCHAR in tuple uses 1-byte length prefix, NOT 2**: per the on-wire encoding (verified empirically in capture 15), VARCHAR values in row data are `[byte length][bytes]` — single-byte prefix, max 255 chars. NCHAR and NVCHAR follow the same pattern. (CHAR is fixed-width per encoded_length, no length prefix at all.) LVARCHAR uses a 4-byte int prefix for values >255 bytes.
**How to apply**: when adding new variable-width type decoders, capture a tuple under socat first to see the exact framing — don't infer from the column descriptor's `encoded_length`, which is the MAX storage, not the wire format. The wire format may differ by orders of magnitude (1-byte prefix vs encoded_length=128 for VARCHAR).
---
## 2026-05-04 — DML / DDL execution path: SQ_PREPARE + SQ_EXECUTE + SQ_RELEASE
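**Status**: revisited 2026-05-04 (the ``nfields == 0`` branch criterion is superseded by the keyword-based branching entry below; the SQ_EXECUTE + SQ_RELEASE path itself still applies)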
**Decision**: For statements that don't return rows (CREATE, INSERT, UPDATE, DELETE, DROP), Cursor.execute branches on ``nfields == 0`` in the DESCRIBE response. SELECT path is the cursor lifecycle (CURNAME+NFETCH+...); DDL/DML path is just SQ_EXECUTE then SQ_RELEASE.
**Why**: JDBC uses SQ_PREPARE for everything; for non-SELECT it just doesn't open a cursor. Per IfxSqli.sendExecute (line 1075): non-prepared-statement execute is a bare ``[short SQ_ID=4][int SQ_EXECUTE=7][short SQ_EOT]`` (8 bytes).
---
## 2026-05-04 — SQ_INSERTDONE (=94) is execution metadata, NOT execution
**Status**: active
**Decision**: SQ_INSERTDONE arrives in BOTH the DESCRIBE response (PREPARE phase) AND the EXECUTE response for literal-value INSERTs. It carries the auto-generated serial values that WILL be / WERE inserted. Don't interpret SQ_INSERTDONE in the DESCRIBE response as "row was inserted" — it's just metadata. Always send SQ_EXECUTE.
**Why this was a debugging trap**: when I first saw SQ_INSERTDONE in the PREPARE response for ``INSERT INTO t1 VALUES (1, 'hello')``, I assumed Informix optimizes literal INSERTs by executing during PREPARE and added a "skip SQ_EXECUTE" branch. Result: SELECT returned 0 rows. The data wasn't actually inserted; the SQ_INSERTDONE in PREPARE was just "here are the serials that WILL be assigned when you execute". After reverting to "always send SQ_EXECUTE", the row persists. Lesson: optimization-looking responses may not be what they look like — always verify with a follow-up SELECT.
---
## 2026-05-04 — SQ_INSERTDONE wire format
**Status**: active
**Decision**: Per IfxSqli.receiveInsertDone (line 2347), the SQ_INSERTDONE payload is 18 bytes for modern (bigint-supported) servers:
- 10 bytes: serial8 inserted (Informix's variable-numeric LONGINT encoding)
- 8 bytes: bigserial inserted (regular 64-bit long, big-endian)
For now we read-and-discard. Phase 5+ will surface these as ``Cursor.lastrowid`` / similar.
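A read-and-discard step for the 18-byte payload might look like the sketch below. `read_insertdone` is a hypothetical helper. It leaves the 10-byte serial8 field undecoded because the LONGINT variable-numeric encoding isn't described here, and it assumes the bigserial field is signed.

```python
SQ_INSERTDONE_LEN = 18  # modern (bigint-capable) servers


def read_insertdone(payload: bytes) -> tuple[bytes, int]:
    """Split the SQ_INSERTDONE payload into (raw serial8, bigserial)."""
    if len(payload) < SQ_INSERTDONE_LEN:
        raise ValueError("SQ_INSERTDONE payload shorter than 18 bytes")
    serial8_raw = payload[:10]  # LONGINT encoding, left undecoded here
    bigserial = int.from_bytes(payload[10:18], "big", signed=True)
    return serial8_raw, bigserial
```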
---
## 2026-05-04 — Transactions: commit/rollback are 4-byte messages
**Status**: active
**Decision**: ``Connection.commit()`` sends ``[short SQ_CMMTWORK=19][short SQ_EOT=12]`` (4 bytes). ``Connection.rollback()`` sends ``[short SQ_RBWORK=20][short SQ_EOT=12]``. Server responds with SQ_DONE+SQ_EOT (in logged databases) or SQ_ERR sqlcode=-255 ("Not in transaction") in unlogged databases like sysmaster.
**How to apply**: integration tests for transactions need a LOGGED database. The Informix Developer Edition image ships with ``stores_demo`` (logged) — point integration tests at that for commit/rollback verification.
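Both messages are two big-endian shorts, so `struct` can build them directly. This is a minimal sketch: `commit_pdu` and `rollback_pdu` are hypothetical names, and the constants are the values quoted in this entry.

```python
import struct

SQ_CMMTWORK, SQ_RBWORK, SQ_EOT = 19, 20, 12


def commit_pdu() -> bytes:
    """[short SQ_CMMTWORK][short SQ_EOT]: 4 bytes."""
    return struct.pack(">hh", SQ_CMMTWORK, SQ_EOT)


def rollback_pdu() -> bytes:
    """[short SQ_RBWORK][short SQ_EOT]: 4 bytes."""
    return struct.pack(">hh", SQ_RBWORK, SQ_EOT)
```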
---
## 2026-05-04 — Parameter binding: SQ_BIND chained with SQ_EXECUTE in one PDU
**Status**: active
**Decision**: ``Cursor.execute(sql, params)`` for DML sends one PDU containing SQ_BIND with all parameter values, immediately followed by SQ_EXECUTE. No separate CIDESCRIBE round trip — the server infers parameter types from the type tags we send in SQ_BIND.
**Why this matters**: skipping the CIDESCRIBE/IDESCRIBE handshake (which JDBC does for type-discovery) saves one round trip per execute. The server accepts our SQ_BIND directly because we provide explicit type codes for each parameter.
PDU structure (verified against ``docs/CAPTURES/02-dml-cycle.socat.log`` msg[29]):
```
[short SQ_ID=4][int SQ_BIND=5][short numparams]
for each param:
    [short type][short indicator=0 or -1][short prec_or_encLen]
    writePadded(rawbytes)   # data + 0x00 pad if odd-length
[short SQ_EXECUTE=7]
[short SQ_EOT]
```
Per-type encoding (Phase 4 MVP):
| Python type | IDS type code | Precision short | Data |
|-------------|---------------|-----------------|------|
| ``int`` (32-bit) | 2 (INT) | ``0x0a00`` (=2560 packed display-width=10/scale=0) | 4 bytes BE |
| ``int`` (64-bit) | 52 (BIGINT) | ``0x1300`` (=4864 packed width=19/scale=0) | 8 bytes BE |
| ``str`` | 0 (CHAR — server casts) | 0 | ``[short len][bytes]`` (writePadded adds even pad) |
| ``float`` | 3 (FLOAT/DOUBLE) | 0 | 8 bytes IEEE 754 |
| ``bool`` | 45 (BOOL) | 0 | 1 byte (0x01 or 0x00) |
| ``None`` | 0 | indicator=-1 | (no data) |
**Surprise**: JDBC sends Java ``String`` parameters (the counterpart of Python ``str``) as **CHAR (type=0)**, not VARCHAR (type=13). The server handles conversion to the actual column type via internal CIDESCRIBE/IDESCRIBE inference. We do the same: string parameters always go out as CHAR.
**Surprise**: integer precision is **packed** as ``(display_width << 8) | scale``. For INTEGER, that's ``(10 << 8) | 0 = 0x0a00 = 2560``. Initially looked like a bug (why would precision be 2560?) until I realized it's a packed field. Captured in cursor's ``_build_bind_execute_pdu`` and converters' ``_encode_int``.
**Paramstyle**: we declare ``paramstyle = "numeric"`` (PEP 249), supporting ``:1``, ``:2`` placeholders. Internally we rewrite them to ``?`` (Informix's native style) before sending PREPARE. The rewrite is a trivial regex that does not skip ``:N`` tokens inside string literals or comments; Phase 5 can add a proper SQL tokenizer for that edge case.
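The rewrite turns every `:N` into a positional `?`, so the parameter list must be reordered to match the order in which placeholders appear. Otherwise `:2 ... :1` or a repeated `:1` binds the wrong values. This entry doesn't say whether the driver does that reordering, so the sketch below shows one way to do it. `numeric_to_qmark` is a hypothetical name, and like the regex described above it doesn't skip string literals or comments.

```python
import re

# Naive on purpose: also matches ":N" inside string literals and comments.
_NUMERIC_PLACEHOLDER = re.compile(r":(\d+)")


def numeric_to_qmark(sql: str, params) -> tuple[str, list]:
    """Rewrite :N placeholders to ? and reorder params to match ? positions."""
    order: list[int] = []

    def _replace(match: re.Match) -> str:
        order.append(int(match.group(1)))
        return "?"

    rewritten = _NUMERIC_PLACEHOLDER.sub(_replace, sql)
    values = []
    for n in order:
        if not 1 <= n <= len(params):
            raise IndexError(f"placeholder :{n} has no matching parameter")
        values.append(params[n - 1])
    return rewritten, values
```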
---
## 2026-05-04 — SELECT vs DML branching: keyword-based, not nfields-based
**Status**: active
**Decision**: ``Cursor.execute`` branches on the first word of the SQL (``SELECT`` → cursor-fetch path; everything else → execute-and-release path). Don't use ``nfields > 0`` from the DESCRIBE response.
**Why**: a parameterized INSERT (``INSERT INTO t VALUES (?, ?, ?)``) returns a DESCRIBE response with ``nfields > 0`` because the server describes the row that WILL be inserted. The ``nfields == 0`` heuristic that worked for non-parameterized DML breaks here. JDBC does the same via its ``IfxStatement`` / ``IfxPreparedStatement`` subclassing.
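A keyword check of this kind fits in a few lines. The helper `is_select` below is hypothetical. It tolerates leading whitespace and opening parentheses but does not skip leading SQL comments.

```python
import re

# First alphabetic word, after optional whitespace and "(" characters.
_FIRST_WORD = re.compile(r"\s*\(*\s*([A-Za-z]+)")


def is_select(sql: str) -> bool:
    """Branch on the first SQL keyword, not on DESCRIBE's nfields."""
    match = _FIRST_WORD.match(sql)
    return match is not None and match.group(1).upper() == "SELECT"
```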
---
## 2026-05-04 — Parameterized SELECT works with bind-then-cursor-open
**Status**: active
**Decision**: For parameterized SELECT, send SQ_BIND alone (without SQ_EXECUTE chained) right after PREPARE, then proceed with the regular cursor open + fetch lifecycle (CURNAME+NFETCH+...). The cursor open is what triggers query execution; SQ_BIND just binds the values into the prepared-statement scope.
**Why**: simpler than I expected — server accepts SQ_BIND followed by cursor open in separate PDUs. No need for the IDESCRIBE handshake JDBC does for type discovery.
PDU sequence:
```
1. PREPARE+NDESCRIBE+WANTDONE → DESCRIBE+DONE+COST+EOT
2. SQ_BIND (no EXECUTE) → EOT
3. CURNAME+NFETCH → TUPLE*+DONE+COST+EOT
4. NFETCH (drain) → DONE+COST+EOT
5. CLOSE → EOT
6. RELEASE → EOT
```
Tested with single int param, multiple int params, string param, mixed `:N` style with LIKE patterns. All work correctly.
---
## 2026-05-04 — NULL row encoding: per-type sentinel values
**Status**: active
**Decision**: Each IDS type uses a specific NULL sentinel in tuple data; decoders detect and return Python ``None``.
Sentinels (verified by capture analysis in ``docs/CAPTURES/19-py-null-vs-onechar.socat.log`` and ``20-py-int-null.socat.log``):
| IDS type | NULL sentinel | Distinguishable from valid value? |
|----------|---------------|------------------------------------|
| SMALLINT | ``0x8000`` (= SHORT_MIN) | Yes — SHORT_MIN can't be a regular value |
| INTEGER | ``0x80000000`` (= INT_MIN) | Yes |
| BIGINT | ``0x8000000000000000`` (= LONG_MIN) | Yes |
| REAL | ``ff ff ff ff`` (NaN bit pattern) | Yes (via bytes match, not value match — NaN != NaN) |
| FLOAT/DOUBLE | ``ff ff ff ff ff ff ff ff`` | Yes |
| VARCHAR | ``[byte 1][byte 0]`` (length=1, content=single nul) | Yes — VARCHAR can't contain embedded nuls; the byte-0 within length-1 is the unambiguous null marker |
| DATE | ``0x80000000`` (same as INT) | Yes |
| BOOL | (TBD — Phase 5+) | — |
**The VARCHAR null marker is unusual**: ``[byte 1][byte 0]`` looks like "1-byte string containing 0x00" but Informix's VARCHAR can't have embedded nuls anyway, so it's an unambiguous out-of-band signal. Empty string is encoded as ``[byte 0]`` (length=0, no content) — distinct from NULL.
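Detection works on raw bytes before conversion, which is why the REAL/FLOAT sentinels are usable despite NaN != NaN. This minimal sketch is built only from the table above. The names are hypothetical, DATE returns the raw day count rather than a `datetime.date`, and the UTF-8 codeset is an assumption.

```python
import struct

ENCODING = "utf-8"  # assumption: the real codeset depends on the server locale

# (struct format, NULL sentinel bytes) per the table above.
_FIXED = {
    "SMALLINT": (">h", b"\x80\x00"),
    "INTEGER": (">i", b"\x80\x00\x00\x00"),
    "BIGINT": (">q", b"\x80" + b"\x00" * 7),
    "REAL": (">f", b"\xff" * 4),
    "FLOAT": (">d", b"\xff" * 8),
    "DATE": (">i", b"\x80\x00\x00\x00"),  # raw day count, not a date object
}


def decode_fixed(type_name: str, raw: bytes):
    fmt, sentinel = _FIXED[type_name]
    if raw == sentinel:  # compare bytes, never floats: NaN != NaN
        return None
    (value,) = struct.unpack(fmt, raw)
    return value


def decode_varchar(buf: bytes, offset: int):
    """[byte length][bytes]; [1][0x00] is NULL and [0] is the empty string."""
    length = buf[offset]
    end = offset + 1 + length
    data = buf[offset + 1:end]
    if length == 1 and data == b"\x00":
        return None, end
    return data.decode(ENCODING), end
```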
---
## 2026-05-04 — executemany: PREPARE once, BIND+EXECUTE per row, RELEASE once
**Status**: active
**Decision**: ``Cursor.executemany(sql, seq_of_params)`` does PREPARE once, then loops sending SQ_BIND+SQ_EXECUTE per parameter set, then RELEASE once.
**Performance**: only ~1.05x faster than a loop of ``execute()`` for 200 INSERTs (319ms vs 336ms for the loop in our benchmark). Each BIND+EXECUTE round trip dominates; we save only PREPARE+RELEASE per call. **Phase 4.x optimization opportunity**: chain multiple BIND+EXECUTE calls in one PDU (no intermediate flush + read) for true batch performance, which would likely give a 5-10x speedup. JDBC's "isBatchUpdatePerSpec" path does this; not yet ported.
For now, executemany still gives PEP 249 conformance and slight perf improvement; bulk-insert optimization is a future improvement.
---
## 2026-05-04 — DECIMAL/MONEY decoding: base-100 BCD with asymmetric complement
**Status**: active (decoder); encoder is Phase 6.x
**Decision**: ``_decode_decimal`` handles IDS DECIMAL/MONEY wire bytes per ``com.informix.lang.Decimal.init`` (line 374) format:
```
byte[0] = (sign << 7) | biased_exponent_base100
- bit 7 = sign (1=positive, 0=negative)
- bits 0-6 = (exponent + 64) for positive
- bits 0-6 = (exponent + 64) ^ 0x7F for negative ← XOR'd
byte[1..] = digit-pair bytes (each 0..99 = two BCD digits)
- for negative: asymmetric base-100 complement applied
```
Asymmetric base-100 complement (per ``Decimal.decComplement`` line 447):
- Walk digits RIGHT to LEFT
- Trailing zeros stay zero
- First non-zero digit: subtract from 100
- Subsequent digits: subtract from 99
This was the trickiest decode of the project so far — initial naive
``99 - d`` for all digits gave artifacts like ``-1234.55999`` instead of
``-1234.56``. The trailing-zeros and "first non-zero from 100" rules
are what make the round trip exact.
NULL marker: byte[0] == 0 AND byte[1] == 0.
**Width on the wire**: per-column ``encoded_length`` field is packed as
``(precision << 8) | scale``. Byte width = ``ceil(precision/2) + 1``.
The row decoder uses this to slice DECIMAL columns out of the tuple
payload (``parse_tuple_payload`` in ``_resultset.py``).
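The DECIMAL rules above translate into a small decoder. The sketch below is a re-derivation from this entry, not a port of `Decimal.init`. The helper names are hypothetical, and the test bytes for ±1234.56 were computed by hand from these rules rather than taken from a capture. The asymmetric complement is its own inverse, so the same helper would also work for an encoder.

```python
from decimal import Decimal


def complement_base100(pairs: list[int]) -> list[int]:
    """Right to left: trailing zeros stay, first non-zero from 100, the rest from 99."""
    out = list(pairs)
    seen_nonzero = False
    for i in range(len(out) - 1, -1, -1):
        if not seen_nonzero:
            if out[i] == 0:
                continue  # trailing zeros stay zero
            out[i] = 100 - out[i]
            seen_nonzero = True
        else:
            out[i] = 99 - out[i]
    return out


def decode_decimal(raw: bytes):
    if len(raw) >= 2 and raw[0] == 0 and raw[1] == 0:
        return None  # NULL marker
    head = raw[0]
    positive = bool(head & 0x80)
    biased = head & 0x7F if positive else (head ^ 0x7F) & 0x7F
    exponent = biased - 64  # base-100 digit pairs before the decimal point
    pairs = list(raw[1:])
    if not positive:
        pairs = complement_base100(pairs)
    digits = tuple(d for pair in pairs for d in divmod(pair, 10))
    # Each pair is two decimal digits; place `exponent` pairs left of the point.
    return Decimal((0 if positive else 1, digits, 2 * (exponent - len(pairs))))


def decimal_wire_width(encoded_length: int) -> int:
    precision = encoded_length >> 8  # encoded_length = (precision << 8) | scale
    return (precision + 1) // 2 + 1  # ceil(precision / 2) + 1
```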
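**Encoder (``_encode_decimal``)**: implemented but disabled at this point, because the server
rejected the bytes (suspected wrong precision packing). Workaround for
Phase 6.x users: cast Decimal to float at the call site or pass via
SQL literal. *Amended 2026-05-04*: the real cause was the missing 2-byte
length prefix, and the encoder now works (see "DATE / DATETIME / DECIMAL
parameter encoding" below). The decode side is fully working: it handles
COUNT, SUM, AVG, literal DECIMAL values, negatives, fractions, and NULLs.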
---
## 2026-05-04 — Better error messages with PEP 249 exception classification
**Status**: active
**Decision**: ``_raise_sq_err`` decodes the full SQ_ERR payload (sqlcode, isamcode, offset, near-token) and raises the appropriate PEP 249 exception class with a human-readable message and structured fields (``e.sqlcode``, ``e.isamcode``, ``e.offset``, ``e.near``).
PEP 249 classification by sqlcode:
- IntegrityError: -239, -268, -291, -292, -391, -703 (constraint violations)
- ProgrammingError: -201, -206, -217, -286, -310, ... (syntax/object/permission)
- OperationalError: -255, -256, -407, -440, -908, ... (transaction/connection)
- NotSupportedError: -329, -349, -510 (caller-can't-fix)
- DatabaseError: everything else (safe fallback)
Built-in error catalog of ~50 most common Informix sqlcodes in
``src/informix_db/_errcodes.py``. Users extend at runtime via
``register_error_text(code, text)``.
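The classification table can be expressed as a lookup with a `DatabaseError` fallback. This sketch defines local stand-ins for the PEP 249 classes. It includes only the sqlcodes spelled out above; the real lists are longer, per the `...`. `make_sq_error` is a hypothetical name.

```python
# Local stand-ins for the PEP 249 hierarchy (a real driver exports its own).
class Error(Exception): ...
class DatabaseError(Error): ...
class IntegrityError(DatabaseError): ...
class ProgrammingError(DatabaseError): ...
class OperationalError(DatabaseError): ...
class NotSupportedError(DatabaseError): ...


# Only the codes listed above; the full tables are longer.
_CLASS_BY_SQLCODE = {
    **dict.fromkeys((-239, -268, -291, -292, -391, -703), IntegrityError),
    **dict.fromkeys((-201, -206, -217, -286, -310), ProgrammingError),
    **dict.fromkeys((-255, -256, -407, -440, -908), OperationalError),
    **dict.fromkeys((-329, -349, -510), NotSupportedError),
}


def make_sq_error(sqlcode: int, isamcode: int = 0, offset: int = 0,
                  near: str = "", message: str = "") -> DatabaseError:
    """Pick the PEP 249 class by sqlcode and attach the structured fields."""
    cls = _CLASS_BY_SQLCODE.get(sqlcode, DatabaseError)  # safe fallback
    exc = cls(message or f"SQL error {sqlcode}")
    exc.sqlcode, exc.isamcode, exc.offset, exc.near = sqlcode, isamcode, offset, near
    return exc
```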
**Connection survives errors**: a failed query doesn't poison the
session — subsequent ``execute()`` calls work normally. Verified by
``test_connection_survives_query_error``.
---
## 2026-05-04 — DATETIME decoding: BCD-packed with qualifier-driven field walk
**Status**: active
**Decision**: ``_decode_datetime(raw, encoded_length)`` walks BCD digit pairs into Python ``datetime`` objects. Returns ``datetime.date`` for date-only qualifiers, ``datetime.time`` for time-only, ``datetime.datetime`` for combined.
Wire format:
- byte[0] = sign + biased exponent (in base-100 digit pairs before decimal)
- byte[1..] = BCD digit pairs (year takes 2 bytes = 4 digits; everything else 1 byte = 2 digits)
The qualifier is packed in the column descriptor's ``encoded_length``:
- high byte = digit_count (total base-10 digits)
- middle nibble = start_TU (time-unit code: YEAR=0, MONTH=2, DAY=4, HOUR=6, MIN=8, SEC=10, FRAC1=11..FRAC5=15)
- low nibble = end_TU
Byte width on the wire = ``ceil(digit_count / 2) + 1``.
Verified against 4 simultaneous DATETIME columns in one tuple:
- YEAR TO SECOND → datetime.datetime(2026, 5, 4, 12, 34, 56)
- YEAR TO DAY → datetime.date(2026, 5, 4)
- HOUR TO SECOND → datetime.time(12, 34, 56)
- YEAR TO FRACTION(3) → datetime.datetime(...)
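Qualifier unpacking and the width formula are mechanical, and YEAR TO SECOND is the simplest case to decode. This sketch rests on two assumptions. First, each digit-pair byte holds a binary value 0..99, as the DECIMAL entry describes. Second, the head byte can be skipped for YEAR TO SECOND because its seven pairs always map to the same fields. Names are hypothetical, and other qualifiers need the full field walk.

```python
import datetime

# Time-unit codes from the list above.
TU_YEAR, TU_MONTH, TU_DAY, TU_HOUR, TU_MINUTE, TU_SECOND = 0, 2, 4, 6, 8, 10


def unpack_qualifier(encoded_length: int) -> tuple[int, int, int]:
    """(digit_count, start_TU, end_TU) from the packed column descriptor."""
    return encoded_length >> 8, (encoded_length >> 4) & 0x0F, encoded_length & 0x0F


def wire_width(encoded_length: int) -> int:
    digits = encoded_length >> 8
    return (digits + 1) // 2 + 1  # head byte + ceil(digits / 2) pairs


def decode_year_to_second(raw: bytes) -> datetime.datetime:
    """YEAR TO SECOND only: head byte (not checked here), then 7 base-100 pairs."""
    if len(raw) != 8:
        raise ValueError("YEAR TO SECOND is 8 bytes on the wire")
    yy_hi, yy_lo, month, day, hour, minute, second = raw[1:8]
    return datetime.datetime(yy_hi * 100 + yy_lo, month, day, hour, minute, second)
```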
DATETIME parameter binding (encoder) is Phase 6.x — same status as DECIMAL encoder.
---
## 2026-05-04 — DATE / DATETIME / DECIMAL parameter encoding
**Status**: active
**Decision**: ``encode_param`` dispatches on ``isinstance(value, datetime.datetime / datetime.date / decimal.Decimal)`` to type-specific encoders. Round-trip verified through INSERT + SELECT.
**The 2-byte length-prefix discovery (the unblocker)**: my Phase 6.a DECIMAL encoder and Phase 6.c DATETIME encoder both produced "correct" BCD bytes but the server silently dropped the SQ_BIND PDU. Captured the wire and compared to JDBC — DECIMAL/DATETIME bind data has a **2-byte length prefix** at the start (per ``Decimal.javaToIfx`` line 457) that wraps the BCD payload. With the prefix added (``raw = len(inner).to_bytes(2, "big") + inner``), both encoders work. DATE doesn't need the prefix — it's a fixed 4-byte int.
Per-type encoded format:
| Python | IDS type | Wire bytes |
|--------|----------|------------|
| ``datetime.date`` | DATE (7) | ``[int days_since_1899-12-31]`` (4 bytes BE) |
| ``datetime.datetime`` | DATETIME (10) | ``[short total_len][byte 0xc7][7 BCD pairs]`` (10 bytes total for YEAR TO SECOND) |
| ``decimal.Decimal`` | DECIMAL (5) | ``[short total_len][byte exp][BCD digit pairs]`` (variable) |
For DATETIME, encoder always emits YEAR TO SECOND form (no microseconds). Phase 6.x can add YEAR TO FRACTION(N) variants if microsecond precision is needed.
For DECIMAL, the encoder uses the asymmetric base-100 complement (mirror of decoder) for negatives. Tested with positive, negative, fraction values.
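The DATE and YEAR TO SECOND rows of the table look like this as code. The function names are hypothetical. The 0xc7 head byte is the positive sign bit plus a biased exponent of 7 + 64, and microseconds are dropped as described above. Note that `datetime.datetime` subclasses `datetime.date`, so a dispatcher must test for `datetime` first.

```python
import datetime

INFORMIX_EPOCH = datetime.date(1899, 12, 31)


def encode_date(value: datetime.date) -> bytes:
    """DATE (7): signed 4-byte day count from 1899-12-31, no length prefix."""
    return (value - INFORMIX_EPOCH).days.to_bytes(4, "big", signed=True)


def encode_datetime_y2s(value: datetime.datetime) -> bytes:
    """DATETIME (10) as YEAR TO SECOND: [short len][0xc7][7 base-100 pairs]."""
    inner = bytes([
        0xC7,  # positive sign bit | (7 pairs before the point + 64)
        value.year // 100, value.year % 100,
        value.month, value.day,
        value.hour, value.minute, value.second,  # microseconds dropped
    ])
    return len(inner).to_bytes(2, "big") + inner  # the 2-byte length prefix


def encode_temporal(value) -> tuple[int, bytes]:
    """Return (IDS type code, wire bytes); datetime must be tested before date."""
    if isinstance(value, datetime.datetime):
        return 10, encode_datetime_y2s(value)
    if isinstance(value, datetime.date):
        return 7, encode_date(value)
    raise TypeError(f"not a date/datetime: {type(value).__name__}")
```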
**Lesson**: when a server silently drops a PDU, it's almost always an envelope/framing issue rather than the inner-value bytes being wrong. The 2-byte length prefix here, the SHORT-vs-INT reserved field in CURNAME+NFETCH, the even-byte alignment pad — same pattern.
---
## 2026-05-04 — INTERVAL decoding (both qualifier families)
**Status**: active
**Decision**: ``_decode_interval`` decodes IDS INTERVAL into one of two Python types based on the qualifier's ``start_TU``:
- ``start_TU >= DAY (4)`` (IntervalDF) → ``datetime.timedelta``
- ``start_TU <= MONTH (2)`` (IntervalYM) → :class:`informix_db.IntervalYM` (a small frozen dataclass holding signed total months)
**The wire format is the same as DECIMAL/DATETIME** — ``[head byte][digit pairs in base-100]`` with sign+biased-exponent header. The qualifier short tells you how to *interpret* those digits:
- High byte = total digit count across all fields
- Middle nibble = start_TU; low nibble = end_TU
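- First field has variable digit width: ``flen = total_len - (end_TU - start_TU)``, because ``end_TU - start_TU`` equals the number of digits contributed by every field after the first (each non-fractional field adds 2 TU steps and 2 digits; each FRACTION digit adds 1 of each)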
- Subsequent non-first non-fractional fields are 1 byte each (since each is exactly 2 base-10 digits = 1 base-100 digit pair)
- Fractional fields scale to nanoseconds via ``cv *= 10 ** scale_exp`` where ``scale_exp = 18 - end_TU`` forced odd
Wire byte width on the SQ_TUPLE side = ``ceil(digit_count / 2) + 1`` (one head byte + ceil(digits/2) digit pairs). Same formula as DATETIME and DECIMAL — surfaces in ``_resultset.parse_tuple_payload`` as a dedicated branch (because the qualifier is needed at decode time).
**The dec_exp arithmetic that initially fooled me**: I kept misreading ``(total_len + 10 - end_TU + 1) / 2`` as a much larger value than it is. For HOUR(2) TO SECOND, ``total_len=6, end_TU=10``, so dec_exp = 7//2 = 3, not 8. After the encoder writes dec_exp into the head byte and the decoder reads it back, the two match perfectly so the digit array lines up at offset 0 of the 16-byte working buffer — but only if you actually compute the value correctly. *Read your own arithmetic.* (The synthetic unit-test framework caught this immediately, before the integration tests even ran.)
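The derived quantities are easy to check mechanically. This sketch just restates the formulas from this entry, and `interval_layout` is a hypothetical name. Reading "forced odd" as rounding an even exponent up is an assumption: it makes FRACTION(4), whose four digits are read as two pairs, land on nanoseconds. The HOUR(2) TO SECOND case reproduces the dec_exp of 3 worked out above.

```python
TU_SECOND = 10


def interval_layout(total_len: int, start_tu: int, end_tu: int) -> dict:
    """Restate the qualifier-derived quantities used by the INTERVAL decoder."""
    flen = total_len - (end_tu - start_tu)  # digits in the first field
    dec_exp = (total_len + 10 - end_tu + 1) // 2  # integer division, as in the text
    if end_tu > TU_SECOND:  # only fractional qualifiers need a nanosecond scale
        scale_exp = 18 - end_tu
        if scale_exp % 2 == 0:
            scale_exp += 1  # "forced odd", read here as rounding up (assumption)
    else:
        scale_exp = None
    return {"flen": flen, "dec_exp": dec_exp, "scale_exp": scale_exp}
```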
**IntervalYM design**: I considered a NamedTuple with (years, months) fields, but a frozen dataclass with a single signed ``months`` field matches JDBC's ``IntervalYM`` and avoids ambiguity around "what does negative mean for a tuple". ``years`` and ``remainder_months`` are read-only properties; ``__str__`` emits the standard "Y-MM" / "-Y-MM" form. ``slots=True`` makes it as cheap as a NamedTuple memory-wise.
**Verified against 9 integration scenarios** (all decoder branches): DAY TO SECOND, HOUR TO SECOND, MINUTE TO SECOND, YEAR TO MONTH, YEAR-only, negative interval (base-100 complement, as for DECIMAL), table column, NULL, and a multi-INTERVAL row (proves per-column slicing works across mixed qualifier families).
INTERVAL parameter binding (encoder) is deferred to Phase 6.e or later — same arc as DECIMAL/DATETIME, where decoding lands first and encoding follows once we have wire captures to compare against.
---
## 2026-05-04 — INTERVAL parameter encoding
**Status**: active
**Decision**: ``encode_param`` dispatches ``datetime.timedelta`` and :class:`IntervalYM` to dedicated encoders that produce the 2-byte-length-prefixed BCD payload (per the Phase 6.c discovery). Default qualifiers are chosen to cover any sane Python value:
- ``timedelta`` → ``INTERVAL DAY(9) TO FRACTION(5)`` (covers ±999,999,999 days × 10us resolution)
- ``IntervalYM`` → ``INTERVAL YEAR(9) TO MONTH`` (covers ±999,999,999 years)
**Why DAY(9) and YEAR(9)?** Python's ``timedelta`` allows up to 999,999,999 days; YEAR/MONTH have no upper bound in Python (just a signed int). We could choose a smaller default, but the wire-format cost is one byte per two extra digits and the user-facing benefit is "no overflow surprises". JDBC's defaults (DAY(2) TO FRACTION(5) for IntervalDF, YEAR(4) TO MONTH for IntervalYM) trade safety for compactness — we make the opposite trade.
**FRACTION(5) is the precision ceiling.** Informix has no FRAC6: the 4-bit end_TU nibble tops out at 15, which is already FRAC5 (per ``Interval.TU_F1..TU_F5``). The encoder scales nanoseconds via ``nans /= 10^(18 - end_TU)`` per JDBC, which means we lose the units digit of microseconds (10us is the smallest representable unit). This is the same limitation JDBC has: Informix fundamentally can't store sub-10us intervals in this format.
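The default qualifiers and the 10us truncation can be made concrete. `pack_qualifier` applies the DATETIME entry's `(digits << 8) | (start_TU << 4) | end_TU` packing. `split_timedelta` is hypothetical and decomposes a `timedelta` into DAY TO FRACTION(5) fields. Truncating the last microsecond digit rather than rounding it is an assumption, modeled on the integer division quoted above.

```python
import datetime

TU_YEAR, TU_MONTH, TU_DAY, TU_FRAC5 = 0, 2, 4, 15


def pack_qualifier(digits: int, start_tu: int, end_tu: int) -> int:
    return (digits << 8) | (start_tu << 4) | end_tu


# DAY(9) TO FRACTION(5): 9 + 2 (hour) + 2 (minute) + 2 (second) + 5 = 20 digits
DAY9_TO_FRAC5 = pack_qualifier(20, TU_DAY, TU_FRAC5)
# YEAR(9) TO MONTH: 9 + 2 = 11 digits
YEAR9_TO_MONTH = pack_qualifier(11, TU_YEAR, TU_MONTH)


def split_timedelta(td: datetime.timedelta):
    """(sign, days, hours, minutes, seconds, frac5) at 10us resolution."""
    total_us = td // datetime.timedelta(microseconds=1)  # exact integer micros
    sign = -1 if total_us < 0 else 1
    total_us = abs(total_us)
    frac5 = (total_us % 1_000_000) // 10  # truncates the units digit of micros
    days, rem = divmod(total_us // 1_000_000, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return sign, days, hours, minutes, seconds, frac5
```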
**The synthetic round-trip caught every framing bug locally.** Once the decoder works, encoder verification becomes "decode my encoded bytes and compare to the input" — a closed loop with no server in the mix. All 6 integration tests passed on the first run against live Informix; no debugging cycle was needed. This is the dividend from owning both ends of the codec layer.
**Lesson reinforced**: Phase 6.a (DECIMAL encoding) was the real cost — that's where the 2-byte-length-prefix wire-format discovery happened. Phase 6.c (DATE/DATETIME encoding) and Phase 6.e (INTERVAL encoding) each amortized that discovery with one new encoder per qualifier-bearing type. Total wall-clock time per phase is dropping geometrically.
---
## (template — copy below this line for new entries)
```
## YYYY-MM-DD — <one-line decision title>
**Status**: active | superseded | revisited
**Decision**: <chosen path>
**Discarded**: <alternatives, briefly>
**Why**: <rationale>
```