Implements the end-to-end round trip for BYTE (type 11) and TEXT (type 12) columns. Python `bytes`/`bytearray` map to BYTE; `str` is auto-encoded as ISO-8859-1 for TEXT.

**Wire protocol — write side**:

* The SQ_BIND payload carries a 56-byte blob descriptor with the size at offsets 16..19 (per `IfxBlob.toIfx`). NULL is flagged by byte 39 = 1.
* After all per-param blocks, SQ_BBIND (41) declares the blob count; then chunked SQ_BLOB (39) messages stream the actual bytes (max 1024 bytes/chunk, matching JDBC), terminated by a zero-length SQ_BLOB.
* SQ_EXECUTE then proceeds normally.

**Wire protocol — read side**:

* SQ_TUPLE returns only the 56-byte descriptor; the actual bytes live in the blobspace.
* For each BYTE/TEXT column in each row, send SQ_FETCHBLOB with the descriptor and read SQ_BLOB chunks until the zero-length terminator.
* The locator is only valid while the cursor is open — dereference BEFORE sending CLOSE. Doing it afterwards returns -602 (Cannot open blob).

**Server-side prerequisites** (one-time setup):

1. Blobspace: `onspaces -c -b blobspace1 -p /path -o 0 -s 50000`
2. Logged DB: `CREATE DATABASE testdb WITH LOG`
3. Config + archive: `onmode -wm LTAPEDEV=/dev/null`, then `onmode -wm TAPEDEV=/dev/null`, `onmode -l`, `ontape -s -L 0 -t /dev/null`

Without step 3, JDBC fails identically to our driver with "BLOB pages can't be allocated from a chunk until chunk add is logged". That identical failure was the diagnostic confirmation that our protocol bytes were correct — same server response = byte-for-byte parity.

**Tests**: 9 integration tests in `tests/test_blob.py` — single-chunk, multi-chunk (5120 bytes), NULL, multi-row, binary-safe, TEXT round trip, ISO-8859-1, NULL TEXT, mixed columns. The Phase 4 `test_unsupported_param_type_raises` was also updated, since `bytes` is no longer the canonical unsupported type — it now uses a custom class. Total: 53 unit + 107 integration = 160 tests.

The smart-LOB family (BLOB/CLOB) is a separate state-machine extension deferred to Phase 9 — it uses the IfxLocator + LO_OPEN/LO_READ session protocol against sbspace, not the BBIND/BLOB stream.
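The write-side chunking above can be sketched as a frame generator. This is a sketch, not the driver's actual code: the frame shape `[short tag][short length][bytes][pad]` is an assumption derived from the generic variable-length framing rules documented later in this log, and the names are hypothetical.

```python
import struct

SQ_BLOB = 39       # message tag from the notes above
MAX_CHUNK = 1024   # per-chunk cap matching JDBC

def iter_blob_messages(data: bytes):
    """Yield SQ_BLOB frames for `data`, ending with the zero-length
    terminator frame. Length field is assumed to count data bytes only,
    with the alignment pad outside the count."""
    for off in range(0, len(data), MAX_CHUNK):
        chunk = data[off:off + MAX_CHUNK]
        frame = struct.pack(">hh", SQ_BLOB, len(chunk)) + chunk
        if len(chunk) % 2:            # even-byte alignment invariant
            frame += b"\x00"
        yield frame
    yield struct.pack(">hh", SQ_BLOB, 0)  # zero-length terminator
```

A 5120-byte value (the multi-chunk test size) yields five full chunks plus the terminator.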
# Decision Log

Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions append; old ones are *amended* (with date) rather than overwritten.

Format: every decision has a date, a status (`active` / `superseded` / `revisited`), the chosen path, the discarded alternatives, and the *why*.

---
## 2026-05-02 — Project goal & off-ramp

**Status**: active

**Decision**: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.

**Off-ramp** (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python — e.g., mandatory undocumented crypto in the handshake — narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do **not** fall back to JPype/JDBC; that defeats the project's purpose.

**Why**: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.

---
## 2026-05-02 — Package name

**Status**: active

**Decision**: `informix-db`

**Discarded**: `informixdb-pure` (longer), `ifxsqli` (less discoverable), `pyifx` (obscure)

**PyPI availability**: confirmed available 2026-05-02 (HTTP 404 on `/pypi/informix-db/json`). The legacy `informixdb` is taken (HTTP 200); `informix` is also free (404) but too generic.

**Why**: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.

---
## 2026-05-02 — License

**Status**: active

**Decision**: MIT

**Discarded**: Apache-2.0 (more defensive but less common in the Python ecosystem), BSD-3-Clause

**Why**: Simplest, most permissive, ecosystem-standard for Python libraries.

---
## 2026-05-02 — Sync first; async deferred

**Status**: active

**Decision**: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate `informix_db.aio` subpackage following asyncpg's I/O-agnostic-protocol pattern.

**Why**: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.

---
## 2026-05-02 — Test target

**Status**: active

**Decision**: `icr.io/informix/informix-developer-database` (the Developer Edition image, now maintained by HCL Software since the 2017 IBM→HCL transfer of Informix), port 9088 (native SQLI).

**Pinned digest** (captured 2026-05-02 from `docker pull`):

`sha256:8202d69ba5674df4b13140d5121dd11b7b26b28dc60119b7e8f87e533e538ba1`

**On-disk footprint**: 2.23 GB unpacked / 665 MB compressed.

**Default credentials** (from container startup logs, accept-license run):

- OS/DB user: `informix`
- Password: `in4mix`
- HQ admin password: `Passw0rd` (don't need this)
- DBA user/password: empty
- DBSERVERNAME: defaults to `informix` (same as the user)
- TLS_CONNECTIONS: OFF (plain auth on port 9088)
- Always-present databases: `sysmaster`, `sysuser` (built during init)

**Container startup**: `docker run -d --name ifx --privileged -p 9088:9088 -e LICENSE=accept -e SIZE=small icr.io/informix/informix-developer-database@sha256:8202d69b...`

**Why**: Free, official, no license click-through, supports plain-password auth out of the box. The digest is locked from Phase 0 onward — `:latest` is the canonical source of flaky integration suites in DB-driver projects, so all `docker-compose.yml` files reference the digest, never the tag.

---
## 2026-05-02 — Phase 0 is a gate, not a step

**Status**: active

**Decision**: No library code is written until `PROTOCOL_NOTES.md` meets all four exit criteria:

1. Login byte layout documented end-to-end
2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
3. `SELECT 1` round-trip fully labeled
4. JDBC source and packet capture corroborate on login + execute paths

If the exit criteria can't be met within bounded effort, invoke the off-ramp.

**Why**: Most greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (`PROTOCOL_NOTES.md`) instead of a half-built driver.

---
## 2026-05-02 — Phase 1 architecture decisions (locked at start of Phase 1)

> These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.

- **`paramstyle = "numeric"`** (`:1`, `:2`, …). Matches Informix ESQL/C convention.
- **Python ≥ 3.10**. Gives us `match`, modern type hints, `tomllib`.
- **`autocommit` defaults to off**. PEP 249 implicit semantics; opt-in via `connect(autocommit=True)`.
- **Author**: Ryan Malloy `<ryan@supported.systems>` (per global pyproject.toml convention).
- **Versioning**: CalVer `YYYY.MM.DD` (`2026.05.02` initial); same-day fixes use PEP 440 post-release `2026.05.02.1`, `.2`, etc.

---
## 2026-05-02 — DATE pulled forward to MVP

**Status**: active

**Decision**: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.

**Discarded**: leaving DATE in the "medium" / Phase 6 bucket.

**Why**: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.

DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+ — their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.
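The day-count encoding above is small enough to show whole. A sketch (function names are illustrative, not the driver's):

```python
import datetime
import struct

IFX_DATE_EPOCH = datetime.date(1899, 12, 31)  # Informix DATE epoch

def encode_date(d: datetime.date) -> bytes:
    # 4-byte big-endian day count since the epoch
    return struct.pack(">i", (d - IFX_DATE_EPOCH).days)

def decode_date(raw: bytes) -> datetime.date:
    return IFX_DATE_EPOCH + datetime.timedelta(days=struct.unpack(">i", raw)[0])
```

The epoch itself encodes as day zero, so round-tripping is a one-liner in each direction.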

---
## 2026-05-02 — `CLAUDE.md` excluded from git and sdist

**Status**: active

**Decision**: `.gitignore` excludes `CLAUDE.md`. Once `pyproject.toml` exists, `[tool.hatch.build.targets.sdist].exclude` will also list `CLAUDE.md`.

**Why**: `CLAUDE.md` contains the user's email and operator-private context. Per global convention, only commit `CLAUDE.md` to private repos. This project is destined for PyPI / public Git.

---
## 2026-05-02 — JDBC reference: `ifxjdbc.jar` 4.50.JC10

**Status**: active

**Decision**: Use the user-provided `ifxjdbc.jar` from `/home/rpm/bingham/rtmt/lib/` as the JDBC reference, working copy at `build/ifxjdbc.jar`.

**JAR identity**: `Implementation-Version: 4.50.10-SNAPSHOT`, build 146, dated 2023-03-07. Printable version string: `4.50.JC10`. SHA256 `dc5622cb4e95678d15836b684b6ef1783d37bc0cdd2725208577fc300df4e5f1`.

**Discarded**: Maven Central `com.ibm.informix:jdbc:4.50.4.1` (not downloaded — the local copy is newer).

**Why**: A newer reference is strictly better — the wire protocol is backwards-compatible, so anything `4.50.JC10` knows how to send/receive will be accepted by older servers. Avoids the Maven download.

---
## 2026-05-02 — Decompiler: CFR 0.152

**Status**: active

**Decision**: Use CFR 0.152 (https://github.com/leibnitz27/cfr) as the JDBC decompiler. Cached at `build/tools/cfr.jar`.

**Discarded**: Procyon, Fernflower, Ghidra (the Ghidra MCP port pool was exhausted; CFR alone proved sufficient).

**Why**: CFR produces the most readable Java for modern bytecode, ships as a single fat JAR, and has no install step. It decompiles all 478 .java files in seconds.

---
## 2026-05-02 — Confirmed: CSM is dead in modern Informix

**Status**: active

**Decision**: Do NOT plan for CSM (Communications Support Module) support. Ever.

**Evidence**: `com.informix.asf.Connection.getOptProperties()` (decompiled) literally throws `"CSM Encryption is no longer supported"` if the `SECURITY` or `CSM` opt-prop is set.

**Why**: This used to be the pluggable-encryption layer. IBM removed it; modern Informix uses TLS/SSL exclusively. This removes CSM from every phase plan.

---
## 2026-05-02 — Wire framing primitives confirmed (from JDBC)

**Status**: active (pending PCAP corroboration)

**Decision**: Adopt these wire-framing primitives in `_protocol.py` from day one:

- All multi-byte integers are **big-endian** (network byte order)
- SmallInt = 2 bytes, Int = 4 bytes, BigInt = 8 bytes, Real = 4 bytes IEEE 754, Double = 8 bytes IEEE 754
- Variable-length payloads (string, decimal, datetime, interval, BLOB): `[short length][bytes][optional 0x00 pad if length is odd]` — **the 16-bit alignment requirement is mandatory; missing it desynchronizes the parser**
- Strings emitted as `[short len+1][bytes][0x00 nul terminator]` (the +1 is the trailing nul)
- Post-login messages have NO header: each is `[short messageType][payload]` and the next message begins immediately after the previous one's payload ends
- Login PDU has its own SLheader (6 bytes) + PFheader structure

**Source**: `com.informix.lang.JavaToIfxType` (encoders), `com.informix.asf.IfxDataInputStream`/`IfxDataOutputStream` (framing), `com.informix.asf.Connection` (login PDU). Documented byte-by-byte in `PROTOCOL_NOTES.md`.
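The string rule above combines three of these primitives (length+1 prefix, nul terminator, even pad), so it makes a good worked example. A sketch, assuming latin-1 text:

```python
import struct

def write_string(s: str) -> bytes:
    """Emit [short len+1][bytes][0x00 nul], padded to even total length
    per the framing rules above (latin-1 assumed for illustration)."""
    body = s.encode("latin-1") + b"\x00"       # the +1 is this nul
    out = struct.pack(">h", len(body)) + body
    if len(body) % 2:                          # 16-bit alignment is mandatory
        out += b"\x00"
    return out
```

"abc" needs no pad (4-byte body); "ab" gains one pad byte (3-byte body).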

---
## 2026-05-02 — Plain-password auth: no challenge-response round trip

**Status**: active

**Decision**: For MVP, treat plain-password auth as a single round trip: the client sends one binary login PDU containing the password inline; the server replies with one PDU containing version + capabilities or an error block.

**Why**: `Connection.encodeAscBinary()` writes the password as a length-prefixed string within the login PDU body. There is no separate auth phase, no salt, no hashing, no `SQ_CHALLENGE`/`SQ_RESPONSE` exchange. Those constants (129/130) are reserved for PAM and other interactive auth methods, used AFTER the binary login PDU when the server initiates them.

---
## 2026-05-02 — Capability ints: corrected after PDU diff caught misread

**Status**: active (corrects an earlier same-day entry)

**Decision**: Send `Cap_1 = 0x0000013c, Cap_2 = 0, Cap_3 = 0` in the binary login PDU. These are the values IBM's JDBC driver sends; the server echoes them back identically.

**Why this is a correction**: An earlier read of the wire bytes (before we wrote the byte-for-byte PDU diff) decoded the capability section as `Cap_1=1, Cap_2=0x3c000000, Cap_3=0`. That was a misalignment — the `0x3c` byte interpreted as `Cap_2`'s high byte was actually `Cap_1`'s low byte. Real layout: a single int `0x0000013c` = `(capability_class << 8) | PF_PROT_SQLI_0600 (60 = 0x3c)`.

**How we caught it**: `tests/test_pdu_match.py` — captures our generated PDU via a monkey-patched socket and asserts byte-for-byte equality against `docs/CAPTURES/01-connect-only.socat.log` for offsets 2..280 (the structural prefix). The connection still worked with the wrong values because the dev image is permissive, but the PDU was structurally non-identical. **Server-accepts ≠ structurally-correct.**

**Methodology takeaway**: For wire-protocol implementations, always diff against the reference vendor's PDU bytes, not just "it connected." Permissive servers mask real bugs.
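The corrected packing is a two-line arithmetic check (the `capability_class = 1` value is inferred from the decoded `0x0000013c`, not independently confirmed):

```python
PF_PROT_SQLI_0600 = 0x3C   # protocol version byte (60), from the layout above
capability_class = 1       # inferred: 0x13c >> 8

# A single packed int, not three separately aligned fields:
cap_1 = (capability_class << 8) | PF_PROT_SQLI_0600
```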

---
## 2026-05-04 — VARCHAR row decoding: three byte-level discoveries

**Status**: active

**Decision**: ``parse_tuple_payload`` now handles VARCHAR/NCHAR/NVCHAR with a single-byte length prefix; SQ_TUPLE payloads are padded to even byte alignment; the trailing reserved field in CURNAME+NFETCH is a SHORT, not an INT.

**Why this is three findings**: each one was caught by a different debugging technique:

1. **CURNAME+NFETCH PDU off by 2 bytes**: my reserved trailing field was `write_int(0)` (4 bytes); JDBC's reference is `write_short(0)` (2 bytes). Caught by capturing both PDUs under socat and byte-diffing — ours was 44 bytes vs JDBC's 42. The server happened to accept the longer version for INT-only SELECTs (silently treating the extra zeros as padding) but rejected it for VARCHAR queries. Lesson: **server tolerance varies by query type — always match JDBC byte-for-byte**.

2. **SQ_TUPLE payload pads to even alignment**: when `size` is odd, an extra 0x00 byte follows the payload before the next tag. Found in `docs/CAPTURES/15-py-varchar-fixed.socat.log` — an 11-byte "syscolumns" VARCHAR payload had a trailing `0x00` that JDBC's `IfxRowColumn.readTuple` consumes silently. We weren't doing this, so the parser desynced for any odd-length variable-width row. **Even-byte alignment is a wire-protocol-wide invariant — every variable-length payload pads.**

3. **VARCHAR in a tuple uses a 1-byte length prefix, NOT 2**: per the on-wire encoding (verified empirically in capture 15), VARCHAR values in row data are `[byte length][bytes]` — single-byte prefix, max 255 chars. NCHAR and NVCHAR follow the same pattern. (CHAR is fixed-width per encoded_length, with no length prefix at all.) LVARCHAR uses a 4-byte int prefix for values >255 bytes.

**How to apply**: when adding new variable-width type decoders, capture a tuple under socat first to see the exact framing — don't infer from the column descriptor's `encoded_length`, which is the MAX storage, not the wire format. The wire format may differ by orders of magnitude (1-byte prefix vs encoded_length=128 for VARCHAR).
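Findings 2 and 3 combine into one small decode routine. A sketch (not the driver's `parse_tuple_payload`): it assumes the even-alignment pad applies per value, with parity computed over prefix+data as in the 11-byte "syscolumns" capture, and it returns raw bytes rather than decoded text. The `[byte 1][byte 0]` NULL sentinel is taken from the NULL-encoding entry below.

```python
def read_varchar(buf: bytes, pos: int):
    """Decode one VARCHAR field from an SQ_TUPLE payload.
    Returns (value_bytes_or_None, next_pos)."""
    length = buf[pos]                 # finding 3: 1-byte length prefix
    pos += 1
    data = buf[pos:pos + length]
    pos += length
    if (1 + length) % 2:              # finding 2: field pads to even width
        pos += 1                      # consume the 0x00 pad byte
    if length == 1 and data == b"\x00":
        return None, pos              # NULL sentinel [byte 1][byte 0]
    return data, pos
```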

---
## 2026-05-04 — DML / DDL execution path: SQ_PREPARE + SQ_EXECUTE + SQ_RELEASE

**Status**: active

**Decision**: For statements that don't return rows (CREATE, INSERT, UPDATE, DELETE, DROP), Cursor.execute branches on ``nfields == 0`` in the DESCRIBE response. The SELECT path is the cursor lifecycle (CURNAME+NFETCH+...); the DDL/DML path is just SQ_EXECUTE then SQ_RELEASE.

**Why**: JDBC uses SQ_PREPARE for everything; for non-SELECT it just doesn't open a cursor. Per IfxSqli.sendExecute (line 1075): a non-prepared-statement execute is a bare ``[short SQ_ID=4][int SQ_EXECUTE=7][short SQ_EOT]`` (8 bytes).
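Mirroring that quoted layout literally (short, int, short — as written above, though most other messages use a short tag), the 8-byte PDU is:

```python
import struct

SQ_ID, SQ_EXECUTE, SQ_EOT = 4, 7, 12

# [short SQ_ID=4][int SQ_EXECUTE=7][short SQ_EOT] — layout as quoted above
pdu = struct.pack(">hih", SQ_ID, SQ_EXECUTE, SQ_EOT)
```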

---
## 2026-05-04 — SQ_INSERTDONE (=94) is execution metadata, NOT execution

**Status**: active

**Decision**: SQ_INSERTDONE arrives in BOTH the DESCRIBE response (PREPARE phase) AND the EXECUTE response for literal-value INSERTs. It carries the auto-generated serial values that WILL be / WERE inserted. Don't interpret SQ_INSERTDONE in the DESCRIBE response as "row was inserted" — it's just metadata. Always send SQ_EXECUTE.

**Why this was a debugging trap**: when I first saw SQ_INSERTDONE in the PREPARE response for ``INSERT INTO t1 VALUES (1, 'hello')``, I assumed Informix optimizes literal INSERTs by executing during PREPARE, and added a "skip SQ_EXECUTE" branch. Result: SELECT returned 0 rows. The data wasn't actually inserted; the SQ_INSERTDONE in PREPARE was just "here are the serials that WILL be assigned when you execute". After reverting to "always send SQ_EXECUTE", the row persists. Lesson: optimization-looking responses may not be what they look like — always verify with a follow-up SELECT.

---
## 2026-05-04 — SQ_INSERTDONE wire format

**Status**: active

**Decision**: Per IfxSqli.receiveInsertDone (line 2347), the SQ_INSERTDONE payload is 18 bytes for modern (bigint-supported) servers:

- 10 bytes: serial8 inserted (Informix's variable-numeric LONGINT encoding)
- 8 bytes: bigserial inserted (regular 64-bit long, big-endian)

For now we read-and-discard. Phase 5+ will surface these as ``Cursor.lastrowid`` / similar.

---
## 2026-05-04 — Transactions: commit/rollback are 2-byte messages

**Status**: active

**Decision**: ``Connection.commit()`` sends ``[short SQ_CMMTWORK=19][short SQ_EOT=12]`` (4 bytes total). ``Connection.rollback()`` sends ``[short SQ_RBWORK=20][short SQ_EOT=12]``. The server responds with SQ_DONE+SQ_EOT (in logged databases) or SQ_ERR sqlcode=-255 ("Not in transaction") in unlogged databases like sysmaster.

**How to apply**: integration tests for transactions need a LOGGED database. The Informix Developer Edition image ships with ``stores_demo`` (logged) — point integration tests at that for commit/rollback verification.
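The two PDUs above are small enough to build inline (function names are illustrative):

```python
import struct

SQ_CMMTWORK, SQ_RBWORK, SQ_EOT = 19, 20, 12

def commit_pdu() -> bytes:
    # [short SQ_CMMTWORK=19][short SQ_EOT=12] — 4 bytes on the wire
    return struct.pack(">hh", SQ_CMMTWORK, SQ_EOT)

def rollback_pdu() -> bytes:
    return struct.pack(">hh", SQ_RBWORK, SQ_EOT)
```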

---
## 2026-05-04 — Parameter binding: SQ_BIND chained with SQ_EXECUTE in one PDU

**Status**: active

**Decision**: ``Cursor.execute(sql, params)`` for DML sends one PDU containing SQ_BIND with all parameter values, immediately followed by SQ_EXECUTE. No separate CIDESCRIBE round trip — the server infers parameter types from the type tags we send in SQ_BIND.

**Why this matters**: skipping the CIDESCRIBE/IDESCRIBE handshake (which JDBC does for type discovery) saves one round trip per execute. The server accepts our SQ_BIND directly because we provide explicit type codes for each parameter.

PDU structure (verified against ``docs/CAPTURES/02-dml-cycle.socat.log`` msg[29]):

```
[short SQ_ID=4][int SQ_BIND=5][short numparams]
for each param:
    [short type][short indicator=0 or -1][short prec_or_encLen]
    writePadded(rawbytes)   # data + 0x00 pad if odd-length
[short SQ_EXECUTE=7]
[short SQ_EOT]
```

Per-type encoding (Phase 4 MVP):

| Python type | IDS type code | Precision short | Data |
|-------------|---------------|-----------------|------|
| ``int`` (32-bit) | 2 (INT) | ``0x0a00`` (=2560 packed display-width=10/scale=0) | 4 bytes BE |
| ``int`` (64-bit) | 52 (BIGINT) | ``0x1300`` (=4864 packed width=19/scale=0) | 8 bytes BE |
| ``str`` | 0 (CHAR — server casts) | 0 | ``[short len][bytes]`` (writePadded adds even pad) |
| ``float`` | 3 (FLOAT/DOUBLE) | 0 | 8 bytes IEEE 754 |
| ``bool`` | 45 (BOOL) | 0 | 1 byte (0x01 or 0x00) |
| ``None`` | 0 | indicator=-1 | (no data) |

**Surprise**: JDBC sends Python-string equivalents as **CHAR (type=0)**, not VARCHAR (type=13). The server handles conversion to the actual column type via internal CIDESCRIBE/IDESCRIBE inference. We do the same — string parameters always go out as CHAR.

**Surprise**: integer precision is **packed** as ``(display_width << 8) | scale``. For INTEGER, that's ``(10 << 8) | 0 = 0x0a00 = 2560``. Initially this looked like a bug (why would precision be 2560?) until I realized it's a packed field. Captured in the cursor's ``_build_bind_execute_pdu`` and the converters' ``_encode_int``.

**Paramstyle**: we declare ``paramstyle = "numeric"`` (PEP 249), supporting ``:1``, ``:2`` placeholders. Internally we rewrite to ``?`` (Informix's native style) before sending PREPARE. Trivial regex; doesn't escape strings/comments — Phase 5 can add a proper SQL tokenizer for that edge case.
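The per-param block and the packed-precision surprise can be shown together. A sketch of one 32-bit int block per the table above (not the driver's actual `_encode_int`):

```python
import struct

def encode_int_param(value: int) -> bytes:
    """One SQ_BIND per-param block for a 32-bit int:
    [short type][short indicator][short precision][4 bytes BE data]."""
    IDS_INT = 2
    precision = (10 << 8) | 0   # packed (display_width << 8) | scale = 0x0a00
    return struct.pack(">hhh", IDS_INT, 0, precision) + struct.pack(">i", value)
```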

---
## 2026-05-04 — SELECT vs DML branching: keyword-based, not nfields-based

**Status**: active

**Decision**: ``Cursor.execute`` branches on the first word of the SQL (``SELECT`` → cursor-fetch path; everything else → execute-and-release path). Don't use ``nfields > 0`` from the DESCRIBE response.

**Why**: a parameterized INSERT (``INSERT INTO t VALUES (?, ?, ?)``) returns a DESCRIBE response with ``nfields > 0`` because the server describes the row that WILL be inserted. The ``nfields == 0`` heuristic that worked for non-parameterized DML breaks here. JDBC does the same via its ``IfxStatement`` / ``IfxPreparedStatement`` subclassing.

---
## 2026-05-04 — Parameterized SELECT works with bind-then-cursor-open

**Status**: active

**Decision**: For parameterized SELECT, send SQ_BIND alone (without SQ_EXECUTE chained) right after PREPARE, then proceed with the regular cursor open + fetch lifecycle (CURNAME+NFETCH+...). The cursor open is what triggers query execution; SQ_BIND just binds the values into the prepared-statement scope.

**Why**: simpler than I expected — the server accepts SQ_BIND followed by cursor open in separate PDUs. No need for the IDESCRIBE handshake JDBC does for type discovery.

PDU sequence:

```
1. PREPARE+NDESCRIBE+WANTDONE → DESCRIBE+DONE+COST+EOT
2. SQ_BIND (no EXECUTE) → EOT
3. CURNAME+NFETCH → TUPLE*+DONE+COST+EOT
4. NFETCH (drain) → DONE+COST+EOT
5. CLOSE → EOT
6. RELEASE → EOT
```

Tested with a single int param, multiple int params, a string param, and mixed `:N` style with LIKE patterns. All work correctly.

---
## 2026-05-04 — NULL row encoding: per-type sentinel values

**Status**: active

**Decision**: Each IDS type uses a specific NULL sentinel in tuple data; decoders detect it and return Python ``None``.

Sentinels (verified by capture analysis in ``docs/CAPTURES/19-py-null-vs-onechar.socat.log`` and ``20-py-int-null.socat.log``):

| IDS type | NULL sentinel | Distinguishable from valid value? |
|----------|---------------|-----------------------------------|
| SMALLINT | ``0x8000`` (= SHORT_MIN) | Yes — SHORT_MIN can't be a regular value |
| INTEGER | ``0x80000000`` (= INT_MIN) | Yes |
| BIGINT | ``0x8000000000000000`` (= LONG_MIN) | Yes |
| REAL | ``ff ff ff ff`` (NaN bit pattern) | Yes (via bytes match, not value match — NaN != NaN) |
| FLOAT/DOUBLE | ``ff ff ff ff ff ff ff ff`` | Yes |
| VARCHAR | ``[byte 1][byte 0]`` (length=1, content=single nul) | Yes — VARCHAR can't contain embedded nuls; the byte-0 within length-1 is the unambiguous null marker |
| DATE | ``0x80000000`` (same as INT) | Yes |
| BOOL | (TBD — Phase 5+) | — |

**The VARCHAR null marker is unusual**: ``[byte 1][byte 0]`` looks like "1-byte string containing 0x00", but Informix's VARCHAR can't have embedded nuls anyway, so it's an unambiguous out-of-band signal. Empty string is encoded as ``[byte 0]`` (length=0, no content) — distinct from NULL.
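Two rows of the table above as decoder sketches (illustrative, not the driver's code). Note the FLOAT case matches bytes, not values, per the NaN caveat:

```python
import struct

FLOAT_NULL = b"\xff" * 8   # sentinel from the table above

def decode_int(raw: bytes):
    v = struct.unpack(">i", raw)[0]
    return None if v == -0x80000000 else v   # 0x80000000 read signed = INT_MIN

def decode_float(raw: bytes):
    # bytes match, not value match — NaN != NaN
    return None if raw == FLOAT_NULL else struct.unpack(">d", raw)[0]
```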

---
## 2026-05-04 — executemany: PREPARE once, BIND+EXECUTE per row, RELEASE once

**Status**: active

**Decision**: ``Cursor.executemany(sql, seq_of_params)`` does PREPARE once, then loops sending SQ_BIND+SQ_EXECUTE per parameter set, then RELEASE once.

**Performance**: only ~1.06x faster than a loop of ``execute()`` for 200 INSERTs (336 ms vs 319 ms in our benchmark). Each BIND+EXECUTE round trip dominates; we save only PREPARE+RELEASE per call. **Phase 4.x optimization opportunity**: chain multiple BIND+EXECUTE calls in one PDU (no intermediate flush + read) for true batch performance — would likely give a 5-10x speedup. JDBC's "isBatchUpdatePerSpec" path does this; not yet ported.

For now, executemany still gives PEP 249 conformance and a slight perf improvement; bulk-insert optimization is a future improvement.

---
## 2026-05-04 — DECIMAL/MONEY decoding: base-100 BCD with asymmetric complement

**Status**: active (decoder); encoder is Phase 6.x

**Decision**: ``_decode_decimal`` handles IDS DECIMAL/MONEY wire bytes per the ``com.informix.lang.Decimal.init`` (line 374) format:

```
byte[0] = (sign << 7) | biased_exponent_base100
  - bit 7 = sign (1=positive, 0=negative)
  - bits 0-6 = (exponent + 64) for positive
  - bits 0-6 = (exponent + 64) ^ 0x7F for negative  ← XOR'd
byte[1..] = digit-pair bytes (each 0..99 = two BCD digits)
  - for negative: asymmetric base-100 complement applied
```

Asymmetric base-100 complement (per ``Decimal.decComplement`` line 447):

- Walk digits RIGHT to LEFT
- Trailing zeros stay zero
- First non-zero digit: subtract from 100
- Subsequent digits: subtract from 99

This was the trickiest decode of the project so far — a naive ``99 - d`` for all digits gave artifacts like ``-1234.55999`` instead of ``-1234.56``. The trailing-zeros and "first non-zero from 100" rules are what make the round trip exact.

NULL marker: byte[0] == 0 AND byte[1] == 0.

**Width on the wire**: the per-column ``encoded_length`` field is packed as ``(precision << 8) | scale``. Byte width = ``ceil(precision/2) + 1``. The row decoder uses this to slice DECIMAL columns out of the tuple payload (``parse_tuple_payload`` in ``_resultset.py``).

**Encoder (``_encode_decimal``)**: implemented but disabled — the server rejects the bytes (precision packing wrong somewhere). Workaround for Phase 6.x users: cast Decimal to float at the call site or pass via SQL literal. The decode side is fully working — it handles COUNT, SUM, AVG, literal DECIMAL values, negatives, fractions, and NULLs.
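A decoder sketch mirroring the rules above (not the driver's `_decode_decimal`; the value semantics assume the digit pairs form a fraction scaled by `100**exp`, i.e. value = 0.d1d2… × 100^exp). The complement walk is an involution, so the same loop undoes what the encoder applied:

```python
from decimal import Decimal

def decode_decimal(raw: bytes):
    """Decode IDS DECIMAL wire bytes to Decimal, or None for NULL."""
    if raw[0] == 0 and len(raw) > 1 and raw[1] == 0:
        return None                          # NULL marker
    head = raw[0]
    positive = bool(head & 0x80)
    exp_bits = head & 0x7F
    exp = (exp_bits if positive else exp_bits ^ 0x7F) - 64
    digits = list(raw[1:])
    if not positive:
        # asymmetric base-100 complement, walked right to left
        seen_nonzero = False
        for i in range(len(digits) - 1, -1, -1):
            if not seen_nonzero:
                if digits[i] == 0:
                    continue                 # trailing zeros stay zero
                digits[i] = 100 - digits[i]  # first non-zero: from 100
                seen_nonzero = True
            else:
                digits[i] = 99 - digits[i]   # subsequent: from 99
    value = Decimal(0)
    for i, pair in enumerate(digits):
        value += Decimal(pair) * (Decimal(100) ** (exp - 1 - i))
    return value if positive else -value
```

With pairs `[12, 34, 56]` and exp=2, the positive head byte is `0x80 | (2+64) = 0xC2`; the negative form XORs the exponent bits and complements the pairs to `[87, 65, 44]`.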

---
## 2026-05-04 — Better error messages with PEP 249 exception classification

**Status**: active

**Decision**: ``_raise_sq_err`` decodes the full SQ_ERR payload (sqlcode, isamcode, offset, near-token) and raises the appropriate PEP 249 exception class with a human-readable message and structured fields (``e.sqlcode``, ``e.isamcode``, ``e.offset``, ``e.near``).

PEP 249 classification by sqlcode:

- IntegrityError: -239, -268, -291, -292, -391, -703 (constraint violations)
- ProgrammingError: -201, -206, -217, -286, -310, ... (syntax/object/permission)
- OperationalError: -255, -256, -407, -440, -908, ... (transaction/connection)
- NotSupportedError: -329, -349, -510 (caller-can't-fix)
- DatabaseError: everything else (safe fallback)

Built-in error catalog of the ~50 most common Informix sqlcodes in ``src/informix_db/_errcodes.py``. Users extend it at runtime via ``register_error_text(code, text)``.

**Connection survives errors**: a failed query doesn't poison the session — subsequent ``execute()`` calls work normally. Verified by ``test_connection_survives_query_error``.
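The classification above reduces to set lookups with a safe fallback. A sketch returning class names as strings to stay self-contained (the real code raises the driver's exception classes; the ProgrammingError/OperationalError lists here include only the codes spelled out above — the full lists are abridged with "..." in the entry):

```python
INTEGRITY = {-239, -268, -291, -292, -391, -703}      # constraint violations
PROGRAMMING = {-201, -206, -217, -286, -310}          # abridged list
OPERATIONAL = {-255, -256, -407, -440, -908}          # abridged list
NOT_SUPPORTED = {-329, -349, -510}                    # caller-can't-fix

def classify(sqlcode: int) -> str:
    if sqlcode in INTEGRITY:
        return "IntegrityError"
    if sqlcode in PROGRAMMING:
        return "ProgrammingError"
    if sqlcode in OPERATIONAL:
        return "OperationalError"
    if sqlcode in NOT_SUPPORTED:
        return "NotSupportedError"
    return "DatabaseError"        # safe fallback for everything else
```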

---
## 2026-05-04 — DATETIME decoding: BCD-packed with qualifier-driven field walk

**Status**: active

**Decision**: ``_decode_datetime(raw, encoded_length)`` walks BCD digit pairs into Python ``datetime`` objects. Returns ``datetime.date`` for date-only qualifiers, ``datetime.time`` for time-only, ``datetime.datetime`` for combined.

Wire format:

- byte[0] = sign + biased exponent (in base-100 digit pairs before the decimal point)
- byte[1..] = BCD digit pairs (year takes 2 bytes = 4 digits; everything else 1 byte = 2 digits)

The qualifier is packed in the column descriptor's ``encoded_length``:

- high byte = digit_count (total base-10 digits)
- middle nibble = start_TU (time-unit code: YEAR=0, MONTH=2, DAY=4, HOUR=6, MIN=8, SEC=10, FRAC1=11..FRAC5=15)
- low nibble = end_TU

Byte width on the wire = ``ceil(digit_count / 2) + 1``.

Verified against 4 simultaneous DATETIME columns in one tuple:

- YEAR TO SECOND → datetime.datetime(2026, 5, 4, 12, 34, 56)
- YEAR TO DAY → datetime.date(2026, 5, 4)
- HOUR TO SECOND → datetime.time(12, 34, 56)
- YEAR TO FRACTION(3) → datetime.datetime(...)

DATETIME parameter binding (encoder) is Phase 6.x — same status as the DECIMAL encoder.
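The qualifier unpacking above is a few shifts. A sketch (function name is illustrative); for YEAR TO SECOND, digit_count is 4+2+2+2+2+2 = 14, so `encoded_length = 0x0E0A` and the wire width is 8 bytes:

```python
def unpack_dt_qualifier(encoded_length: int):
    """Split the packed DATETIME qualifier into (digit_count, start_TU,
    end_TU) plus the on-wire byte width per the layout above."""
    digit_count = (encoded_length >> 8) & 0xFF
    start_tu = (encoded_length >> 4) & 0x0F    # middle nibble
    end_tu = encoded_length & 0x0F             # low nibble
    width = -(-digit_count // 2) + 1           # ceil(digits/2) + 1 head byte
    return digit_count, start_tu, end_tu, width
```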

---
## 2026-05-04 — DATE / DATETIME / DECIMAL parameter encoding

**Status**: active

**Decision**: ``encode_param`` dispatches on ``isinstance(value, datetime.datetime / datetime.date / decimal.Decimal)`` to type-specific encoders. Round-trip verified through INSERT + SELECT.

**The 2-byte length-prefix discovery (the unblocker)**: my Phase 6.a DECIMAL encoder and Phase 6.c DATETIME encoder both produced "correct" BCD bytes, but the server silently dropped the SQ_BIND PDU. Captured the wire and compared to JDBC — DECIMAL/DATETIME bind data has a **2-byte length prefix** at the start (per ``Decimal.javaToIfx`` line 457) that wraps the BCD payload. With the prefix added (``raw = len(inner).to_bytes(2, "big") + inner``), both encoders work. DATE doesn't need the prefix — it's a fixed 4-byte int.

Per-type encoded format:

| Python | IDS type | Wire bytes |
|--------|----------|------------|
| ``datetime.date`` | DATE (7) | ``[int days_since_1899-12-31]`` (4 bytes BE) |
| ``datetime.datetime`` | DATETIME (10) | ``[short total_len][byte 0xc7][7 BCD pairs]`` (10 bytes total for YEAR TO SECOND) |
| ``decimal.Decimal`` | DECIMAL (5) | ``[short total_len][byte exp][BCD digit pairs]`` (variable) |

For DATETIME, the encoder always emits the YEAR TO SECOND form (no microseconds). Phase 6.x can add YEAR TO FRACTION(N) variants if microsecond precision is needed.

For DECIMAL, the encoder uses the asymmetric base-100 complement (mirror of the decoder) for negatives. Tested with positive, negative, and fractional values.

**Lesson**: when a server silently drops a PDU, it's almost always an envelope/framing issue rather than the inner-value bytes being wrong. The 2-byte length prefix here, the SHORT-vs-INT reserved field in CURNAME+NFETCH, the even-byte alignment pad — same pattern.
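The YEAR TO SECOND row of the table can be sketched end-to-end. An assumption worth flagging: the head byte `0xc7` is read here as sign bit `0x80 | (7 + 64)` — seven integer base-100 pairs — which is consistent with the DECIMAL head-byte rule, but the interpretation is inferred, not taken from the entry:

```python
import datetime
import struct

def encode_datetime_yts(dt: datetime.datetime) -> bytes:
    """[short total_len][byte 0xc7][7 BCD base-100 pairs] per the table
    above (sketch; not the driver's encoder)."""
    digits = (f"{dt.year:04d}{dt.month:02d}{dt.day:02d}"
              f"{dt.hour:02d}{dt.minute:02d}{dt.second:02d}")
    pairs = bytes(int(digits[i:i + 2]) for i in range(0, 14, 2))
    inner = bytes([0xC7]) + pairs
    # the 2-byte length prefix that unblocked the encoders
    return struct.pack(">h", len(inner)) + inner
```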

---
## 2026-05-04 — INTERVAL decoding (both qualifier families)
|
||
|
||
**Status**: active
|
||
**Decision**: ``_decode_interval`` decodes IDS INTERVAL into one of two Python types based on the qualifier's ``start_TU``:
|
||
- ``start_TU >= DAY (4)`` (IntervalDF) → ``datetime.timedelta``
|
||
- ``start_TU <= MONTH (2)`` (IntervalYM) → :class:`informix_db.IntervalYM` (a small frozen dataclass holding signed total months)
|
||
|
||
**The wire format is the same as DECIMAL/DATETIME** — ``[head byte][digit pairs in base-100]`` with sign+biased-exponent header. The qualifier short tells you how to *interpret* those digits:
|
||
- High byte = total digit count across all fields
|
||
- Middle nibble = start_TU; low nibble = end_TU
|
||
- First field has variable digit width: ``flen = total_len - (end_TU - start_TU)`` (which is the digits "added" past the first field; each non-first field is exactly 2 digits)
|
||
- Subsequent non-first non-fractional fields are 1 byte each (since each is exactly 2 base-10 digits = 1 base-100 digit pair)
|
||
- Fractional fields scale to nanoseconds via ``cv *= 10 ** scale_exp`` where ``scale_exp = 18 - end_TU`` forced odd
|
||
|
||
Wire byte width on the SQ_TUPLE side = ``ceil(digit_count / 2) + 1`` (one head byte + ceil(digits/2) digit pairs). Same formula as DATETIME and DECIMAL — surfaces in ``_resultset.parse_tuple_payload`` as a dedicated branch (because the qualifier is needed at decode time).
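
The qualifier arithmetic above can be sketched as (a toy helper; the nibble layout follows the entry's description, and the TU codes for HOUR/SECOND are the standard Informix values, stated here as an assumption):

```python
def parse_qualifier(qual: int) -> tuple[int, int, int, int]:
    """Split an IDS qualifier short into (total_digits, start_TU, end_TU,
    wire_bytes): high byte = digit count, middle nibble = start_TU,
    low nibble = end_TU; wire width = ceil(digits / 2) + 1 head byte."""
    total_digits = (qual >> 8) & 0xFF
    start_tu = (qual >> 4) & 0x0F
    end_tu = qual & 0x0F
    wire_bytes = (total_digits + 1) // 2 + 1
    return total_digits, start_tu, end_tu, wire_bytes

# HOUR(2) TO SECOND from the entry: total_len=6, and (assumed standard
# TU codes) HOUR=6, SECOND=10 -> 4 wire bytes, dec_exp = 7 // 2 = 3
qual = (6 << 8) | (6 << 4) | 10
digits, start, end, width = parse_qualifier(qual)
assert (digits, start, end, width) == (6, 6, 10, 4)
assert (digits + 10 - end + 1) // 2 == 3
```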

**The dec_exp arithmetic that initially fooled me**: I kept misreading ``(total_len + 10 - end_TU + 1) / 2`` as a much larger value than it is. For HOUR(2) TO SECOND, ``total_len=6, end_TU=10``, so dec_exp = 7//2 = 3, not 8. After the encoder writes dec_exp into the head byte and the decoder reads it back, the two match perfectly so the digit array lines up at offset 0 of the 16-byte working buffer — but only if you actually compute the value correctly. *Read your own arithmetic.* (The synthetic unit-test framework caught this immediately, before the integration tests even ran.)

**IntervalYM design**: I considered a NamedTuple with (years, months) fields, but a frozen dataclass with a single signed ``months`` field matches JDBC's ``IntervalYM`` and avoids ambiguity around "what does negative mean for a tuple". ``years`` and ``remainder_months`` are read-only properties; ``__str__`` emits the standard "Y-MM" / "-Y-MM" form. ``slots=True`` makes it as cheap as a NamedTuple memory-wise.

**Verified against 9 integration scenarios** (all decoder branches): DAY TO SECOND, HOUR TO SECOND, MINUTE TO SECOND, YEAR TO MONTH, YEAR-only, negative interval (9's-complement), table column, NULL, and a multi-INTERVAL row (proves per-column slicing works across mixed qualifier families).

INTERVAL parameter binding (the encoder) is deferred to Phase 6.e or later — the same arc as DECIMAL/DATETIME, where decoding lands first and encoding follows once we have wire captures to compare against.

---

## 2026-05-04 — INTERVAL parameter encoding

**Status**: active

**Decision**: ``encode_param`` dispatches ``datetime.timedelta`` and :class:`IntervalYM` to dedicated encoders that produce the 2-byte-length-prefixed BCD payload (per the Phase 6.c discovery). Default qualifiers are chosen to cover any sane Python value:

- ``timedelta`` → ``INTERVAL DAY(9) TO FRACTION(5)`` (covers ±999,999,999 days at 10us resolution)
- ``IntervalYM`` → ``INTERVAL YEAR(9) TO MONTH`` (covers ±999,999,999 years)

**Why DAY(9) and YEAR(9)?** Python's ``timedelta`` allows up to 999,999,999 days; YEAR/MONTH have no upper bound in Python (just a signed int). We could choose a smaller default, but the wire-format cost is one byte per two extra digits, and the user-facing benefit is "no overflow surprises". JDBC's defaults (DAY(2) TO FRACTION(5) for IntervalDF, YEAR(4) TO MONTH for IntervalYM) trade safety for compactness — we make the opposite trade.

**FRACTION(5) is the precision ceiling.** Informix doesn't expose FRAC6 even though the qualifier nibble allows it (per ``Interval.TU_F1..TU_F5``). The encoder scales nanoseconds via ``nans /= 10^(18 - end_TU)`` per JDBC, which means we lose the units digit of microseconds (10us is the smallest representable unit). This is the same limitation JDBC has — Informix fundamentally can't store sub-10us intervals in this format.

**The synthetic round-trip caught every framing bug locally.** Once the decoder works, encoder verification becomes "decode my encoded bytes and compare to the input" — a closed loop with no server in the mix. All 6 integration tests passed on the first run against live Informix; no debugging cycle was needed. This is the dividend from owning both ends of the codec layer.
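
The closed-loop idea in miniature, on the base-100 digit-pair layer alone (a toy codec for unsigned digit strings, not the driver's real signed/complemented one):

```python
def to_base100(n: int, width: int) -> bytes:
    """Pack a non-negative integer into `width` base-100 digit pairs,
    most significant first (one byte per pair, as on the wire)."""
    out = bytearray(width)
    for i in range(width - 1, -1, -1):
        out[i] = n % 100
        n //= 100
    return bytes(out)

def from_base100(raw: bytes) -> int:
    n = 0
    for b in raw:
        n = n * 100 + b
    return n

# Round-trip: decode(encode(x)) == x for every value the width can hold,
# with no server in the mix
for value in (0, 7, 99, 123456, 999999):
    assert from_base100(to_base100(value, 3)) == value
```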

**Lesson reinforced**: Phase 6.a (DECIMAL encoding) was the real cost — that's where the 2-byte-length-prefix wire-format discovery happened. Phase 6.c (DATE/DATETIME encoding) and Phase 6.e (INTERVAL encoding) each amortized that discovery with one new encoder per qualifier-bearing type. Total wall-clock time per phase is dropping geometrically.

---

## 2026-05-04 — Phase 6.f research: BYTE / TEXT / BLOB / CLOB protocol scope

**Status**: research complete; implementation deferred

**Decision**: LOB types are decoupled into their own phase. The four "LOB" types split into two protocol families with materially different wire-level cost:

### Protocol family A: BYTE (type=11) and TEXT (type=12) — legacy in-row-pointed blobs

**Server-side requirements** (verified empirically against the IBM dev container 15.0.1.0.3DE):

- A blobspace must exist (`onspaces -c -b blobspace1 -p ... -o 0 -s 50000`)
- The database must be logged (`CREATE DATABASE testdb WITH LOG`)
- The column declaration must place data in the blobspace: `data BYTE IN blobspace1`

**Even with all that, BYTE/TEXT cannot be inserted via SQL literals.** I verified by running `dbaccess - test_byte.sql` with `INSERT INTO t VALUES (1, "0x68656c6c6f")` and getting:

```
617: A blob data type must be supplied within this context.
```

This is a hard server-side restriction: blob data **must** arrive via the binary BBIND wire path. There is no string-literal escape hatch.

**Wire protocol** (per `IfxSqli.sendBind` line 844, `sendBlob` line 3328, `sendStreamBlob` line 3482):

1. **SQ_BIND** (tag 5): the per-param block declares the BYTE/TEXT slot, but the inline data is a **56-byte blob descriptor** (per `IfxBlob.toIfx` line 162) — mostly zeros, with the size at offset [16:20] as a 4-byte big-endian int. Byte 39 is the null indicator (1 = null).
2. **SQ_BBIND** (tag 41): `[short tag=41][short blob_count]` — the count of BYTE/TEXT params being streamed.
3. **For each BYTE/TEXT param**: a stream of `SQ_BLOB` (tag 39) chunks: `[short tag=39][short length][padded data]`. Chunks max out at 1024 bytes per `sendStreamBlob`.
4. **End-of-blob marker**: a final `SQ_BLOB` with `[short tag=39][short length=0]`.
5. Then SQ_EXECUTE proceeds normally.

**Decoder side**: rows containing BYTE/TEXT carry a 56-byte descriptor in the SQ_TUPLE payload (per the `IfxRowColumn.loadColumnData` switch case for types 11/12, which reads 56 bytes). A separate stream of SQ_BLOB tags then arrives **between** SQ_TUPLE messages, carrying the actual bytes.

**Estimated implementation cost**: substantial. The cursor state machine needs to:

- Detect `bytes`/`str`-meant-as-TEXT params and route them through SQ_BBIND after SQ_BIND
- Send the 56-byte descriptor as the inline placeholder
- Stream chunks of ≤1024 bytes each
- On the read path, parse SQ_BLOB tags between SQ_TUPLE messages and reassemble per-column

This is a multi-day effort and warrants its own phase, **Phase 7+**.
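
Steps 2–4 of the write path can be sketched as (tag values and the 1024-byte cap are from the entry; the even-byte pad on odd-length chunks is an assumption carried over from the driver's general alignment rule):

```python
import struct

SQ_BBIND, SQ_BLOB = 41, 39
MAX_CHUNK = 1024  # per JDBC's sendStreamBlob

def blob_stream_pdus(payloads: list[bytes]) -> bytes:
    """Build the SQ_BBIND header plus the chunked SQ_BLOB stream for a
    list of BYTE/TEXT payloads, each terminated by a zero-length chunk.
    Odd chunks are padded to even length (assumed alignment rule)."""
    out = bytearray(struct.pack("!hh", SQ_BBIND, len(payloads)))
    for data in payloads:
        for off in range(0, len(data), MAX_CHUNK):
            chunk = data[off:off + MAX_CHUNK]
            pad = b"\x00" if len(chunk) % 2 else b""
            out += struct.pack("!hh", SQ_BLOB, len(chunk)) + chunk + pad
        out += struct.pack("!hh", SQ_BLOB, 0)  # end-of-blob marker
    return bytes(out)

stream = blob_stream_pdus([b"x" * 5120])  # exactly 5 full 1024-byte chunks
assert stream.count(struct.pack("!hh", SQ_BLOB, 1024)) == 5
assert stream.endswith(struct.pack("!hh", SQ_BLOB, 0))
```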

### Protocol family B: BLOB (type=102) and CLOB (type=101) — smart-LOBs with locators

**Server-side requirements**: an sbspace (smart-LOB space), more complex than a blobspace. (Verified: `onspaces -c -S sbspace1 ...`.)

**Wire protocol**: even more involved than BYTE/TEXT. Per `IfxLobInputStream` and `IfxSmartBlob`, smart-LOB access uses an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against the sbspace, with handles called *locators* that travel inline in the SQ_TUPLE while the actual bytes go over a separate channel. JDBC's `IfxLocator` is a 56-byte descriptor (the same shape as the BYTE descriptor!) but carries semantic meaning: storage type, sbspace ID, partition number, etc.

**Estimated implementation cost**: substantial++ — significantly larger than BYTE/TEXT, because we'd need to implement the LO_* RPC sub-protocol entirely.

### Decision

**Phase 6.f is closed as research-complete** with this entry as the deliverable. The findings replace assumptions (e.g., "BLOB/CLOB will be similar to INTERVAL") with actual protocol facts. Implementation is split into:

- **Phase 8** (future): BYTE/TEXT bind+read with the SQ_BBIND/SQ_BLOB wire machinery
- **Phase 9** (future): smart-LOB BLOB/CLOB with the LO_OPEN/LO_READ session protocol

In the meantime, **users who need to insert binary data** can use the existing `LVARCHAR` path via `str` (which works for binary if encoded with `iso-8859-1`) up to ~32K — the LVARCHAR on-wire limit. Not a substitute for true BYTE/TEXT, but it covers many practical cases.
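
The stop-gap above in miniature: latin-1 is lossless over all 256 byte values, which is exactly why the `str` path can carry binary (the commented `cur.execute` line is hypothetical usage):

```python
payload = bytes(range(256))  # arbitrary binary, including NULs and high bytes

# bytes -> str via iso-8859-1 maps each byte to the code point of the
# same value, so the round-trip is exact as long as the value fits ~32K
as_text = payload.decode("iso-8859-1")
assert as_text.encode("iso-8859-1") == payload

# Hypothetical usage against the driver's LVARCHAR path:
# cur.execute("INSERT INTO t (data) VALUES (?)", (as_text,))
```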

The constants `SQ_BBIND=41`, `SQ_BLOB=39`, `SQ_FETCHBLOB=38`, `SQ_SBBIND=52`, `SQ_FILE_READ=106`, `SQ_FILE_WRITE=107` are already declared in `_messages.py` from earlier scaffolding — the protocol layer is ready when implementation lands.

**Honest scope-discovery moment**: I went into Phase 6.f assuming it would be similar in effort to INTERVAL. Reading the wire protocol revealed a different shape entirely — multi-PDU sequences require state-machine surgery, not just new codecs. Pivoting now (instead of half-implementing) is the right call.

---

## 2026-05-04 — Phase 7: real transaction semantics on logged databases

**Status**: active

**Decision**: The driver now manages transactions implicitly on logged databases. Three protocol facts came out of integration testing that materially shaped the implementation:

### Fact 1: SQ_BEGIN is REQUIRED before the first DML in a logged-DB transaction

Informix in non-ANSI mode does NOT auto-open a server-side transaction on the first DML. Without an explicit ``SQ_BEGIN`` (tag 35), the server treats each statement as if it's already in some implicit txn (data is visible after the INSERT), but ``COMMIT WORK`` afterward fails with sqlcode -255 ("Not in transaction"). The "INSERT then COMMIT" sequence appears to work for visibility, but the COMMIT-as-no-op is broken in a way that violates user expectations.

**Solution**: ``Connection._ensure_transaction()`` is called by ``Cursor.execute()`` and ``Cursor.executemany()`` before sending PREPARE. It sends ``SQ_BEGIN`` if no transaction is currently open, and is idempotent within an open txn. After ``commit()``/``rollback()``, ``_in_transaction`` is reset to ``False`` so the NEXT DML triggers a fresh ``SQ_BEGIN``.

For unlogged databases, ``SQ_BEGIN`` returns sqlcode -201 ("BEGIN WORK requires logged DB"). We **cache that result** on the connection (``_supports_begin_work=False``) so subsequent DML doesn't re-probe. The same client code therefore works seamlessly on logged or unlogged DBs without the user having to know which they're hitting.
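
The caching logic described above, sketched (class shape and attribute names follow the entry; the wire plumbing is stubbed out):

```python
SQ_BEGIN = 35  # tag per the entry

class DatabaseError(Exception):
    def __init__(self, sqlcode: int):
        super().__init__(sqlcode)
        self.sqlcode = sqlcode

class Connection:
    """Sketch of the implicit-transaction logic; not the real driver class."""
    def __init__(self):
        self._in_transaction = False
        self._supports_begin_work = True   # until the server says otherwise

    def _send_begin_work(self):
        raise NotImplementedError          # real driver: SQ_BEGIN, drain to EOT

    def _ensure_transaction(self):
        if self._in_transaction or not self._supports_begin_work:
            return                         # idempotent / unlogged-DB no-op
        try:
            self._send_begin_work()
            self._in_transaction = True
        except DatabaseError as e:
            if e.sqlcode == -201:          # "BEGIN WORK requires logged DB"
                self._supports_begin_work = False  # cache: never re-probe
            else:
                raise

    def commit(self):
        # after SQ_CMMTWORK succeeds, the NEXT DML triggers a fresh SQ_BEGIN
        self._in_transaction = False
```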

### Fact 2: SQ_RBWORK has a savepoint short payload — SQ_CMMTWORK does not

Reading ``IfxSqli.sendRollback`` (line 647) revealed that ``SQ_RBWORK`` (tag 20) is followed by ``[short savepoint=0]`` BEFORE the ``SQ_EOT`` framing tag. Without that 2-byte payload, the server **silently hangs** waiting for it — no error, no timeout, just a stuck socket read.

This caused a confusing 30-second test timeout on the first integration run. The fix is one line:

```python
self._sock.write_all(struct.pack("!hhh", SQ_RBWORK, 0, SQ_EOT))
```

``SQ_CMMTWORK`` (tag 19), by contrast, has no payload — it's just the tag followed by SQ_EOT.

**Lesson**: same pattern as the SHORT-vs-INT field in CURNAME+NFETCH (Phase 4.x) and the 2-byte length prefix in DECIMAL/DATETIME/INTERVAL bind data (Phase 6.c+). When the server hangs, **it's almost always an incomplete PDU body** — the server is waiting for bytes you didn't send. Compare your bytes to JDBC's, byte by byte.

### Fact 3: SQ_XACTSTAT (tag 99) is a logged-DB-only message

Logged databases emit ``SQ_XACTSTAT`` (tag 99) interleaved with normal DML responses to inform the client of transaction-state events. Body: ``[short xcEvent][short xcNewLevel][short xcOldLevel]``. We don't surface these events to the user (yet) but must drain them in **every** response-reading path: ``_drain_to_eot`` (used by commit, rollback, DML), ``_read_describe_response`` (PREPARE response), ``_read_fetch_response`` (NFETCH response), and the connection-level ``_drain_to_eot`` (used by SQ_BEGIN and session init).

Without handling SQ_XACTSTAT in all four paths, the cursor desynchronizes from the wire stream and the next read pulls garbage tags (which then raise "unexpected tag" errors that hide the real cause).
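
The drain-loop shape implied above, sketched (SQ_XACTSTAT's tag and body are from the entry; the EOT tag value used here is a stand-in, and the reader API is hypothetical):

```python
import io
import struct

SQ_XACTSTAT = 99  # per the entry

class ProtocolError(Exception):
    pass

def drain_to_eot(stream: io.BufferedIOBase, eot_tag: int) -> None:
    """Read tags until the EOT tag, swallowing SQ_XACTSTAT's 6-byte body
    ([short xcEvent][short xcNewLevel][short xcOldLevel]) so the stream
    never desynchronizes."""
    while True:
        (tag,) = struct.unpack("!h", stream.read(2))
        if tag == eot_tag:
            return
        if tag == SQ_XACTSTAT:
            stream.read(6)   # drop the three shorts; not surfaced (yet)
            continue
        raise ProtocolError(f"unexpected tag {tag}")

# One XACTSTAT event interleaved before EOT (12 is a stand-in EOT tag value)
wire = struct.pack("!hhhhh", SQ_XACTSTAT, 1, 2, 0, 12)
drain_to_eot(io.BytesIO(wire), eot_tag=12)
```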

### Cross-connection isolation tests are config-dependent — don't bake them in

The original test plan included a cross-connection visibility test ("conn A inserts, conn B reads zero rows before commit, then sees one row after"). Informix's default isolation is **Committed Read with row-level locking**, so conn B's SELECT *blocks* on the locked, uncommitted row rather than returning zero rows. With ``LOCK MODE NOT WAIT`` (the default), this surfaces immediately as sqlcode -252 (lock timeout). With ``LOCK MODE WAIT N``, it waits N seconds.

Either behavior is correct under Informix semantics — the test would just be testing the lock manager, not transaction visibility. We removed that test and replaced it with the simpler ``test_committed_data_visible_to_fresh_connection``, which proves durability across connections without engaging the lock manager.

### Test coverage delivered

10 transaction tests in ``tests/test_transactions.py``, all passing against the auto-created ``testdb`` logged database:

- Commit visibility (single connection)
- Rollback isolation — the "Phase 3 gate" test
- Multi-row rollback
- Partial-commit-then-rollback
- Autocommit semantics (data persists; rollback is a no-op)
- Cross-connection durability
- UPDATE+rollback, DELETE+rollback
- Implicit per-statement transaction

The ``conftest.py::_ensure_testdb`` fixture auto-creates ``testdb WITH LOG`` if missing, so the tests work on a fresh dev container provided ``blobspace1`` and ``sbspace1`` exist (created during the Phase 6.f research).

### Two old tests retired

``test_commit_rollback_in_unlogged_db_raises`` and ``test_commit_in_unlogged_db_is_operational_error`` were written assuming ``commit()`` on an unlogged DB raised -255. The Phase 7 driver-side smarts now make those calls a silent no-op (the connection knows there's no open txn). Both tests were rewritten to assert the new (better) behavior. PEP 249 doesn't mandate any specific behavior for unsupported operations; "graceful no-op" matches what most modern drivers do.

---

## 2026-05-04 — Phase 8: BYTE / TEXT bind+read (the SQ_BBIND/SQ_BLOB protocol)

**Status**: active

**Decision**: BYTE (type 11) and TEXT (type 12) round-trip end-to-end. Python `bytes`/`bytearray` map to BYTE; `str` is auto-encoded as ISO-8859-1 for TEXT (matching the server's default codeset). NULL is signaled by byte 39 of the descriptor.

### Wire protocol — write side

A BYTE/TEXT param uses **two** PDU sections within the same SQ_BIND envelope:

1. **Inline placeholder** (per `IfxBlob.toIfx` line 162): a 56-byte blob descriptor with **only** the size at offset [16..19] as a 4-byte big-endian int. All other bytes are zero. (For NULL, byte 39 is set to 1.)
2. **SQ_BBIND stream** (per `IfxSqli.sendBlob` line 3328): after all per-param SQ_BIND blocks, emit `[short SQ_BBIND=41][short blob_count]`, then for each blob param stream chunked SQ_BLOB messages: `[short SQ_BLOB=39][short chunk_len][padded data]` (max 1024 bytes/chunk per JDBC's `sendStreamBlob`), ending with a zero-length terminator `[short SQ_BLOB=39][short 0]`.

Then SQ_EXECUTE proceeds normally.
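
The inline placeholder from step 1 can be sketched as (offsets per the entry; the helper name is hypothetical):

```python
import struct

def blob_descriptor(size: "int | None") -> bytes:
    """Build the inline 56-byte placeholder: mostly zeros, a 4-byte
    big-endian size at offset 16, and byte 39 = 1 for NULL."""
    desc = bytearray(56)
    if size is None:
        desc[39] = 1                        # null indicator
    else:
        desc[16:20] = struct.pack(">i", size)
    return bytes(desc)

d = blob_descriptor(5120)
assert len(d) == 56 and struct.unpack(">i", d[16:20])[0] == 5120
assert blob_descriptor(None)[39] == 1
```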

### Wire protocol — read side

The SQ_TUPLE payload returns only the 56-byte descriptor for BYTE/TEXT columns — the actual bytes live in the blobspace. The client must explicitly fetch them via SQ_FETCHBLOB (per `IfxSqli.sendFetchBlob` line 3716):

```
[short SQ_ID=4][int 38=SQ_FETCHBLOB][padded 56-byte descriptor][short SQ_EOT]
```

The server replies with one or more SQ_BLOB chunks ending with a zero-length terminator. The descriptor's locator is **only valid while the cursor is open** — the dereference must happen between the final NFETCH and CLOSE. Doing it after CLOSE returns -602 (Cannot open blob) with ISAM -101.
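
The reply-side reassembly loop, sketched (tag value per the entry; the even-byte pad after odd-length chunks is an assumption from the driver's alignment rule):

```python
import io
import struct

SQ_BLOB = 39

def read_blob_chunks(stream: io.BufferedIOBase) -> bytes:
    """Reassemble SQ_BLOB chunks until the zero-length terminator.
    Assumes odd-length chunks are padded to even length on the wire."""
    parts = []
    while True:
        tag, length = struct.unpack("!hh", stream.read(4))
        assert tag == SQ_BLOB, f"unexpected tag {tag}"
        if length == 0:
            return b"".join(parts)          # terminator
        parts.append(stream.read(length))
        if length % 2:
            stream.read(1)                  # skip alignment pad (assumed)

# Two chunks then the terminator; the 3-byte chunk carries one pad byte
wire = (struct.pack("!hh", SQ_BLOB, 3) + b"abc\x00"
        + struct.pack("!hh", SQ_BLOB, 2) + b"de"
        + struct.pack("!hh", SQ_BLOB, 0))
assert read_blob_chunks(io.BytesIO(wire)) == b"abcde"
```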

### Server-side prerequisites

The IBM dev container needs three things, in this order, before BYTE/TEXT works at all:

1. **A blobspace**: `onspaces -c -b blobspace1 -p /path -o 0 -s 50000`
2. **A logged database**: `CREATE DATABASE testdb WITH LOG` (BYTE/TEXT is rejected in unlogged DBs with sqlcode -617)
3. **Config + a level-0 archive to allow chunk page allocation**:

```bash
onmode -wm LTAPEDEV=/dev/null
onmode -wm TAPEDEV=/dev/null
onmode -l                    # advance logical log
ontape -s -L 0 -t /dev/null  # level-0 archive
```

Without the archive, JDBC fails identically to our driver with "Cannot close blob — BLOB pages can't be allocated from a chunk until chunk add is logged" (ISAM -169). **This was the unblocker that confirmed our protocol implementation was correct** — when JDBC and our driver fail identically against the same broken server config, you have byte-for-byte protocol parity. Then fix the server.

### Architectural note: the rest of the codec types vs. this one

Phase 6.a/c/e (DECIMAL/DATETIME/INTERVAL) shipped fast because each type was a single-PDU codec — encode bytes, send inline. BYTE/TEXT required **state-machine surgery**:

- The bind builder now knows about "blob-aware" params and queues them for a separate stream after the per-param block.
- The cursor's SELECT lifecycle now does a SQ_FETCHBLOB round-trip per blob column per row before sending CLOSE.
- The dereference is a separate read loop that handles its own SQ_DONE/SQ_COST/SQ_XACTSTAT interleaving.

The smart-LOB family (BLOB type 102, CLOB type 101) is a **further** state-machine extension — it uses `IfxLocator` references against sbspace and requires an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol entirely separate from BBIND/BLOB. That's deferred to Phase 9.

### Test coverage delivered

9 integration tests in `tests/test_blob.py`:

- `test_byte_roundtrip_short` — single-chunk payload
- `test_byte_roundtrip_multichunk` — 5120 bytes (5 chunks of 1024 each)
- `test_byte_null` — null descriptor (byte 39=1) → Python None
- `test_byte_multi_row` — three rows, each with its own SQ_FETCHBLOB
- `test_byte_binary_safe` — preserves null bytes, high bytes, etc.
- `test_text_roundtrip` — TEXT column, str returned (decoded)
- `test_text_with_unicode_iso8859` — extended-Latin chars round-trip
- `test_text_null`
- `test_byte_alongside_other_types` — BYTE column mixed with INT

Plus the Phase 4 `test_unsupported_param_type_raises` was updated — `bytes` is no longer the canonical "unsupported" sentinel, since we now support it. Switched to a custom Python class for that role.

### The "JDBC fails identically" debugging discovery

When the first round of integration tests failed with sqlcode -603, I built a Java `byte-cycle` scenario in `tests/reference/RefClient.java` that uses `PreparedStatement.setBytes()` against the same server. JDBC failed with the **exact same error** ("Cannot close blob — chunk add is logged"). That was the diagnostic moment: our protocol bytes were correct; the server config was wrong. After the level-0 archive, both JDBC and our driver succeeded.

This is the third instance of the "compare against JDBC at the byte level" diagnostic pattern paying off (after the SHORT-vs-INT bug from Phase 4.x and the 2-byte length prefix from Phase 6.c). Worth promoting to a debugging recipe: **when our driver fails and you suspect a protocol error, replicate the operation through `RefClient`. Same error = server/config issue. Different error = our bug.**

---

## (template — copy below this line for new entries)

```
## YYYY-MM-DD — <one-line decision title>

**Status**: active | superseded | revisited
**Decision**: <chosen path>
**Discarded**: <alternatives, briefly>
**Why**: <rationale>
```