Ryan Malloy 7f729b3a38 Phase 37: Pre-baked per-column reader strategy (2026.05.05.10)
2026-05-05 13:50:40 -06:00


Changelog

All notable changes to informix-db. Versioning is CalVer: YYYY.MM.DD for date-based releases, with YYYY.MM.DD.N for same-day post-releases, per PEP 440.

2026.05.05.10 — Phase 37: Pre-baked per-column reader strategy

Closes some of the C-vs-Python codec gap on bulk fetch by moving per-column dispatch decisions from row time to parse_describe time. Same idea as psycopg3's pure-Python loader-cache pattern.

What changed

src/informix_db/_resultset.py:

  • New compile_column_readers(columns) returns a list of pre-computed dispatch tuples — one per column. Each tuple is (kind, *args) where kind is a small int identifying the reader strategy.
  • parse_tuple_payload accepts an optional readers= parameter. When provided, the hot loop dispatches on the integer kind (one int comparison per column) instead of running the legacy frozenset/dict-lookup chain.
  • Common types (FIXED, BYTE_PREFIX, CHAR, LVARCHAR, DECIMAL, DATETIME, INTERVAL) get pre-compiled fast paths. Rare types (UDT/composite) tagged _RK_LEGACY and fall through to a _legacy_dispatch_one_column helper.
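
The dispatch shape can be sketched like this. The kind constants, the two toy column types, and the payload layout below are illustrative stand-ins, not the shipped codec:

```python
import struct

# Hypothetical kind tags; the real module has FIXED/BYTE_PREFIX/CHAR/... ints.
_RK_FIXED, _RK_CHAR, _RK_LEGACY = 0, 1, 2

def compile_column_readers(columns):
    """Build one (kind, *args) dispatch tuple per column at describe time."""
    readers = []
    for col_type, width in columns:
        if col_type == "int":
            readers.append((_RK_FIXED, struct.Struct(">i").unpack_from, 4))
        elif col_type == "char":
            readers.append((_RK_CHAR, width))
        else:
            readers.append((_RK_LEGACY,))  # rare types fall to the slow path
    return tuple(readers)

def parse_tuple_payload(payload, readers):
    """Hot loop: one int compare per column, no frozenset/dict-lookup chain."""
    pos, row = 0, []
    for r in readers:
        kind = r[0]
        if kind == _RK_FIXED:
            row.append(r[1](payload, pos)[0])   # pre-bound Struct.unpack_from
            pos += r[2]
        elif kind == _RK_CHAR:
            row.append(payload[pos:pos + r[1]].decode("ascii").rstrip())
            pos += r[1]
        else:
            raise NotImplementedError("legacy one-column dispatch elided")
    return tuple(row)
```

Compile once after describe, then reuse the same readers tuple for every row of the result set.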

src/informix_db/cursors.py:

  • Cursor now stores self._column_readers after parse_describe, computed once via compile_column_readers. Reset on each new execute.
  • The fetch loop passes readers=self._column_readers to parse_tuple_payload.

Performance

Real numbers from the integration container, median of 10+ rounds:

  Benchmark                Before    After     Δ
  select_scaling[1000]     2.7 ms    2.51 ms   -7%
  select_scaling[10000]    25.8 ms   25.0 ms   -3%
  select_scaling[100000]   271 ms    246 ms    -9%
  wide_row_select[5]       2.4 ms    2.16 ms   -10%
  wide_row_select[20]      5.1 ms    4.14 ms   -19%
  wide_row_select[50]      10.1 ms   8.21 ms   -19%
  wide_row_select[100]     19.4 ms   14.6 ms   -25%

Wide-row workloads benefit most — per-column dispatch savings accumulate linearly with column count. At 100 columns the speedup is 25%; at 5 columns it's 10%.

Honest assessment

Less than the ~30% I projected. The actual per-row cost is dominated by decoder bodies and slice operations more than I estimated; pre-baking the dispatch only saved ~50-100 ns/col instead of the 150-200 ns I'd hoped for.

The IfxPy gap shrinks from ~2.4× to ~2.2× on bulk fetch. Real progress, but not closing-the-gap territory. The next lever for materially closing the gap is exec()-based codegen (build a row-decoder function per result-set shape; eliminates per-column iteration overhead entirely). Possible Phase 38.
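
The codegen idea can be illustrated in hypothetical form — build_row_decoder is not an existing API, and only toy types are handled; the point is that exec() compiles a straight-line decoder per result-set shape, so the hot path has no per-column loop or dispatch at all:

```python
import struct

def build_row_decoder(columns):
    """Generate a whole-row decoder with exec(), once per result-set shape."""
    lines = ["def decode(payload):", "    pos = 0", "    out = []"]
    for col_type, width in columns:
        if col_type == "int":
            lines.append("    out.append(_unpack_i(payload, pos)[0]); pos += 4")
        else:  # fixed-width char column in this sketch
            lines.append(
                f"    out.append(payload[pos:pos + {width}]"
                f".decode('ascii').rstrip()); pos += {width}"
            )
    lines.append("    return tuple(out)")
    ns = {"_unpack_i": struct.Struct(">i").unpack_from}
    exec("\n".join(lines), ns)  # compiled once; called once per row
    return ns["decode"]
```

The generated decode() is straight-line code specialized to the column list, which is what eliminates the per-column iteration overhead.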

Architectural note

This is the same pattern psycopg3 uses in its pure-Python mode: cache loaders per column at execute time, dispatch via lookup in the hot loop. We pick tuple-dispatch over object-method dispatch (r[0] int compare vs. loader.load(data)) for raw speed in the inner loop — slightly less extensible but ~20-30 ns faster per column.

Tests

All 221 integration tests still pass. No new test code; the benchmark suite acts as the regression test (parse_tuple_5cols / select_scaling / wide_row_select).

2026.05.05.9 — IfxPy scaling comparison + honest comparison numbers (Phase 36)

Adds the IfxPy side of Phase 34's scaling benchmarks (1k / 10k / 100k rows for both executemany and SELECT) and updates the README's comparison table with the actually-correct numbers.

What changed

1. tests/benchmarks/compare/ifxpy_bench.py extended with bench_executemany_scaling(n) and bench_select_scaling(n) — same shapes as test_scaling_perf.py so the comparison is apples-to-apples.

2. README's comparison numbers corrected. Earlier comparison runs reported select_bench_table_all at 891 µs for informix-db. Re-running with consistent measurement (warmup + median + 10+ rounds) reports 3.04 ms — a 3.4× discrepancy. The earlier number was probably picked up from a noisy first-run with a different warmup state, or from a benchmark that wasn't fully populating its fixture. Either way, the "we win all 5 benchmarks" claim was based on inconsistent measurement.

The corrected comparison reveals two clear stories:

  Benchmark                 IfxPy     informix-db   Result
  executemany(1k) in txn    23.5 ms   23.2 ms       tied
  executemany(10k) in txn   259 ms    161 ms        us 1.6× faster
  executemany(100k) in txn  2376 ms   1487 ms       us 1.6× faster
  SELECT 1k rows            1.2 ms    2.7 ms        IfxPy 2.3× faster
  SELECT 10k rows           11.3 ms   25.8 ms       IfxPy 2.3× faster
  SELECT 100k rows          112 ms    271 ms        IfxPy 2.4× faster

Bulk-insert: pure-Python wins 1.6× at scale because pipelining (Phase 33) eliminates per-row RTT. IfxPy's IfxPy.execute(stmt, tuple) per-call API can't pipeline.

Large-fetch: IfxPy wins 2.3-2.4× at scale. Their C-level fetch_tuple decoder runs at ~1.1 µs/row; our parse_tuple_payload runs at ~2.7 µs/row. This is the real C-vs-Python codec cost showing up at scale where it matters.

Why correcting this matters

A "we win everything" claim that's based on noisy measurements would have collapsed the first time a user ran their own benchmark and got different numbers. Naming the trade-off honestly — "we're faster at bulk write, slower at bulk read, comparable elsewhere" — is the right framing.

When to prefer informix-db

  • ETL pipelines, log shipping, bulk writes (1.6× faster at scale)
  • Containerized / minimal-dependency environments (50 KB wheel vs IfxPy's 92 MB OneDB tarball + libcrypt.so.1 dependency hell)
  • Modern Python (works on 3.10–3.14; IfxPy is broken on Python 3.12+)
  • Async / FastAPI workloads (we have native async; IfxPy doesn't)

When IfxPy may be faster

  • Analytical reporting queries pulling 10k+ rows in a single SELECT
  • Workloads where the per-row decode cost dominates (wide rows, tight read loops)

The actionable takeaway for informix-db's future: the parse_tuple_payload hot path is now the bottleneck at scale. Phase 25's branch reorder shaved 22%; further work (Cython codec? deeper inlining? per-column dispatch pre-bake?) could close the C-vs-Python gap. Tracked as a possible Phase 37+.

2026.05.05.8 — Scaling benchmarks (Phase 34)

Adds tests/benchmarks/test_scaling_perf.py — parametrized benchmarks that exercise the driver at row counts and column widths well beyond what the existing 1k-row benchmarks cover. The first thing this suite did was catch the NFETCH-loop data-loss bug fixed in Phase 35.

What the new suite measures

Bulk insert scaling (test_executemany_scaling[1000|10000|100000]):

  • 1k rows: 23 ms (23 µs/row)
  • 10k rows: 161 ms (16 µs/row)
  • 100k rows: 1487 ms (15 µs/row)

Per-row cost decreases with scale — pipelining (Phase 33) amortizes the prepare/release overhead better at larger N. 15 µs/row at 100k means ~67,000 rows/sec sustained on a single connection.

SELECT scaling (test_select_scaling[1000|10000|100000]):

  • 1k rows: 2.7 ms (2.7 µs/row)
  • 10k rows: 25.8 ms (2.6 µs/row)
  • 100k rows: 271 ms (2.7 µs/row)

Perfectly linear. parse_tuple_payload's optimization work (Phases 23-25) holds up at 100× scale with no per-row degradation — proves the codec scales well, no GC-pause amplification, no memory pressure.

Wide-row scaling (test_wide_row_select[5|20|50]):

  • 5 cols × 1000 rows: 2.4 ms
  • 20 cols × 1000 rows: 5.1 ms
  • 50 cols × 1000 rows: 10.1 ms

Per-column cost actually decreases with width (better amortization of fixed loop overhead).

Type-mix SELECT (test_select_type_mix_1000_rows):

  • 6-column row mixing INT + VARCHAR + DECIMAL + DATE + FLOAT + SMALLINT: 4.7 ms (4.7 µs/row).

About 1.7× slower than the INT-heavy 5-col baseline. The DECIMAL BCD parser and DATE epoch math contribute the bulk of the extra time. Still well under 5 µs/row for realistic application workloads.

What the suite caught immediately

Running test_select_scaling[100000] returned 200 rows instead of 100,000 — exposing the NFETCH-loop bug fixed in 2026.05.05.7 (Phase 35). The bug had been latent since the cursor was first written; the scaling suite is now the regression test that prevents its return.

Headline takeaway

informix-db exhibits linear-scaling, near-constant-per-row cost across both fetch and bulk-insert workloads at the 100k-row scale. Per-row codec cost is 2.7 µs unchanged from 1k → 100k rows. Per-row insert cost actually drops from 23 µs (1k) to 15 µs (100k) thanks to pipelining.

Tests

10 new parametrized benchmarks. Total: 77 unit + 249 integration + 43 benchmark = 369 tests.

2026.05.05.7 — CRITICAL: Fix NFETCH loop for large result sets (Phase 35)

This is a data-loss bug fix. Anyone running cursor.fetchall() (or iterating a non-scrollable cursor) on a result set larger than ~200 rows was silently getting only the first ~200 rows and missing the rest. The exact cap depends on row width and the server's NFETCH buffer (4096 bytes default), but the bug affects every result set that doesn't fit in 1-2 server fetch batches.

The bug

Cursor._execute_select sent NFETCH twice and stopped:

self._conn._send_pdu(self._build_curname_nfetch_pdu(cursor_name))
self._read_fetch_response()
# Drain — fetch again to confirm no more rows.
# (JDBC always does this; the second fetch returns DONE only.)
self._conn._send_pdu(self._build_nfetch_pdu())
self._read_fetch_response()

The "second fetch returns DONE only" comment was wrong — for any result set larger than the server's per-NFETCH batch, the second fetch returns more tuples and there are still tuples queued server-side. After the second fetch, the cursor closed and the rest of the rows were discarded.

This bug has been latent for ~30 phases because every existing test used either a small result set (e.g., systables FIRST 10) or relied on row-counts that fit naturally in 1-2 batches. The scaling benchmark (Phase 34) was the first time we tried SELECT FIRST 100000 and got back 200 rows.

The fix

_execute_select now loops NFETCH until a response yields zero new tuples:

self._conn._send_pdu(self._build_curname_nfetch_pdu(cursor_name))
rows_before = len(self._rows)
self._read_fetch_response()
rows_received = len(self._rows) - rows_before

while rows_received > 0:
    self._conn._send_pdu(self._build_nfetch_pdu())
    rows_before = len(self._rows)
    self._read_fetch_response()
    rows_received = len(self._rows) - rows_before

Tests

All 249 existing integration tests still pass. The scaling benchmark suite (Phase 34) is the regression test that would have caught this earlier — SELECT FIRST 100000 from a 100k-row table now returns the expected 100,000 rows.

Impact

  • Severity: CRITICAL (silent data loss).
  • Workaround prior to this fix: use scrollable cursors (conn.cursor(scrollable=True)) which use the SQ_SFETCH protocol path and don't have this bug.
  • Affected versions: every release before 2026.05.05.7.

If you've been using this driver for queries that returned large result sets, you may have been getting truncated results without knowing it. Re-run those queries against 2026.05.05.7+ to verify your data.

2026.05.05.6 — Pipelined executemany (Phase 33) — 2.85× faster on bulk inserts

The previous serial-loop executemany paid one wire round-trip per row (~30 µs/row on loopback × N rows = the dominant cost for any sizeable batch). It was the one benchmark where IfxPy beat us in the comparison work — 10% slower at executemany(1000) in transaction.

Phase 33 pipelines the BIND+EXECUTE PDUs: build all N PDUs first, send them back-to-back, then drain all N responses. Eliminates the per-row RTT entirely.
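
The build-all / send-all / drain-all shape, sketched against a stand-in transport (class and method names here are illustrative, not the driver's API):

```python
class LoopbackTransport:
    """Test double: echoes each queued PDU back as its 'response'."""
    def __init__(self):
        self._queue = []

    def send_pdu(self, pdu):
        self._queue.append(pdu)

    def recv_response(self):
        return ("ok", self._queue.pop(0))

class PipelinedExecutor:
    def __init__(self, transport):
        self.transport = transport

    def executemany(self, stmt_id, rows):
        # Phase 1: build every BIND+EXECUTE PDU up front -- no waiting.
        pdus = [self._build_bind_execute(stmt_id, row) for row in rows]
        # Phase 2: send them back-to-back; zero per-row round trips.
        for pdu in pdus:
            self.transport.send_pdu(pdu)
        # Phase 3: drain exactly one response per PDU, in order.
        return [self.transport.recv_response() for _ in pdus]

    def _build_bind_execute(self, stmt_id, row):
        return (stmt_id, tuple(row))  # placeholder for real PDU encoding
```

The drain phase relies on the server sending exactly N responses for N PDUs — the invariant Hamilton's review pass flagged and the integration tests below verify.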

Performance impact

  Benchmark                         Before    After     Speedup
  executemany(1000) in transaction  31.3 ms   11.0 ms   2.85× faster
  executemany(100) in autocommit    173 ms    154 ms    11% faster
  executemany(1000) in autocommit   1740 ms   1590 ms   9% faster

Autocommit cases get smaller relative wins because server-side log flushes per row dominate the absolute cost (Phase 21.1's "autocommit cliff").

IfxPy comparison: now winning all 5 benchmarks

The comparison flipped from "us 10% slower on bulk inserts" to "us 2.05× faster":

  Benchmark                         IfxPy     informix-db   Result
  select_one_row                    118 µs    114 µs        us 3% faster
  select_systables_first_10         164 µs    159 µs        us 3% faster
  select_bench_table_all (1k rows)  984 µs    891 µs        us 9% faster
  executemany(1000) in txn          21.4 ms   10.4 ms       us 2.05× faster
  cold_connect_disconnect           11.0 ms   10.4 ms       us 5% faster

Margaret Hamilton review pass

Hamilton flagged one critical concern (C1) before approving: the pipeline assumes Informix sends exactly N responses for N pipelined PDUs even when one row fails. If the server cut the response stream short on first error, the drain loop would block on the next read and the connection would deadlock.

Verified by integration test (tests/test_executemany_pipeline.py):

  • Constraint violation at row 0/100 (first-row failure)
  • Constraint violation at row 99/100 (last-row failure)
  • Constraint violation at row 500/1000 (mid-batch failure)

All 3 confirm: Informix DOES send N responses for N PDUs; wire stays aligned; connection is usable after.

Plus four lower-priority fixes Hamilton recommended:

  • H1: documented the _raise_sq_err self-drains-SQ_EOT invariant in the drain loop, plus the tripwire test that catches its violation.
  • H2: docstring warning that lock-holding time scales O(N) in batch size; recommend chunking for very large batches.
  • M1: prepend row-index annotation rather than reformat the exception message — preserves [<sqlcode>] <text> prefix for string-scraping callers.
  • M2: documented that sendall doesn't honor a write timeout reliably on all kernels; recommend keepalive=True for hostile networks.

Tests

3 new integration tests in tests/test_executemany_pipeline.py validate the wire-alignment invariant. Total: 77 unit + 239 integration + 33 benchmark = 349 tests.

Note on version 2026.05.05.5

The Phase 32 (Tier 1+2 benchmarks) tag was applied without bumping pyproject.toml's version string — that release is git-tag-only. Version 2026.05.05.6 (Phase 33) is the next published version increment.

2026.05.05.4 — Final hardening pass (Phase 30)

Closes the last 3 medium-severity items from Hamilton's system-wide audit. No findings remain.

What changed

1. Pool acquire() re-entrance restructured (src/informix_db/pool.py):

  • The growth path used self._lock._is_owned() (a CPython-private API) inside a try/finally to handle the lock state after the slow connect call. Hamilton flagged this as fragile across CPython versions.
  • Restructured to use two explicit re-acquire calls (one in the success path, one in the exception path) with no shared finally clause. No reliance on private APIs; the control flow is also clearer.

2. Login rejection diagnostics (src/informix_db/connections.py):

  • _raise_from_rejection previously always raised generic OperationalError("server rejected the connection") with no diagnostic context. Wrong-password and wrong-database produced identical errors.
  • Added _extract_server_error_text() helper that pulls the longest printable-ASCII run (8-256 chars) from the rejection payload. The server's human-readable error string is typically embedded somewhere in there; surfacing it gives users enough context to diagnose login failures without the full structured decode (deferred — version-dependent).
  • Falls back to a hex preview of the rejection payload's first 64 bytes for forensic logging when no printable string is found.

3. _send_exit exception handling broadened (src/informix_db/connections.py):

  • Previously caught a specific tuple (OperationalError, InterfaceError, OSError, ProtocolError). Any unexpected error (struct.error from a malformed ack byte, a future protocol-parse logic bug) would have escaped from _send_exit into Connection.close() and left a half-closed socket.
  • Broadened to bare except Exception since _send_exit is best-effort by definition (we're already tearing down). The actual socket FD is freed in Connection.close()'s finally clause via self._sock.close() (idempotent, never-raising per _socket.IfxSocket.close's contract).

Tests

5 new unit tests in tests/test_protocol.py covering _extract_server_error_text edge cases:

  • Finds the longest printable run from a binary-with-text payload
  • Picks the longest of multiple runs
  • Returns None for runs under 8 chars
  • Handles empty input
  • Caps at 256 chars (avoids matching binary blocks misinterpreted as text)

Total: 77 unit + 231 integration + 28 benchmark = 336 tests.

Hamilton audit punch list — final state

  Finding                                  Phase   Status
  Critical #1 (dirty pool checkout)        26      Fixed
  Critical #2 (wire lock)                  27      Fixed
  High #3 (async cancellation eviction)    27      Fixed
  High #4 (bare-except in error drain)     28      Fixed
  High #5 (cursor finalizers)              28+29   Fixed (28: finalizer; 29: deferred-cleanup queue)
  Medium: BLOB_PLACEHOLDER collision       28      Fixed
  Medium: parse_tuple bounds               28      Documented non-fix (benign over-read; per-branch checks deferred until needed)
  Medium: pool acquire re-entrance         30      Fixed
  Medium: login error specificity          30      Fixed
  Medium: _send_exit clean error handling  30      Fixed

0 critical, 0 high, 0 unfixed mediums. The driver has fully addressed every actionable item from the system-wide audit.

Hamilton verdict trajectory

  Phase                  Verdict
  Phase 21 era           (no audit yet)
  System audit (pre-26)  PRODUCTION READY WITH CAVEATS — 2 critical, 3 high, 5 medium
  Post-26                CAVEATS NARROWED — 1 critical, 3 high
  Post-27                0 critical, 2 high
  Post-28                0 critical, 0 high, 4 medium
  Post-29                (Phase 28's High #5 leak gap closed)
  Post-30                0 critical, 0 high, 0 medium — PRODUCTION READY

2026.05.05.3 — Deferred-cleanup queue (Phase 29)

Closes the unbounded-leak gap on long-lived pooled connections that Phase 28 created when the cursor finalizer's wire-lock-busy path "leaked + logged". The leak was bounded by session lifetime, not by GC frequency — a long-lived pooled connection seeing many cancellation events could accumulate orphaned server-side cursors until IDS's per-session cursor limit. Phase 29 closes that gap.

What changed

1. Per-connection deferred-cleanup queue (src/informix_db/connections.py):

  • Added Connection._pending_cleanup: list[bytes] and Connection._cleanup_lock. Two locks now: the existing _wire_lock (send/recv atomicity) and a new _cleanup_lock (tiny critical section guarding only the list mutation).
  • New _enqueue_cleanup(pdus) — append-and-return. Safe to call from any thread, including a finalizer on a thread that doesn't own the wire lock. Holds _cleanup_lock only for the extend call.
  • New _drain_pending_cleanup() — pop-the-list + send-each-PDU. Caller must hold _wire_lock (every actual call site does, via _send_pdu). On wire desync mid-drain, force-closes the connection — same doctrine as _raise_sq_err and the cursor finalizer.
  • _send_pdu now opportunistically drains the queue before sending the new PDU. The drain runs under the wire lock the caller already holds, so queued cleanup completes atomically before the next op.

2. Cursor finalizer enqueues instead of leaking (src/informix_db/cursors.py):

  • When _wire_lock.acquire(blocking=False) fails (cross-thread GC during another thread's wire op), the finalizer now calls conn._enqueue_cleanup([_CLOSE_PDU, _RELEASE_PDU]). The next normal operation drains them.
  • WARNING-level "leak accumulating" log demoted back to DEBUG since the leak no longer accumulates — it just defers.

Why two locks

_wire_lock is held for the full duration of a wire round-trip (send + drain). That's potentially milliseconds. The finalizer's enqueue path needs to synchronize with normal ops without blocking on a wire round-trip — otherwise GC time would grow proportional to query time. So _cleanup_lock is a separate, much shorter critical section that only guards the list mutation. Lock-acquire order: never acquire _cleanup_lock while holding _wire_lock recursively — the drain copies-and-clears under _cleanup_lock, then iterates under _wire_lock (which the caller already holds).
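
The two-lock enqueue/drain shape, sketched with a stand-in send callable (the attribute and method names mirror this entry; the bodies are illustrative):

```python
import threading

class Connection:
    def __init__(self, send):
        self._send = send                      # raw single-PDU write (stand-in)
        self._wire_lock = threading.RLock()    # held for full wire round-trips
        self._cleanup_lock = threading.Lock()  # guards only the list mutation
        self._pending_cleanup: list[bytes] = []

    def _enqueue_cleanup(self, pdus):
        # Safe from any thread, including a finalizer: never touches the wire.
        with self._cleanup_lock:
            self._pending_cleanup.extend(pdus)

    def _drain_pending_cleanup(self):
        # Caller must hold _wire_lock. Copy-and-clear under _cleanup_lock,
        # then send outside it, so the short lock never nests inside a send.
        with self._cleanup_lock:
            pending, self._pending_cleanup = self._pending_cleanup, []
        for pdu in pending:
            self._send(pdu)

    def _send_pdu(self, pdu):
        with self._wire_lock:
            self._drain_pending_cleanup()  # opportunistic drain before the new op
            self._send(pdu)
```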

Tests

Two new regression tests in tests/test_pool.py:

  • test_enqueue_cleanup_drains_on_next_send_pdu — verifies the queue mechanism: enqueue a PDU, call drain directly, confirm the queue is empty.
  • test_pending_cleanup_thread_safe_enqueue — 8 threads × 50 enqueues each; verifies all 400 entries land (no race-loss from extend non-atomicity).

Total: 72 unit + 231 integration + 28 benchmark = 331 tests.

Impact on the audit punch list

  Hamilton finding                           Phase   Status
  Critical #1 (dirty pool checkout)          26      Fixed
  Critical #2 (wire lock)                    27      Fixed
  High #3 (async cancellation eviction)      27      Fixed
  High #4 (bare-except in error drain)       28      Fixed
  High #5 (cursor finalizers)                28+29   Fixed completely (28: finalizer; 29: bounded-leak fallback)
  Medium: BLOB_PLACEHOLDER collision         28      Fixed
  Medium: parse_tuple bounds (investigated)  28      Documented non-fix (benign)
  Medium: pool acquire re-entrance           30      (next phase)
  Medium: login error specificity            30      (next phase)
  Medium: _send_exit clean error handling    30      (next phase)

2026.05.05.2 — Resource leak hardening (Phase 28)

Closes Hamilton audit High #4 (bare-except in error drain) and High #5 (no cursor finalizers), plus 1 medium one-liner. After Phases 26–28, all CRITICAL and HIGH audit findings are fixed; the remaining items are 4 mediums (one-liners with low blast radius).

What changed

1. Cursor finalizers (src/informix_db/cursors.py):

  • Cursor.__init__ now registers a weakref.finalize-based callback that releases server-side resources (CLOSE + RELEASE) if the cursor is garbage-collected without explicit close(). Previously, a mid-fetch raise (MemoryError, user code error in for row in cursor:, etc.) would orphan the prepared statement / scrollable cursor handle on the server.
  • The finalizer uses non-blocking lock acquire: cross-thread GC (cyclic GC, weakref callback delivery) cannot deadlock against a thread holding the wire lock. If the lock is busy, the cleanup is skipped and a WARNING is logged so leak accumulation is visible on long-lived pooled connections.
  • Pre-built static _CLOSE_PDU and _RELEASE_PDU bytes at module load — finalizers must not allocate or call cursor methods (the cursor is mid-GC).
  • A state = [False] list pattern keeps the finalizer's closure weak (the cursor itself isn't captured); cursor mutates state[0] = True when opening server-side resources, False on explicit close(). Documented GIL-dependence for the atomic mutation.

2. _raise_sq_err drain hardened (both cursors.py and connections.py):

  • Replaced bare except: pass with specific (ProtocolError, OSError) catches for the near-token parse and drain loop.
  • On drain failure, force-close the connection (set _closed = True, close socket). The wire is unrecoverable after a desync; subsequent operations get a clean InterfaceError rather than inheriting silent corruption.
  • Same doctrine applies in the cursor finalizer (after Hamilton review): wire desync → force-close, not silent swallow.

3. BLOB_PLACEHOLDER validation (cursors.py):

  • write_blob_column now validates the placeholder appears EXACTLY once. Pre-Phase-28, str.replace would silently substitute every occurrence — corrupting any SQL that legitimately contained the literal string in a comment or other position. Now raises ProgrammingError with a workaround pointer.
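
A minimal sketch of the exactly-once check, with an illustrative placeholder literal and error message:

```python
BLOB_PLACEHOLDER = "__INFX_BLOB__"  # illustrative literal, not the real one

class ProgrammingError(Exception):
    pass

def substitute_blob_placeholder(sql: str, replacement: str) -> str:
    """Substitute the placeholder only if it appears exactly once."""
    n = sql.count(BLOB_PLACEHOLDER)
    if n != 1:
        # Pre-Phase-28, str.replace silently substituted every occurrence,
        # corrupting SQL that legitimately contained the literal.
        raise ProgrammingError(
            f"BLOB_PLACEHOLDER must appear exactly once, found {n}; "
            "build the statement manually if your SQL contains the literal"
        )
    return sql.replace(BLOB_PLACEHOLDER, replacement)
```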

4. parse_tuple_payload bounds-check INVESTIGATED, NOT FIXED:

  • Added end-of-loop bounds check; broke 10 BLOB/CLOB tests due to a long-standing off-by-one in the UDTVAR(lvarchar) trailing-pad logic.
  • Concluded the over-read is benign: payload is a fully-extracted bytes object, so over-reads return empty slices that flow through unused branches (the UDTVAR pad isn't decoded). Real silent-corruption surfaces are localized to length-prefix decoders, requiring branch-local checks rather than a loop-global assertion.
  • Reverted the check; documented the analysis as a deliberate non-fix in the source.

Margaret Hamilton review pass

Three Hamilton reviews shaped this phase. The Phase 28 review surfaced two blocking conditions, both addressed before tagging:

  • Asymmetric failure handling: my _raise_sq_err fix force-closed the connection on (ProtocolError, OSError), but the cursor finalizer's except Exception silently swallowed the same failures on the same wire. Same wire, same failure mode, same response. Fixed: finalizer now catches (ProtocolError, OSError) specifically, force-closes the connection, logs at WARNING. Asymmetry eliminated.
  • Leak visibility: the wire-lock-busy log was at DEBUG. Promoted to WARNING — leak accumulation on long-lived pooled connections must be visible to anyone watching their app logs.

Plus three documentation improvements applied:

  • GIL dependency of the list-of-bool atomic-mutation pattern noted at the registration site.
  • OperationalError inclusion in the desync catch tuple in connections.py documented (it can be raised by _drain_to_eot for unknown tags during drain).
  • parse_tuple_payload non-fix documented inline so future maintainers don't re-derive the analysis.

Known follow-up (Phase 29)

Hamilton flagged unbounded leak accumulation on pooled connections: when the wire lock is busy at GC time, the resource leaks until session close. On a long-lived pooled connection across many cancellation events, the count can approach IDS's per-session cursor limit. The fix is a deferred-cleanup queue drained at the next _send_pdu on the connection — opportunistic best-effort cleanup. Tracked for Phase 29; not blocking Phase 28.

Tests

One new regression test: tests/test_smart_lob_write.py::test_write_blob_column_rejects_multiple_placeholders — confirms BLOB_PLACEHOLDER count > 1 raises ProgrammingError with a workaround pointer.

Total: 72 unit + 229 integration + 28 benchmark = 329 tests.

Hamilton audit verdict trajectory

  Phase     Critical   High   Medium
  Pre-26    2          3      5
  Post-26   1          3      5
  Post-27   0          2      5
  Post-28   0          0      4

No CRITICAL or HIGH findings remain. The four remaining mediums are diagnostic / cosmetic (login error specificity, _send_exit clean error handling, etc.). The driver is PRODUCTION READY with the Phase 29 deferred-cleanup queue as a future hardening step rather than a blocker.

2026.05.05.1 — Wire lock + async cancellation eviction (Phase 27)

Closes Hamilton audit findings Critical #2 (concurrency / wire lock) and High #3 (async cancellation evicts cleanly). Phase 26 fixed what gets returned to the pool; Phase 27 fixes what can interleave on the wire while it's running.

What changed

1. Per-connection wire lock (src/informix_db/connections.py):

  • Added Connection._wire_lock = threading.RLock(). Wrapped commit(), rollback(), and fast_path_call() in with self._wire_lock:.
  • _ensure_transaction() documents the lock as a precondition and asserts ownership (self._wire_lock._is_owned()) — a future caller adding a third call site fails loudly in tests rather than corrupting wire state in production.
  • Connection.close() now tries to acquire the wire lock with a 0.5s timeout before sending SQ_EXIT. If another thread is mid-operation, skip the polite exit and force-close the socket; the in-flight thread observes EOF on its next read.
  • RLock (not Lock) because pool.release() holds the lock with timeout, then calls conn.rollback() which itself acquires.
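
The RLock re-entrance (and the pool's timeout-acquire release from item 3 below) can be sketched like this — wire I/O elided, method names illustrative:

```python
import threading

class Connection:
    def __init__(self):
        # RLock, not Lock: pool release holds it with a timeout, then calls
        # rollback(), which acquires it again on the same thread.
        self._wire_lock = threading.RLock()

    def rollback(self):
        with self._wire_lock:
            pass  # send SQ_ROLLBACK, drain the response (elided)

    def release_from_pool(self, timeout=5.0):
        # Timeout-acquire: if a stuck worker holds the lock past the budget,
        # evict the connection instead of blocking the pool forever.
        if not self._wire_lock.acquire(timeout=timeout):
            return "evicted"
        try:
            self.rollback()  # re-entrant acquire on the same RLock
        finally:
            self._wire_lock.release()
        return "recycled"
```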

2. Cursor wire methods locked (src/informix_db/cursors.py):

  • Cursor.execute() body extracted into _execute_under_wire_lock() and called under the lock.
  • Cursor.executemany() body wrapped inline.
  • Cursor._sfetch_at() (the SQ_SFETCH primitive used by every scrollable fetch_* method) wrapped — every scrollable cursor op gets the lock for free.
  • Cursor.close() acquires the lock for the CLOSE+RELEASE on scrollable cursors.
  • read_blob_column and write_blob_column inherit through their internal self.execute() calls.

3. Pool release with timeout-acquire (src/informix_db/pool.py):

  • release() now acquires conn._wire_lock with a _RELEASE_WIRE_LOCK_TIMEOUT = 5.0 budget before rolling back. If a still-running worker thread holds the lock past 5s, the connection is evicted instead of recycled. Logged at WARNING level via the Phase 26 logger.

4. Async cancellation → eviction (src/informix_db/aio.py):

  • AsyncConnectionPool.connection() now catches (asyncio.CancelledError, asyncio.TimeoutError) separately and routes them to broken=True. Combined with the wire lock, this means asyncio.wait_for around aio DB calls is now safe — the connection is either successfully released (worker finished in time) or evicted (worker exceeded the timeout); never returned to the pool in a poisoned state.
  • Removed the Phase 26 cancellation warning from the docstring; now describes the new safety guarantee explicitly.
  • Mirrored in docs/USAGE.md async section.
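
The cancellation-to-eviction shape, sketched with asynccontextmanager and a stand-in pool (the real AsyncConnectionPool differs):

```python
import asyncio
import contextlib

class AsyncConnectionPool:
    def __init__(self, conns):
        self._idle = list(conns)

    @contextlib.asynccontextmanager
    async def connection(self):
        conn = self._idle.pop()
        broken = False
        try:
            yield conn
        except (asyncio.CancelledError, asyncio.TimeoutError):
            broken = True  # worker may still own the wire: never recycle
            raise
        finally:
            if broken:
                conn.close()           # evicted, not returned poisoned
            else:
                self._idle.append(conn)  # clean release back to idle
```

With this shape, asyncio.wait_for around a pooled call either releases cleanly or evicts; an interrupted connection never rejoins the idle list.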

Margaret Hamilton review pass

Two passes (Phase 26 review + system-wide audit) had already shaped this design. The Phase 27 review surfaced three actionable conditions:

  • Test reliability: the cancellation regression test used contextlib.suppress(asyncio.TimeoutError) — silently passing if the timeout never fired (e.g., on a fast CI runner where the query completes within 1ms). Fixed: switched to pytest.raises(asyncio.TimeoutError) so the test fails if the cancellation path isn't actually exercised.
  • Defensive guard for _ensure_transaction: documented "caller must hold the wire lock" as a precondition, but no runtime check. Fixed: added assert self._wire_lock._is_owned() so a future caller forgetting to lock fails loudly in tests.
  • Symmetry in Connection.close(): the polite SQ_EXIT was unsynchronized — could interleave with another thread's PDU. Fixed: try-acquire with 0.5s timeout; if busy, skip SQ_EXIT and force-close.

Plus one cross-phase note: Phase 27 makes Hamilton's High #5 (cursor finalizers) more visible, because cross-thread __del__ invocation could deadlock on the wire lock. Tracked for Phase 28; Phase 27 doesn't introduce the underlying hazard.

Tests

Two new regression tests in tests/test_pool.py:

  • test_concurrent_threads_on_one_connection_dont_interleave_pdus — two threads each running 20 distinct queries on a shared Connection. Without the wire lock, PDU interleaving causes wrong results, ProtocolError, or hangs. With the lock, both threads complete with correct results.
  • test_async_wait_for_cancellation_evicts_connection — spawns a slow query under asyncio.wait_for(timeout=0.001), asserts (via pytest.raises) that the timeout actually fires, then verifies pool size shrinks (connection evicted, not returned to idle).

Total: 72 unit + 228 integration + 28 benchmark = 328 tests.

Hamilton verdict trajectory

| Audit pass | Verdict |
| --- | --- |
| Phase 21 era | (no audit yet) |
| System-wide audit (pre-Phase 26) | PRODUCTION READY WITH CAVEATS — 2 critical, 3 high |
| Post-Phase 26 | CAVEATS NARROWED — 1 critical, 3 high |
| Post-Phase 27 | CAVEATS NARROWED FURTHER — 0 critical, 2 high |

Remaining audit items (deferred to Phase 28):

  • High #4: bare except: pass in _raise_sq_err drain
  • High #5: no cursor finalizers (server-side resource leak on mid-fetch raise)
  • Plus 5 medium one-liners

2026.05.05 — Pool rollback-on-release (Phase 26): CRITICAL data-correctness fix

Fixes the dirty-pool-checkout bug surfaced by Margaret Hamilton's system-wide audit. This is the most important fix in the project's history so far — it eliminates a class of silent data-correctness failures that affect any application using both the connection pool and non-autocommit transactions.

The bug

Before this fix, when a request released a connection to the pool with an open server-side transaction (no explicit commit() or rollback()), the connection went back to the idle list with the transaction still active. The next request to acquire that connection would inherit it. Specifically:

  • Request A does INSERT (no commit) → returns 200 to user → context manager calls pool.release() → connection rejoins _idle with A's open transaction.
  • Request B acquires the same connection → its first DML runs inside A's transaction. If B commits, A's writes land permanently. If B errors before commit, A's writes are silently rolled back.

This is the same shape as the bug psycopg2 fixed years ago. Hamilton ranked it Critical #1 in the audit.

The fix

ConnectionPool.release() now rolls back any uncommitted transaction before the connection rejoins the idle list. If the rollback itself fails (dead socket, etc.), the connection is evicted instead of recycled — half-state never returns to the pool.

The rollback runs outside the pool lock (it's a wire round-trip; we don't want to block other pool operations). The connection is safe to operate on alone because it's not yet in _idle and _total already counts it as owned.
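The release-time discipline can be sketched with stand-in classes (`FakeConn`, `TinyPool` are hypothetical, not the real driver types): roll back any open transaction before rejoining idle, and evict if the rollback itself fails.

```python
class FakeConn:
    """Stand-in connection; dead=True models a rollback over a dead socket."""

    def __init__(self, in_txn=False, dead=False):
        self.in_txn, self.dead = in_txn, dead
        self.rolled_back = False

    def rollback(self):
        if self.dead:
            raise OSError("socket closed")
        self.rolled_back = True
        self.in_txn = False


class TinyPool:
    """Sketch of release(): rollback outside the pool lock, evict on failure."""

    def __init__(self):
        self.idle = []
        self.evicted = 0

    def release(self, conn):
        if conn.in_txn:
            try:
                conn.rollback()      # wire round-trip, done outside the pool lock
            except Exception:
                self.evicted += 1    # half-state never returns to the pool
                return
        self.idle.append(conn)       # only a clean connection rejoins idle
```

A healthy uncommitted connection is rolled back and recycled; a dead one is evicted, so the next acquirer never inherits another request's transaction.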

Async path covered automatically

AsyncConnectionPool.release() delegates to the sync pool's release via _to_thread, so async users get the fix transparently.

Margaret Hamilton review pass

The fix went through a focused review. Two findings addressed before tagging:

  • Silent failure on rollback exception: Hamilton flagged that except Exception: swallowed rollback failures invisibly. Added a WARNING-level log via logging.getLogger("informix_db.pool") so evictions are debuggable. (No existing logging convention in the codebase before now; this is the first.)
  • Async cancellation interaction: the fix doesn't introduce the asyncio.wait_for race (that's Critical #2 from the audit, deferred to Phase 27), but it adds a code path that can trigger it. Documented loudly in pool.release()'s docstring, in aio.py's module docstring, and in USAGE.md's async section. The recommendation: use read_timeout on the connection instead of asyncio.wait_for until Phase 27 lands the per-connection wire lock.

Tests

Two new integration tests in tests/test_pool.py:

  • test_uncommitted_writes_invisible_to_next_acquirer — the regression test. Pool with max_size=1 forces A and B to share the connection. A inserts without committing, A's context exits, B acquires (gets the same connection), B verifies no rows are visible. Verified by temporarily stashing the fix and confirming the test then fails with "B sees 1 rows — leaked across pool checkout boundary" — proof that the test catches the actual bug rather than passing vacuously.
  • test_committed_writes_survive_pool_checkout — counterpart that proves the fix doesn't over-correct (committed writes still persist).

Total tests: 72 unit + 226 integration + 28 benchmark = 326.

What this fix does NOT cover (deferred to Phase 27)

  • Concurrency / per-connection wire lock (Hamilton Critical #2). Connection._lock is currently held only by close(); every wire-touching method runs unsynchronized. Two threads on the same connection (or async cancellation leaving a worker thread still running) can interleave PDU bytes. The Phase 26 fix doesn't make this worse than baseline, but it adds a second path that can trigger it.
  • No finalizers on cursors (Hamilton High #5). Mid-fetch raises still leak server-side prepared statements / scrollable cursors.
  • Bare except: pass in _raise_sq_err drain (Hamilton High #4). Masks ProtocolError during error decode.
  • Async cancellation evicts cleanly (Hamilton High #3). Currently TimeoutError/CancelledError aren't routed to broken=True — connection rejoins pool. Phase 26 mitigates the data corruption from this case (rollback runs on release), but the wire-desync race remains until Phase 27.

These are real and known. Documented in the audit punch list.

2026.05.04.10 — Branch reorder by frequency + invariant tripwires (Phase 25)

Third-pass optimization on parse_tuple_payload. Previous phases removed redundant work; this one removes correct-but-wasteful work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT — by far the most common columns in real queries) sat at the bottom of the chain, paying ~7 frozenset/equality misses per column.

What changed

  • Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module load in _resultset.py.
  • New fast-path branch at the TOP of parse_tuple_payload's loop body that handles every _FIXED_WIDTH_TYPES column inline (slice + _decode_base + advance). For an INT column we now hit one frozenset check, one dict lookup, one decode call — and skip every other branch.
  • Cleaned up the bottom fall-through since FIXED_WIDTHS-keyed types no longer reach it. The fall-through now genuinely only catches unknown/unhandled types; comment updated.
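The fast-path shape can be illustrated with stand-in tables (the type codes, `FIXED_WIDTHS`, and `DECODERS` entries below are illustrative, not the driver's real ones): one frozenset check, one dict lookup, one decode call for the common fixed-width columns, checked before any other branch.

```python
import struct

# Hypothetical type codes standing in for the driver's IfxType values.
TC_SMALLINT, TC_INT = 1, 2

FIXED_WIDTHS = {TC_INT: 4, TC_SMALLINT: 2}
DECODERS = {
    TC_INT: struct.Struct("!i").unpack,
    TC_SMALLINT: struct.Struct("!h").unpack,
}
_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS)  # built once at module load


def parse_row(payload, column_types):
    """Decode one row; fixed-width branch sits at the top of the loop."""
    out, pos = [], 0
    for tc in column_types:
        if tc in _FIXED_WIDTH_TYPES:           # frozenset check
            width = FIXED_WIDTHS[tc]           # dict lookup
            out.append(DECODERS[tc](payload[pos:pos + width])[0])  # decode
            pos += width
            continue
        # ...slower branches (char, varchar, decimal, datetime...) would follow...
        raise NotImplementedError(f"type {tc} not in this sketch")
    return out
```

For an all-integer row, no other branch is ever evaluated, which is where the wide-row savings come from.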

Margaret Hamilton review pass — invariant tripwires added

The third Hamilton review of this hot path produced one HIGH-severity finding, addressed before tagging. The pattern was the same as in Phases 23 and 24: an optimization is correct because of a property of an external table (here: FIXED_WIDTHS keys are decodable without qualifier inspection), but the property is implicit. The finding's recommendation went beyond a comment:

  • Added tests/test_resultset_invariants.py — three CI tripwire tests that turn the structural invariants from comments into executable checks:
    1. _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch's type set.
    2. Every FIXED_WIDTHS key has a decoder in DECODERS.
    3. All DECODERS keys are < 0x100 (the Phase 24 collision-free guarantee).
  • Added INVARIANT comment to FIXED_WIDTHS in converters.py explaining the qualifier-free constraint and pointing to the tripwire tests.

The tests follow a simple discipline: if one fires, don't update the test to match the new state — read the docstring and either restore the property or refactor the optimization to no longer depend on it. Comments rot when nobody reads them; tests fail loudly when someone violates them.
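The tripwire idea, sketched with stand-in tables (the dicts below are illustrative; the real checks live in tests/test_resultset_invariants.py): each structural invariant becomes a plain assertion that fails loudly if a future change violates it.

```python
# Stand-in dispatch tables: type code -> byte width / decoder.
FIXED_WIDTHS = {1: 2, 2: 4, 17: 8}
DECODERS = {1: int, 2: int, 17: int, 0: str, 13: str}
_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS)
_LENGTH_PREFIXED_TYPES = frozenset({0, 13})   # one of the "other" branches


def test_fixed_width_disjoint_from_other_branches():
    # Invariant 1: fast-path types never overlap another dispatch branch.
    assert _FIXED_WIDTH_TYPES.isdisjoint(_LENGTH_PREFIXED_TYPES)


def test_every_fixed_width_key_has_a_decoder():
    # Invariant 2: the fast path can always decode what it claims.
    assert set(FIXED_WIDTHS) <= set(DECODERS)


def test_decoder_keys_below_flag_space():
    # Invariant 3: decoder keys stay below 0x100, so flag bits can't collide.
    assert all(tc < 0x100 for tc in DECODERS)
```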

Performance summary (Phase 25)

| Benchmark | Phase 24 baseline | Now | Δ |
| --- | --- | --- | --- |
| parse_tuple_5cols_iso8859 | 1659 ns | 1400 ns | -16% |
| parse_tuple_5cols_utf8 | 1649 ns | 1341 ns | -19% |

End-to-end SELECT numbers fluctuate ±10% run-to-run on sub-millisecond loopback round-trips; the codec micro-benchmark is the durable measurement.

Cumulative improvement (vs. original Phase 21 baseline, before any optimization)

| Metric | Original | Now | Total Δ |
| --- | --- | --- | --- |
| parse_tuple_5cols | 2796 ns | 1400 ns | -50% |
| decode_int | 230 ns | 139 ns | -40% |
| select_bench_table_all (1k rows, where measurable) | 1477 µs | ~990 µs | ≈-33% |

The per-row decode hot path is half the time it took at start of optimization work. Real-world fetch ceiling: 358K rows/sec → ~715K rows/sec on a single connection.

Tests

3 new unit tests (the invariant tripwires). Total: 72 unit + 224 integration + 28 benchmark = 324 tests.

Baseline refreshed

tests/benchmarks/baseline.json updated. All tests pass; ruff clean.

2026.05.04.9 — Decoder dispatch + struct precompilation (Phase 24)

Second pass of hot-path optimization. Phase 23 lifted IfxType conversions out of the loop body in _resultset.py (-26% on parse_tuple_5cols). Phase 24 goes deeper into the codec layer.

What changed

1. Split decode() into public + internal in src/informix_db/converters.py.

  • New _decode_base(base_tc, raw, encoding) takes an already-base-typed type code and skips the base_type() flag strip. Documented INVARIANT: caller's responsibility to base-type the input.
  • Public decode() is now a one-line wrapper: return _decode_base(base_type(type_code), raw, encoding). Same external semantics, same backward-compat — _fastpath.py:171 is unaffected.
  • parse_tuple_payload (4 call sites) now imports and calls _decode_base directly. Saves ~100 ns × N columns per row by skipping the redundant flag strip.

2. Pre-compiled struct.Struct unpackers. The fixed-width integer/float decoders (_decode_smallint, _decode_int, _decode_bigint, _decode_smfloat, _decode_float, _decode_date) switched from per-call struct.unpack(fmt, raw) to module-level bound methods like _UNPACK_INT = struct.Struct("!i").unpack. Format-string parsing happens once at module load instead of per call — measured 37% faster than per-call struct.unpack on a CPython 3.13 microbenchmark.
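The before/after shape of the struct change, as a minimal sketch (function names here are illustrative, not the module's):

```python
import struct

# Per-call: struct parses the "!i" format string on every invocation.
def decode_int_slow(raw: bytes) -> int:
    return struct.unpack("!i", raw)[0]

# Pre-compiled: format parsed once at module load; bind the unpack method.
_UNPACK_INT = struct.Struct("!i").unpack

def decode_int_fast(raw: bytes) -> int:
    return _UNPACK_INT(raw)[0]
```

Both decode a big-endian 4-byte signed int identically; only the format-parsing cost moves from per-call to module load.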

Margaret Hamilton review pass

The optimization went through a second failure-mode review. One HIGH-severity finding addressed:

  • H (high): The no-collision guarantee that makes _decode_base safe is structural but undocumented. Specifically: all DECODERS keys are ≤ 0xFF; all flag bits in _types.py are ≥ 0x100; therefore a flagged input cannot coincidentally match a DECODERS key. This guarantee is correct today but fragile — adding a decoder for a type code that uses bits ≥ 0x100 would silently weaken it. Fixed: added a load-bearing INVARIANT comment at the DECODERS dict declaration explaining the constraint and what to do if it's violated. Cross-referenced from _decode_base's docstring so the contract is bidirectionally traceable.

Performance summary (Phase 24)

| Benchmark | Phase 23 baseline | Now | Δ this phase |
| --- | --- | --- | --- |
| decode_int | 173 ns | 139 ns | -20% |
| decode_bigint | 188 ns | 150 ns | -20% |
| decode_smallint | 169 ns | 137 ns | -19% |
| decode_date | 521 ns | 435 ns | -17% |
| parse_tuple_5cols_iso8859 | 2047 ns | 1592 ns | -22% |
| select_bench_table_all (1k rows) | 1255 µs | 989 µs | -21% |
| select_with_param | 977 µs | 860 µs | -12% |

Cumulative improvement (vs. original Phase 21 baseline, before any optimization)

| Metric | Original | Now | Total Δ |
| --- | --- | --- | --- |
| decode_int | 230 ns | 139 ns | -40% |
| parse_tuple_5cols | 2796 ns | 1592 ns | -43% |
| select_bench_table_all (1k rows) | 1477 µs | 989 µs | -33% |

Real-world fetch ceiling: 358K rows/sec → ~620K rows/sec on a single connection.

Baseline refreshed

tests/benchmarks/baseline.json updated. All 224 integration tests pass; ruff clean.

2026.05.04.8 — Hot-path optimization (Phase 23)

Optimized parse_tuple_payload — the per-row decode function hit by every SELECT result set. The 1k-row fetch wall-clock improved 19% (1477 µs → 1198 µs). Bench micro-target (parse_tuple_5cols) improved 27% (2796 ns → 2030 ns). All 224 integration tests still pass; ruff clean.

What changed (src/informix_db/_resultset.py)

  • Removed redundant base_type() call from the hot loop. ColumnInfo.type_code is already base-typed by parse_describe at construction — calling base_type(col.type_code) again per column per row was pure waste. This was the single largest savings.
  • Lifted int(IfxType.X) to module-level constants (_TC_CHAR, _TC_VARCHAR, etc.). Original code did the IntFlag→int conversion inline ~10 times per loop iteration; now done once at module import.
  • Moved lazy imports to module top (_decode_datetime, _decode_interval, BlobLocator, ClobLocator, RowValue, CollectionValue). Saves a per-call attribute lookup; verified no circular import risk.
  • Three precomputed frozensets (_LENGTH_PREFIXED_SHORT_TYPES, _COMPOSITE_UDT_TYPES, _NUMERIC_TYPES) replace inline tuple-membership checks.
  • _COLLECTION_KIND_MAP wrapped in MappingProxyType — actually frozen against accidental mutation, not just nominally.

Margaret Hamilton review pass

The optimization went through a rigorous failure-mode review. Findings addressed before tagging:

  • H1 (high): cursor._dereference_blob_columns (line 304-310) was doing the same redundant base_type() call. Stripped for consistency — otherwise the next reader would write a "fix" to one site or the other based on which they noticed.
  • M1 (medium): documented the load-bearing invariant at its single producer site. parse_describe now has a comment naming readers that depend on ColumnInfo.type_code being base-typed, so a future contributor adding a new construct site has a grep-able warning.
  • M2 (medium): _COLLECTION_KIND_MAP is now MappingProxyType (was a plain dict).
  • L1 (low): stale "(line 151)" comment reference replaced with a pointer to the named INVARIANT comment.

Performance summary

| Benchmark | Pre | Post | Delta |
| --- | --- | --- | --- |
| parse_tuple_5cols_iso8859 | 2796 ns | 2030 ns | -27% |
| parse_tuple_5cols_utf8 | 2791 ns | 2041 ns | -27% |
| select_bench_table_all (1k rows) | 1477 µs | 1198 µs | -19% |
| select_with_param (~50 rows) | 1069 µs | 994 µs | -7% |
| Codec micro-benchmarks (decode_int, etc.) | | | unchanged ±noise |
| cold_connect_disconnect | | | unchanged |
| executemany series | | | unchanged |

Real-world fetch ceiling on a single connection: 350K rows/sec → 490K rows/sec.

Baseline refreshed

tests/benchmarks/baseline.json updated with the new (faster) numbers. Future regressions will be measured against this floor.

2026.05.04.7 — User-facing documentation refresh (Phase 22)

The docs/USAGE.md predated Phases 17-21, so anyone landing on PyPI was missing scrollable cursors, locale/Unicode, the autocommit cliff finding, and the type-mapping reference. This release closes that gap.

Added (in docs/USAGE.md)

  • Locale and Unicode — full section on client_locale, Connection.encoding, the CLIENT_LOCALE vs DB_LOCALE distinction, what happens when characters can't fit the codec, how to create a UTF-8 database. Bridges the gap between Phase 20's plumbing and a user's first multibyte INSERT.
  • Type mapping reference — full SQL ↔ Python type table covering integer widths, DECIMAL, all string types, DATE/DATETIME/INTERVAL, BYTE/TEXT, BLOB/CLOB, ROW/COLLECTION, and NULL. Plus subsections on NULL sentinels and IntervalYM.
  • Performance tips — three numbered patterns: wrap bulk INSERTs in a transaction (53× speedup), use executemany not a loop (≈100× speedup), use a connection pool (72× speedup over cold connect). Quotes the actual benchmark numbers from Phase 21.1.
  • Scrollable cursors — fetch_first / fetch_last / fetch_prior / fetch_absolute / fetch_relative / scroll() API; in-memory vs cursor(scrollable=True) server-side trade-offs; edge cases (past-end semantics, negative indexing, rownumber indexing).
  • Timeouts and keepalive subsection — connect_timeout / read_timeout / keepalive semantics with a "reasonable production starting point" recommendation.
  • Environment dictionary subsection — the env={} parameter, with examples (OPT_GOAL, OPTOFC, IFX_AUTOFREE).
  • Known limitations — explicit table of what doesn't work yet (named parameters, complex UDT bind, GSSAPI, XA, listener failover, etc.) with workarounds where they exist. Plus "things that work but might surprise you" (autocommit default, no-op commit on unlogged DB, SERIAL retrieval).

Changed

  • README.md — added a "Documentation" section linking to docs/USAGE.md and tests/benchmarks/README.md. Bumped phase count.

Doc corrections caught during review

  • cursor.rownumber is 0-indexed, not 1-indexed (the implementation has been correct; only the original docstring wording was loose).
  • fetch_* methods work on both scrollable=True and the default (in-memory) cursor — the original Phase 17 docs implied scrollable=True was required, but the in-memory path supports them too.

2026.05.04.6 — executemany perf finding: it was the autocommit cliff

Investigation of the Phase 21 finding that executemany(N) cost scaled linearly per-row (1.74 ms × N) regardless of batch size. Root cause: every autocommit-True INSERT forces a server-side transaction-log flush. Not a wire-protocol bug.

Added

  • test_executemany_1000_rows_in_txn benchmark — same workload, but inside a single transaction with one COMMIT at the end. Isolates pure protocol cost from server-storage cost.
  • New module-scoped txn_conn fixture in tests/benchmarks/test_insert_perf.py for autocommit-False benchmarks.

Findings

| Mode | Total | Per row |
| --- | --- | --- |
| executemany(1000), autocommit=True | 1.72 s | 1.72 ms |
| executemany(1000), in single txn | 32 ms | 32 µs |

53× speedup from changing the transaction boundary, not the driver. Pure protocol overhead is ~32 µs/row → ~31,000 rows/sec sustained throughput on a single connection. Comparable to mature pure-Python drivers (pg8000).
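The cost model behind the finding can be shown with a toy (this is a counting model, not the real driver: `commit()` stands in for the server-side log flush that every autocommit INSERT forces):

```python
class FlushCountingConn:
    """Toy model: each commit() represents one transaction-log flush."""

    def __init__(self, autocommit: bool):
        self.autocommit = autocommit
        self.flushes = 0

    def execute_insert(self, row):
        if self.autocommit:
            self.commit()            # autocommit: one log flush per row

    def commit(self):
        self.flushes += 1


def bulk_load(conn, rows):
    for row in rows:
        conn.execute_insert(row)
    if not conn.autocommit:
        conn.commit()                # single txn: one flush for the batch
```

Loading 1000 rows costs 1000 flushes in autocommit mode but exactly 1 inside a transaction, which is the 53× wall-clock gap measured above.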

Changed

  • tests/benchmarks/README.md — updated headline numbers to show both modes, added a "Performance gotchas" section explaining when to use autocommit=False for bulk loads.
  • tests/benchmarks/baseline.json — refreshed to include the new txn-mode measurement (now 29 entries, was 28).

Decision: don't pipeline

Pipelining BIND+EXECUTE PDUs (writing N without waiting for responses between them) could potentially halve the 32 µs/row figure on loopback. Decided against:

  • The remaining 32 µs is already excellent — single-connection bulk-load performance is not where users hit limits.
  • Pipelining adds complexity around TCP send-buffer management, partial-failure semantics, and error reporting (which row failed when 50 are in flight).
  • The autocommit gotcha is the real user-facing footgun. Better docs > more code.

If someone reports needing >31K rows/sec single-connection, this becomes Phase 22 work.

2026.05.04.5 — Performance benchmarks (Phase 21)

Adds tests/benchmarks/ — a pytest-benchmark driven suite covering codec micro-benchmarks (no server required) and end-to-end SELECT/INSERT/pool/async benchmarks. Establishes a committed baseline.json so future PRs can be compared against the floor and regressions caught at review.

Added

  • tests/benchmarks/test_codec_perf.py — 16 micro-benchmarks for the hot codec paths (decode, encode_param, parse_tuple_payload). Run without an Informix container; suitable for pre-merge CI.
  • tests/benchmarks/test_select_perf.py — 4 SELECT round-trip benchmarks: 1-row latency floor, ~10 rows, full 1k-row table, parameterized.
  • tests/benchmarks/test_insert_perf.py — 3 INSERT benchmarks: single-row, executemany(100), executemany(1000).
  • tests/benchmarks/test_pool_perf.py — 3 pool benchmarks: cold connect (login handshake cost), pool acquire/release, pool acquire + tiny query + release.
  • tests/benchmarks/test_async_perf.py — 2 async benchmarks: single async round-trip overhead, 10 concurrent SELECTs through an async pool.
  • tests/benchmarks/conftest.py — bench_conn (long-lived autocommit connection) and bench_table (pre-populated 1k-row table) fixtures, both session-scoped.
  • tests/benchmarks/baseline.json — committed baseline (28 measurements) for --benchmark-compare regression checks.
  • tests/benchmarks/README.md — headline numbers, regression policy, how to update baseline, what each benchmark measures.
  • make bench / make bench-codec / make bench-save Makefile targets.
  • benchmark pytest marker — gated, off by default. pytest -m benchmark to opt in.

Changed

  • make test-integration now uses -m "integration and not benchmark" so the integration suite stays fast (~6s) — benchmarks (~27s) are gated behind make bench.
  • pytest default -m now excludes both integration and benchmark. Default run is unit-only.

Headline numbers (dev container, x86_64 Linux, loopback)

| Operation | Mean |
| --- | --- |
| decode(int) (per cell) | 181 ns |
| parse_tuple_payload(5 cols) (per row) | 2.87 µs |
| SELECT 1 round-trip | 177 µs |
| Pool acquire + tiny query + release | 295 µs |
| Cold connect + close | 11.2 ms |

Pool-vs-cold delta is 72×. UTF-8 decode carries no measurable cost over iso-8859-1 (Phase 20 didn't slow anything down).

Tests

28 new benchmark tests. Total: 69 unit + 211 integration + 28 benchmark = 308.

2026.05.04.4 — UTF-8 / multibyte locale support

Threads the connection's CLIENT_LOCALE through to user-data string codecs so multibyte locales (UTF-8, etc.) round-trip correctly. The driver previously hardcoded iso-8859-1 for every string conversion — fine for Western European text, broken-by-design for CJK, Cyrillic, Arabic, emoji.

Added

  • Connection.encoding property — reports the Python codec name derived from CLIENT_LOCALE (e.g., iso-8859-1, utf-8, iso-8859-15). Default for a connection without client_locale= is iso-8859-1 (compatible with the legacy default).

  • informix_db.connections._python_encoding_from_locale(locale: str) — maps Informix locale strings (en_US.utf8, en_US.8859-1, en_US.819) to Python codec names. Falls back to iso-8859-1 for unknown / unsuffixed forms.
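A sketch of the mapping just described (public function name and suffix table here are illustrative; only the mappings named above are assumed):

```python
import codecs

_SUFFIX_TO_CODEC = {
    # Suffix of the Informix locale string -> Python codec name.
    "utf8": "utf-8",
    "8859-1": "iso-8859-1",
    "819": "iso-8859-1",     # IBM CCSID 819 is Latin-1
}


def python_encoding_from_locale(locale: str) -> str:
    """Map e.g. 'en_US.utf8' -> 'utf-8'; unknown/unsuffixed forms fall back."""
    _, _, suffix = locale.partition(".")
    return _SUFFIX_TO_CODEC.get(suffix, "iso-8859-1")
```

The fallback keeps legacy behavior: a connection with no recognizable locale suffix decodes exactly as before.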

Changed

  • encode_param(value, encoding=...) and _encode_str(value, encoding=...) honor the connection's encoding instead of hardcoded iso-8859-1. Cursor's _emit_bind_params forwards self._conn.encoding per parameter.

  • decode(type_code, raw, encoding=...) and parse_tuple_payload(reader, columns, encoding=...) thread the encoding to string column decoders (CHAR, VARCHAR, NCHAR, NVCHAR, LVARCHAR). Cursor's _read_fetch_response forwards self._conn.encoding.

  • Smart-LOB CLOB encode/decode (write_blob_column, simple-LOB TEXT fetch) honor self._conn.encoding.

  • Fast-path RPC (Connection.fast_path_call) honors self._encoding for its bound parameters.

Boundary discipline

Protocol-level strings stay iso-8859-1 (always ASCII, never user-controlled): cursor names, function signatures, server-fabricated SQ_FILE virtual filenames, error "near tokens", SQL keywords/identifiers. Only user-data strings (column values, parameter binds) follow CLIENT_LOCALE.

Error handling

Encoding-can't-represent-this-value (e.g., "你好" on an 8859-1 connection) now raises informix_db.DataError instead of letting Python's UnicodeEncodeError leak. The cursor releases the prepared statement before propagating, so the connection survives cleanly for the next query.
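The exception-translation pattern, as a minimal sketch (the `DataError` class below is a stand-in for informix_db.DataError; the statement-release step is elided):

```python
class DataError(Exception):
    """Stand-in for informix_db.DataError (PEP 249 hierarchy)."""


def encode_user_string(value: str, encoding: str) -> bytes:
    """Encode a user-data string, translating codec failures into DataError."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        # Chain the original error so the offending character stays visible.
        raise DataError(
            f"string not representable in connection encoding {encoding!r}"
        ) from exc
```

Callers then catch one PEP 249 exception family for all data problems instead of a raw `UnicodeEncodeError` leaking from deep inside the bind path.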

Tests

9 new integration tests in tests/test_unicode.py:

  • ASCII round-trip (regression)
  • Latin-1 high-bit chars round-trip on default locale
  • Full byte range 0x20-0xFE round-trip via VARCHAR
  • Locale → Python codec mapping for common forms
  • Connection.encoding exposes the resolved codec
  • UTF-8 locale negotiation (server transcodes for ASCII even with 8859-1 DB)
  • UTF-8 multibyte round-trip (skipped without IFX_UTF8_DATABASE env var pointing to a UTF-8 database)
  • Non-representable char raises DataError cleanly; connection survives
  • CLOB column round-trips Latin-1 text honoring connection encoding

Total: 69 unit + 212 integration = 281 tests.

Limitations

  • Multibyte UTF-8 storage requires both client_locale='en_US.utf8' AND a database whose DB_LOCALE is UTF-8. The dev container's testdb is 8859-1, so storing CJK chars there will continue to fail server-side regardless of the client codec. The test_utf8_multibyte_round_trip test is gated on the IFX_UTF8_DATABASE env var pointing to a UTF-8 database.

2026.05.04.3 — Resilience tests (fault injection)

Added

  • tests/_proxy.py — ControlledProxy helper: a thread-based TCP forwarder between the test client and Informix, with a kill() method that sends TCP RST (via SO_LINGER=0) to simulate a network drop or server crash. Used as a context manager.

  • tests/test_resilience.py — 12 integration tests filling the resilience gap identified in the test-coverage audit:

    • Network drop mid-SELECT raises OperationalError cleanly (not hang)
    • Network drop after describe but before fetch
    • Network drop during fetch iteration (already-materialized rows still readable, fresh execute fails)
    • Local socket close (yank-the-rug from client side)
    • I/O error marks connection unusable
    • Pool evicts a connection that died mid-with block
    • Pool revives after all idle connections died (health-check on acquire mints fresh)
    • Async cancellation via asyncio.wait_for — pool stays usable for subsequent queries
    • Cursor reusable after SQL error
    • Connection survives cursor close after error
    • Pool sustained-load smoke (50 acquire/release cycles, no leak)
    • read_timeout fires on a hung connection
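The RST trick that ControlledProxy.kill() relies on can be sketched in a few lines (a generic SO_LINGER recipe, not the helper's actual code):

```python
import socket
import struct

def kill_with_rst(sock: socket.socket) -> None:
    """Close a TCP socket so the peer sees RST instead of an orderly FIN.

    SO_LINGER with l_onoff=1, l_linger=0 discards unsent data and makes
    close() send a reset, which is how a crashed server looks on the wire.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()
```

The peer's next `recv()` then raises `ConnectionResetError` rather than returning the empty bytes of a clean shutdown, which is exactly the failure mode the resilience tests need to provoke.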

What this catches

  • Hangs (waiting forever on a dead socket)
  • Silent data corruption (treating EOF as a valid tuple)
  • Double-fault (one error → cleanup raises a different error)
  • Pool poisoning (returning a broken connection to the pool)
  • Stale cursor reuse (same cursor reused across an error boundary)

Tests

12 new integration tests. Total: 69 unit + 203 integration = 272 tests.

The Phase 19 work fills the highest-priority gap from the test-adequacy audit. Remaining gaps from that audit (UTF-8 locale, server-version matrix, performance benchmarks) are real but lower-severity.

2026.05.04.2 — Server-side scrollable cursors

Added

  • Server-side scrollable cursors (Phase 18): opt in via conn.cursor(scrollable=True). The cursor opens with SQ_SCROLL (24) before SQ_OPEN (6), the result set stays materialized server-side, and each scroll method sends SQ_SFETCH (23) to fetch one row at a time. Use this for huge result sets where in-memory materialization would be wasteful.

    The user-facing API is identical to Phase 17's in-memory scroll (fetch_first, fetch_last, fetch_prior, fetch_absolute, fetch_relative, scroll, rownumber); only the internal mechanism differs:

    | | Default cursor | scrollable=True |
    | --- | --- | --- |
    | Memory | All rows materialized | One row at a time |
    | Network round-trips per fetch | 0 (after initial NFETCH) | 1 (one SFETCH per call) |
    | Cursor lifetime | Closed after execute() | Open until close() |
    | Best for | Moderate result sets, sequential iteration | Huge result sets, random access |

    Implementation discovers total row count lazily via SFETCH(LAST=4) when negative absolute indexing requires it; result is cached in _scroll_total_rows. Position tracking is authoritative from the server's SQ_TUPID (25) tag, not client-computed.

Wire-protocol details

  • SQ_SFETCH (23): [short SQ_ID=4][int 23][short scrolltype][int target][int bufSize=4096][short SQ_EOT]. scrolltype values: 1=NEXT, 4=LAST, 6=ABSOLUTE.
  • SQ_SCROLL (24): emitted between CURNAME and SQ_OPEN to mark the cursor as scrollable.
  • SQ_TUPID (25): server response carrying the 1-indexed row position the server just delivered. [short 25][int rowID].
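The SQ_SFETCH layout above packs directly with `struct` (big-endian is assumed here from the driver's other `!`-format codecs; the SQ_EOT value is not given in this entry, so it is a parameter rather than a guess):

```python
import struct

SQ_ID, SQ_SFETCH = 4, 23
SCROLL_NEXT, SCROLL_LAST, SCROLL_ABSOLUTE = 1, 4, 6


def build_sfetch(scrolltype: int, target: int, sq_eot: int) -> bytes:
    """Pack one SQ_SFETCH PDU:
    [short SQ_ID=4][int 23][short scrolltype][int target][int bufSize=4096][short SQ_EOT]
    """
    return struct.pack("!hihiih",
                       SQ_ID, SQ_SFETCH, scrolltype, target, 4096, sq_eot)
```

Note bufSize is an INT in the `!hihiih` format; packing it as a SHORT is exactly the mistake that made the server hang.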

The trap on the way: I initially used SHORT for bufSize and the server hung silently — same SHORT-vs-INT diagnostic pattern as Phase 4.x's CURNAME+NFETCH. Captured a JDBC trace, byte-diffed against ours, found the mismatch.

Tests

14 new integration tests in test_scroll_cursor_server.py. Total: 69 unit + 191 integration = 260 tests.

2026.05.04.1 — Scroll cursors

Added

  • Scroll cursor API on Cursor (Phase 17):

    • cur.scroll(value, mode='relative'|'absolute') — PEP 249 compatible
    • cur.fetch_first() / cur.fetch_last() — jump to ends
    • cur.fetch_prior() — backward step (SQL-standard semantics: from past-end yields the last row)
    • cur.fetch_absolute(n) — 0-indexed jump; negative n indexes from the end
    • cur.fetch_relative(n) — n-step from current position
    • cur.rownumber — current 0-indexed position (None if before-first or no result set)

    In-memory implementation — no new wire-protocol; the existing materialized result set in cur._rows is now indexed rather than iterated. For server-side scroll over huge result sets, SQ_SFETCH (tag 23) would be needed — Phase 18 if anyone hits the in-memory ceiling.
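The in-memory semantics can be sketched over a materialized row list (a hypothetical minimal class, not the real Cursor; `fetch_prior` and `scroll` are omitted for brevity):

```python
class InMemoryScroll:
    """Sketch of scroll-by-index over a materialized result set."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rownumber = None            # None == before-first

    def fetch_absolute(self, n):
        # 0-indexed; negative n indexes from the end, like a Python list.
        idx = n if n >= 0 else len(self._rows) + n
        if not 0 <= idx < len(self._rows):
            return None                  # past-end: no row, position unchanged
        self.rownumber = idx
        return self._rows[idx]

    def fetch_first(self):
        return self.fetch_absolute(0)

    def fetch_last(self):
        return self.fetch_absolute(len(self._rows) - 1)

    def fetch_relative(self, n):
        base = -1 if self.rownumber is None else self.rownumber
        return self.fetch_absolute(base + n)
```

No wire traffic is involved: every scroll method is an index computation over `_rows`.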

Tests

14 new integration tests in test_scroll_cursor.py. Total: 69 unit + 177 integration = 246 tests.

2026.05.04 — Library completion

The Phase 0 ambition — first pure-Python Informix SQLI driver — reaches feature completeness. Adds async, TLS, connection pool, smart-LOBs, fast-path RPC, composite UDTs.

Added

  • Async API (informix_db.aio) — AsyncConnection, AsyncCursor, AsyncConnectionPool for FastAPI / aiohttp / asyncio. Each blocking I/O call is offloaded to a worker thread via asyncio.to_thread; event loop never blocks.
  • Connection pool (informix_db.create_pool) — thread-safe with min/max sizing, lazy growth, health-check on acquire, error-aware eviction.
  • TLS — tls=True for self-signed dev servers, tls=ssl.SSLContext for production. Wrapping happens in IfxSocket so the rest of the protocol layer is unaware.
  • Smart-LOBs (BLOB / CLOB) — full read/write end-to-end via cursor.read_blob_column() / cursor.write_blob_column() using the server's lotofile / filetoblob SQL functions intercepted at the SQ_FILE (98) protocol level.
  • Legacy in-row blobs (BYTE / TEXT) — bind + read via the SQ_BBIND / SQ_BLOB / SQ_FETCHBLOB protocol family.
  • Fast-path RPC (Connection.fast_path_call) — direct stored-procedure invocation bypassing PREPARE/EXECUTE; routine handles cached per-connection.
  • Composite UDT recognition — ROW, SET, MULTISET, LIST columns return typed RowValue / CollectionValue wrappers exposing schema and raw bytes.
  • Type codecs — INTERVAL (both DAY-TO-FRACTION and YEAR-TO-MONTH families), DATETIME (all qualifier ranges), DECIMAL / MONEY (BCD with sign+exp head byte and asymmetric base-100 complement for negatives), DATE, BOOL, all integer / float widths, CHAR / VARCHAR / LVARCHAR.
  • Transactions — implicit SQ_BEGIN before each transaction in non-ANSI logged DBs; transparent no-ops on unlogged DBs.
  • PEP 249 exception hierarchy — server SQLCODE mapped to the right exception class (IntegrityError for duplicate-key violations, ProgrammingError for syntax errors, etc.).

Documentation

Test coverage

232 tests total: 69 unit + 163 integration. Unit tests run with no external dependencies; integration tests run against the IBM Informix Developer Edition Docker image.

Known gaps (deferred)

  • Full ROW/COLLECTION recursive parsing: Phase 12 ships type recognition + raw-bytes wrapper. Parsing the textual representation into typed Python tuples/sets/lists is deferred — most workloads can use SQL projections (SELECT row_col.fieldname FROM tbl) instead.
  • UDT parameter encoding for fast-path: scalar params/returns work; passing a 72-byte BLOB locator as a UDT param requires extending the SQ_BIND encoder with the extended_owner/extended_name preamble for type > 18.
  • Native async I/O: Phase 16 ships a thread-pool wrapper that's functionally equivalent for typical FastAPI workloads. Native async (asyncpg-style transport abstraction) would be Phase 17 if a real workload needs it.

2026.05.02 — Phase 1: connection lifecycle

Initial release. connect() / close() work end-to-end. Cursor / execute / fetch arrived in Phase 2 (subsequent commits within the same session).