Changelog
All notable changes to informix-db. Versioning is CalVer — YYYY.MM.DD for date-based releases, YYYY.MM.DD.N for same-day post-releases per PEP 440.
2026.05.04.6 — executemany perf finding: it was the autocommit cliff
Investigation of the Phase 21 finding that executemany(N) cost scaled linearly per-row (1.74 ms × N) regardless of batch size. Root cause: every autocommit=True INSERT forces a server-side transaction-log flush. Not a wire-protocol bug.
Added
- `test_executemany_1000_rows_in_txn` benchmark: same workload, but inside a single transaction with one COMMIT at the end. Isolates pure protocol cost from server-storage cost.
- New module-scoped `txn_conn` fixture in `tests/benchmarks/test_insert_perf.py` for autocommit=False benchmarks.
Findings
| Mode | Total | Per row |
|---|---|---|
| `executemany(1000)`, autocommit=True | 1.72 s | 1.72 ms |
| `executemany(1000)` in a single txn | 32 ms | 32 µs |
53× speedup from changing the transaction boundary, not the driver. Pure protocol overhead is ~32 µs/row → ~31,000 rows/sec sustained throughput on a single connection. Comparable to mature pure-Python drivers (pg8000).
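The per-row, throughput, and speedup figures follow from the two measured totals by plain division. A small sketch that recomputes them, with the constants copied from the table and illustrative helper names:

```python
# Recompute the Findings figures from the two measured totals.
ROWS = 1000
AUTOCOMMIT_TOTAL_S = 1.72   # executemany(1000), autocommit=True
SINGLE_TXN_TOTAL_S = 0.032  # executemany(1000), one COMMIT at the end


def per_row_seconds(total_s: float, rows: int = ROWS) -> float:
    """Average wall-clock cost of one row."""
    return total_s / rows


def rows_per_second(total_s: float, rows: int = ROWS) -> float:
    """Sustained single-connection throughput implied by a total."""
    return rows / total_s


speedup = AUTOCOMMIT_TOTAL_S / SINGLE_TXN_TOTAL_S

print(f"autocommit=True: {per_row_seconds(AUTOCOMMIT_TOTAL_S) * 1e3:.2f} ms/row")
print(f"single txn:      {per_row_seconds(SINGLE_TXN_TOTAL_S) * 1e6:.0f} us/row")
print(f"throughput:      {rows_per_second(SINGLE_TXN_TOTAL_S):,.0f} rows/sec")
print(f"speedup:         {speedup:.1f}x")
```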
Changed
- `tests/benchmarks/README.md`: updated headline numbers to show both modes, and added a "Performance gotchas" section explaining when to use `autocommit=False` for bulk loads.
- `tests/benchmarks/baseline.json`: refreshed to include the new txn-mode measurement (now 29 entries, was 28).
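The gotcha comes down to where the COMMIT lands. A minimal sketch of the bulk-load shape, written against the generic PEP 249 connection interface rather than any informix_db-specific call. It assumes the connection was already opened with autocommit off (how informix_db exposes that switch is not shown here), and `bulk_insert` is a hypothetical helper name:

```python
from typing import Any, Iterable, Sequence


def bulk_insert(conn: Any, sql: str, rows: Iterable[Sequence[Any]]) -> int:
    """Insert many rows inside one transaction with a single COMMIT.

    Assumes `conn` is a PEP 249 connection with autocommit disabled, so the
    server flushes its transaction log once at commit time rather than once
    per row. Rolls back and re-raises on any failure.
    """
    batch = list(rows)
    cur = conn.cursor()
    try:
        cur.executemany(sql, batch)
        conn.commit()      # one log flush for the whole batch
    except Exception:
        conn.rollback()    # leave no partial batch behind
        raise
    finally:
        cur.close()
    return len(batch)
```

With a per-row commit the server pays one log flush per row; here it pays one per batch, which is the difference the 1.72 s vs 32 ms comparison measures.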
Decision: don't pipeline
Pipelining BIND+EXECUTE PDUs (writing N without waiting for responses between them) could potentially halve the 32 µs/row figure on loopback. Decided against:
- The remaining 32 µs is already excellent — single-connection bulk-load performance is not where users hit limits.
- Pipelining adds complexity around TCP send-buffer management, partial-failure semantics, and error reporting (which row failed when 50 are in flight).
- The autocommit gotcha is the real user-facing footgun. Better docs > more code.
If someone reports needing >31K rows/sec single-connection, this becomes Phase 22 work.
2026.05.04.5 — Performance benchmarks (Phase 21)
Adds tests/benchmarks/ — a pytest-benchmark driven suite covering codec micro-benchmarks (no server required) and end-to-end SELECT/INSERT/pool/async benchmarks. Establishes a committed baseline.json so future PRs can be compared against the floor and regressions caught at review.
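pytest-benchmark's `--benchmark-compare` flag reports a fresh run against a saved one. Below is a hedged sketch of a standalone check in the same spirit. It assumes `baseline.json` uses pytest-benchmark's saved-run layout (a top-level `benchmarks` list whose entries carry `name` and `stats.mean`). The function names and the 10% threshold are hypothetical and do not reflect the project's regression policy.

```python
import json
from typing import Dict, List, Tuple


def load_run(path: str) -> dict:
    """Read one pytest-benchmark JSON file (e.g. a committed baseline)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def mean_by_name(run: dict) -> Dict[str, float]:
    """Map benchmark name -> mean seconds."""
    return {b["name"]: b["stats"]["mean"] for b in run.get("benchmarks", [])}


def find_regressions(
    baseline: dict, current: dict, threshold: float = 0.10
) -> List[Tuple[str, float, float]]:
    """Return (name, baseline_mean, current_mean) for every benchmark whose
    mean grew by more than `threshold` (0.10 means 10%) over the baseline.
    Benchmarks present in only one of the two runs are skipped."""
    base = mean_by_name(baseline)
    cur = mean_by_name(current)
    regressions = []
    for name in sorted(base.keys() & cur.keys()):
        if cur[name] > base[name] * (1.0 + threshold):
            regressions.append((name, base[name], cur[name]))
    return regressions
```

Called as `find_regressions(load_run("tests/benchmarks/baseline.json"), load_run(new_run_path))`, an empty list means no shared benchmark slowed past the threshold.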
Added
- `tests/benchmarks/test_codec_perf.py`: 16 micro-benchmarks for the hot codec paths (`decode`, `encode_param`, `parse_tuple_payload`). Runs without an Informix container; suitable for pre-merge CI.
- `tests/benchmarks/test_select_perf.py`: 4 SELECT round-trip benchmarks (1-row latency floor, ~10 rows, full 1k-row table, parameterized).
- `tests/benchmarks/test_insert_perf.py`: 3 INSERT benchmarks (single-row, `executemany(100)`, `executemany(1000)`).
- `tests/benchmarks/test_pool_perf.py`: 3 pool benchmarks (cold connect, i.e. login handshake cost; pool acquire/release; pool acquire + tiny query + release).
- `tests/benchmarks/test_async_perf.py`: 2 async benchmarks (single async round-trip overhead, 10 concurrent SELECTs through an async pool).
- `tests/benchmarks/conftest.py`: `bench_conn` (long-lived autocommit connection) and `bench_table` (pre-populated 1k-row table) fixtures, both session-scoped.
- `tests/benchmarks/baseline.json`: committed baseline (28 measurements) for `--benchmark-compare` regression checks.
- `tests/benchmarks/README.md`: headline numbers, regression policy, how to update the baseline, what each benchmark measures.
- `make bench` / `make bench-codec` / `make bench-save` Makefile targets.
- `benchmark` pytest marker: gated, off by default. Run `pytest -m benchmark` to opt in.
Changed
- `make test-integration` now uses `-m "integration and not benchmark"` so the integration suite stays fast (~6 s); benchmarks (~27 s) are gated behind `make bench`.
- The `pytest` default `-m` now excludes both `integration` and `benchmark`, so a default run is unit-only.
Headline numbers (dev container, x86_64 Linux, loopback)
| Operation | Mean |
|---|---|
| `decode(int)` (per cell) | 181 ns |
| `parse_tuple_payload` (5 cols, per row) | 2.87 µs |
| `SELECT 1` round-trip | 177 µs |
| Pool acquire + tiny query + release | 295 µs |
| Cold connect + close | 11.2 ms |
Pool-vs-cold delta is 72×. UTF-8 decode carries no measurable cost over iso-8859-1 (Phase 20 didn't slow anything down).
Tests
28 new benchmark tests. Total: 69 unit + 211 integration + 28 benchmark = 308.
2026.05.04.4 — UTF-8 / multibyte locale support
Threads the connection's CLIENT_LOCALE through to user-data string codecs so multibyte locales (UTF-8, etc.) round-trip correctly. The driver previously hardcoded iso-8859-1 for every string conversion — fine for Western European text, broken-by-design for CJK, Cyrillic, Arabic, emoji.
Added
- `Connection.encoding` property: reports the Python codec name derived from `CLIENT_LOCALE` (e.g., `iso-8859-1`, `utf-8`, `iso-8859-15`). A connection opened without `client_locale=` defaults to `iso-8859-1` (compatible with the legacy default).
- `informix_db.connections._python_encoding_from_locale(locale: str)`: maps Informix locale strings (`en_US.utf8`, `en_US.8859-1`, `en_US.819`) to Python codec names. Falls back to `iso-8859-1` for unknown or unsuffixed forms.
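A minimal sketch of how such a mapping could work, assuming the codeset is whatever follows the last `.` in the locale string. This is an illustration, not the driver's `_python_encoding_from_locale`. The table covers only the forms named in this entry, and 819 is IBM's code-page number for ISO 8859-1.

```python
from typing import Optional

# Hypothetical table: Informix codeset suffix (after the '.') -> Python codec.
_CODESET_TO_CODEC = {
    "utf8": "utf-8",
    "8859-1": "iso-8859-1",
    "819": "iso-8859-1",      # IBM code page 819 == ISO 8859-1
    "8859-15": "iso-8859-15",
}
_DEFAULT_CODEC = "iso-8859-1"  # legacy default for unknown/unsuffixed forms


def python_encoding_from_locale(locale: Optional[str]) -> str:
    """Map an Informix locale such as 'en_US.utf8' to a Python codec name."""
    if not locale or "." not in locale:
        return _DEFAULT_CODEC
    codeset = locale.rsplit(".", 1)[1].strip().lower()
    return _CODESET_TO_CODEC.get(codeset, _DEFAULT_CODEC)
```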
Changed
- `encode_param(value, encoding=...)` and `_encode_str(value, encoding=...)` honor the connection's encoding instead of a hardcoded `iso-8859-1`. The cursor's `_emit_bind_params` forwards `self._conn.encoding` per parameter.
- `decode(type_code, raw, encoding=...)` and `parse_tuple_payload(reader, columns, encoding=...)` thread the encoding to the string column decoders (CHAR, VARCHAR, NCHAR, NVCHAR, LVARCHAR). The cursor's `_read_fetch_response` forwards `self._conn.encoding`.
- Smart-LOB CLOB encode/decode (`write_blob_column`, simple-LOB TEXT fetch) honors `self._conn.encoding`.
- Fast-path RPC (`Connection.fast_path_call`) honors `self._encoding` for its bound parameters.
Boundary discipline
Protocol-level strings stay iso-8859-1 (always ASCII, never user-controlled): cursor names, function signatures, server-fabricated SQ_FILE virtual filenames, error "near tokens", SQL keywords/identifiers. Only user-data strings (column values, parameter binds) follow CLIENT_LOCALE.
Error handling
A value the connection's encoding can't represent (e.g., "你好" on an 8859-1 connection) now raises informix_db.DataError instead of leaking Python's UnicodeEncodeError. The cursor releases the prepared statement before propagating, so the connection stays usable for the next query.
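The rule amounts to catching the codec failure where user data is encoded and re-raising it as the PEP 249 exception. A hedged sketch: `DataError` here is a local stand-in for `informix_db.DataError`, `encode_user_str` is a hypothetical name, and the prepared-statement release the cursor performs before propagating is not shown.

```python
class DataError(Exception):
    """Local stand-in for informix_db.DataError in this sketch."""


def encode_user_str(value: str, encoding: str) -> bytes:
    """Encode a user-data string for the wire, converting codec failures
    into DataError so UnicodeEncodeError never reaches the caller."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        bad = value[exc.start:exc.end]
        raise DataError(
            f"{bad!r} at position {exc.start} cannot be represented "
            f"in connection encoding {encoding!r}"
        ) from exc  # keep the original error as __cause__ for debugging
```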
Tests
9 new integration tests in tests/test_unicode.py:
- ASCII round-trip (regression)
- Latin-1 high-bit chars round-trip on default locale
- Full byte range 0x20-0xFE round-trip via VARCHAR
- Locale → Python codec mapping for common forms
- `Connection.encoding` exposes the resolved codec
- UTF-8 locale negotiation (server transcodes for ASCII even with an 8859-1 DB)
- UTF-8 multibyte round-trip (skipped unless the `IFX_UTF8_DATABASE` env var points to a UTF-8 database)
- Non-representable char raises `DataError` cleanly; connection survives
- CLOB column round-trips Latin-1 text honoring the connection encoding
Total: 69 unit + 212 integration = 281 tests.
Limitations
- Multibyte UTF-8 storage requires both `client_locale='en_US.utf8'` AND a database whose `DB_LOCALE` is UTF-8. The dev container's `testdb` is `8859-1`, so storing CJK chars there will continue to fail server-side regardless of the client codec. The `test_utf8_multibyte_round_trip` test is gated on the `IFX_UTF8_DATABASE` env var pointing to a UTF-8 database.
2026.05.04.3 — Resilience tests (fault injection)
Added
- `tests/_proxy.py`: `ControlledProxy` helper, a thread-based TCP forwarder between the test client and Informix with a `kill()` method that sends TCP RST (via `SO_LINGER=0`) to simulate a network drop or server crash. Used as a context manager.
- `tests/test_resilience.py`: 12 integration tests filling the resilience gap identified in the test-coverage audit:
  - Network drop mid-SELECT raises `OperationalError` cleanly (no hang)
  - Network drop after describe but before fetch
  - Network drop during fetch iteration (already-materialized rows still readable, fresh execute fails)
  - Local socket close (yank-the-rug from the client side)
  - I/O error marks connection unusable
  - Pool evicts a connection that died mid-`with` block
  - Pool revives after all idle connections died (health-check on acquire mints fresh ones)
  - Async cancellation via `asyncio.wait_for`; pool stays usable for subsequent queries
  - Cursor reusable after SQL error
  - Connection survives cursor close after error
  - Pool sustained-load smoke test (50 acquire/release cycles, no leak)
  - `read_timeout` fires on a hung connection
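The `kill()` behavior relies on a standard sockets technique: enabling `SO_LINGER` with a zero timeout makes `close()` abort the connection with an RST instead of the usual FIN handshake. A minimal sketch of just that piece, not `ControlledProxy` itself (the forwarding threads are omitted). `arm_rst_on_close` and `hard_kill` are hypothetical names. The `"ii"` layout assumes the POSIX `struct linger`; Windows uses two unsigned shorts instead.

```python
import socket
import struct


def arm_rst_on_close(sock: socket.socket) -> None:
    """Set SO_LINGER to l_onoff=1, l_linger=0 so close() sends a TCP RST.

    Assumes the POSIX struct linger layout of two C ints.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def hard_kill(sock: socket.socket) -> None:
    """Abruptly drop a connection, as a network drop or server crash would."""
    arm_rst_on_close(sock)
    sock.close()  # aborts immediately; unsent data is discarded
```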
What this catches
- Hangs (waiting forever on a dead socket)
- Silent data corruption (treating EOF as a valid tuple)
- Double-fault (one error → cleanup raises a different error)
- Pool poisoning (returning a broken connection to the pool)
- Stale cursor reuse (same cursor reused across an error boundary)
Tests
12 new integration tests. Total: 69 unit + 203 integration = 272 tests.
The Phase 19 work fills the highest-priority gap from the test-adequacy audit. Remaining gaps from that audit (UTF-8 locale, server-version matrix, performance benchmarks) are real but lower-severity.
2026.05.04.2 — Server-side scrollable cursors
Added
- Server-side scrollable cursors (Phase 18): opt in via `conn.cursor(scrollable=True)`. The cursor opens with `SQ_SCROLL` (24) before `SQ_OPEN` (6), the result set stays materialized server-side, and each scroll method sends `SQ_SFETCH` (23) to fetch one row at a time. Use this for huge result sets where in-memory materialization would be wasteful.

  The user-facing API is identical to Phase 17's in-memory scroll (`fetch_first`, `fetch_last`, `fetch_prior`, `fetch_absolute`, `fetch_relative`, `scroll`, `rownumber`); only the internal mechanism differs:

  | | Default cursor | `scrollable=True` |
  |---|---|---|
  | Memory | All rows materialized | One row at a time |
  | Network round-trips per fetch | 0 (after initial NFETCH) | 1 (one SFETCH per call) |
  | Cursor lifetime | Closed after `execute()` | Open until `close()` |
  | Best for | Moderate result sets, sequential iteration | Huge result sets, random access |

  The implementation discovers the total row count lazily via `SFETCH(LAST=4)` when negative absolute indexing requires it; the result is cached in `_scroll_total_rows`. Position tracking is authoritative from the server's `SQ_TUPID` (25) tag, not client-computed.
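The lazy row-count discovery can be pictured as a small translation layer between Python's 0-indexed, negative-friendly indices and the 1-indexed row numbers that `SQ_TUPID` reports. A sketch under stated assumptions: `ScrollIndex` is hypothetical, `fetch_last_rowid` stands in for sending `SFETCH(LAST=4)` and reading the row number from the `SQ_TUPID` reply, and the ABSOLUTE target is assumed to be 1-indexed like `SQ_TUPID`.

```python
from typing import Callable, Optional


class ScrollIndex:
    """Hypothetical index translator for a server-side scroll cursor.

    Converts 0-indexed Python positions (negative = from the end) into
    1-indexed wire row numbers, fetching the total row count only when a
    negative index needs it and caching it afterwards.
    """

    def __init__(self, fetch_last_rowid: Callable[[], int]) -> None:
        # Stand-in for: send SFETCH(LAST=4), return the SQ_TUPID row number.
        self._fetch_last_rowid = fetch_last_rowid
        self._total_rows: Optional[int] = None

    def total_rows(self) -> int:
        if self._total_rows is None:  # one round-trip, then cached
            self._total_rows = self._fetch_last_rowid()
        return self._total_rows

    def absolute_target(self, n: int) -> Optional[int]:
        """Return the 1-indexed wire row for Python index n, or None when a
        negative n reaches before the first row."""
        if n >= 0:
            # No count needed; assumed that the server answers an
            # out-of-range target with no row.
            return n + 1
        idx = self.total_rows() + n
        return idx + 1 if idx >= 0 else None
```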
Wire-protocol details
- `SQ_SFETCH` (23): `[short SQ_ID=4][int 23][short scrolltype][int target][int bufSize=4096][short SQ_EOT]`. scrolltype values: 1=NEXT, 4=LAST, 6=ABSOLUTE.
- `SQ_SCROLL` (24): emitted between CURNAME and `SQ_OPEN` to mark the cursor as scrollable.
- `SQ_TUPID` (25): server response carrying the 1-indexed row position the server just delivered: `[short 25][int rowID]`.
One trap along the way: I initially encoded bufSize as a SHORT and the server hung silently, the same SHORT-vs-INT diagnostic pattern as Phase 4.x's CURNAME+NFETCH. I captured a JDBC trace, byte-diffed it against ours, and found the mismatch.
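The PDU layouts for `SQ_SFETCH` and `SQ_TUPID` map directly onto `struct` format strings. A sketch assuming big-endian (network) byte order; the `SQ_EOT` value is an assumption to check against `docs/PROTOCOL_NOTES.md`, and the function names are hypothetical. Declaring bufSize as `h` instead of `i` would quietly shrink the PDU by two bytes, which is exactly the SHORT-vs-INT mistake that made the server hang.

```python
import struct

SQ_ID = 4
SQ_SFETCH = 23
SQ_TUPID = 25
SQ_EOT = 12  # assumption: verify against docs/PROTOCOL_NOTES.md

SCROLL_NEXT, SCROLL_LAST, SCROLL_ABSOLUTE = 1, 4, 6

# [short SQ_ID][int 23][short scrolltype][int target][int bufSize][short SQ_EOT]
# bufSize must be "i" (4 bytes); "h" would emit a 16-byte PDU instead of 18.
_SFETCH = struct.Struct(">hihiih")
# [short 25][int rowID]
_TUPID = struct.Struct(">hi")


def build_sfetch(scrolltype: int, target: int, buf_size: int = 4096) -> bytes:
    """Pack one SQ_SFETCH request."""
    return _SFETCH.pack(SQ_ID, SQ_SFETCH, scrolltype, target, buf_size, SQ_EOT)


def parse_tupid(payload: bytes) -> int:
    """Return the 1-indexed row position from an SQ_TUPID response."""
    tag, row_id = _TUPID.unpack_from(payload)
    if tag != SQ_TUPID:
        raise ValueError(f"expected SQ_TUPID ({SQ_TUPID}), got tag {tag}")
    return row_id
```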
Tests
14 new integration tests in test_scroll_cursor_server.py. Total: 69 unit + 191 integration = 260 tests.
2026.05.04.1 — Scroll cursors
Added
- Scroll cursor API on `Cursor` (Phase 17):
  - `cur.scroll(value, mode='relative'|'absolute')`: PEP 249 compatible
  - `cur.fetch_first()` / `cur.fetch_last()`: jump to either end
  - `cur.fetch_prior()`: backward step (SQL-standard semantics: from past-end yields the last row)
  - `cur.fetch_absolute(n)`: 0-indexed jump; negative `n` indexes from the end
  - `cur.fetch_relative(n)`: n-step from the current position
  - `cur.rownumber`: current 0-indexed position (`None` if before-first or no result set)
In-memory implementation: no new wire protocol; the existing materialized result set in `cur._rows` is now indexed rather than iterated. For server-side scroll over huge result sets, `SQ_SFETCH` (tag 23) would be needed; that is Phase 18 if anyone hits the in-memory ceiling.
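The `fetch_*` semantics can be modeled over a plain list. A hypothetical sketch: `InMemoryScroll` is not the driver's `Cursor`, `scroll()` is omitted, and an empty list is treated as "no result set". It tracks the index of the last returned row, using -1 for before-first and `len(rows)` for past-end.

```python
from typing import Any, List, Optional, Sequence


class InMemoryScroll:
    """Hypothetical model of scroll semantics over materialized rows.

    _pos is the index of the most recently returned row; -1 means
    before-first and len(rows) means past-end.
    """

    def __init__(self, rows: Sequence[Any]) -> None:
        self._rows: List[Any] = list(rows)
        self._pos = -1

    def _move_to(self, idx: int) -> Optional[Any]:
        n = len(self._rows)
        if 0 <= idx < n:
            self._pos = idx
            return self._rows[idx]
        self._pos = -1 if idx < 0 else n  # clamp to before-first / past-end
        return None

    @property
    def rownumber(self) -> Optional[int]:
        if self._pos < 0 or not self._rows:
            return None
        return self._pos

    def fetch_first(self) -> Optional[Any]:
        return self._move_to(0)

    def fetch_last(self) -> Optional[Any]:
        return self._move_to(len(self._rows) - 1)

    def fetch_prior(self) -> Optional[Any]:
        # From past-end (_pos == len) this lands on the last row.
        return self._move_to(self._pos - 1)

    def fetch_relative(self, n: int) -> Optional[Any]:
        return self._move_to(self._pos + n)

    def fetch_absolute(self, n: int) -> Optional[Any]:
        # Negative n counts from the end, like Python indexing.
        return self._move_to(len(self._rows) + n if n < 0 else n)
```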
Tests
14 new integration tests in test_scroll_cursor.py. Total: 69 unit + 177 integration = 246 tests.
2026.05.04 — Library completion
The Phase 0 ambition — first pure-Python Informix SQLI driver — reaches feature completeness. Adds async, TLS, connection pool, smart-LOBs, fast-path RPC, composite UDTs.
Added
- Async API (`informix_db.aio`): `AsyncConnection`, `AsyncCursor`, `AsyncConnectionPool` for FastAPI / aiohttp / asyncio. Each blocking I/O call is offloaded to a worker thread via `asyncio.to_thread`; the event loop never blocks.
- Connection pool (`informix_db.create_pool`): thread-safe with min/max sizing, lazy growth, health-check on acquire, error-aware eviction.
- TLS: `tls=True` for self-signed dev servers, `tls=ssl.SSLContext` for production. Wrapping happens in `IfxSocket` so the rest of the protocol layer is unaware.
- Smart-LOBs (BLOB / CLOB): full read/write end-to-end via `cursor.read_blob_column()` / `cursor.write_blob_column()`, using the server's `lotofile` / `filetoblob` SQL functions intercepted at the `SQ_FILE` (98) protocol level.
- Legacy in-row blobs (BYTE / TEXT): bind + read via the `SQ_BBIND` / `SQ_BLOB` / `SQ_FETCHBLOB` protocol family.
- Fast-path RPC (`Connection.fast_path_call`): direct stored-procedure invocation bypassing PREPARE/EXECUTE; routine handles cached per connection.
- Composite UDT recognition: `ROW`, `SET`, `MULTISET`, `LIST` columns return typed `RowValue` / `CollectionValue` wrappers exposing schema and raw bytes.
- Type codecs: `INTERVAL` (both DAY-TO-FRACTION and YEAR-TO-MONTH families), `DATETIME` (all qualifier ranges), `DECIMAL` / `MONEY` (BCD with sign+exp head byte and asymmetric base-100 complement for negatives), `DATE`, `BOOL`, all integer / float widths, `CHAR` / `VARCHAR` / `LVARCHAR`.
- Transactions: implicit `SQ_BEGIN` before each transaction in non-ANSI logged DBs; transparent no-ops on unlogged DBs.
- PEP 249 exception hierarchy: server `SQLCODE` mapped to the right exception class (`IntegrityError` for duplicate-key violations, `ProgrammingError` for syntax errors, etc.).
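The thread-offload approach can be sketched in a few lines with `asyncio.to_thread` (Python 3.9+). This illustrates the pattern only and is not the library's `AsyncConnection`. `AsyncWrapper` is hypothetical, and the per-object `asyncio.Lock` is an assumed design choice that keeps two coroutines from driving one blocking connection at the same time.

```python
import asyncio
from typing import Any, Callable


class AsyncWrapper:
    """Hypothetical wrapper: run every blocking method in a worker thread
    so the event loop stays responsive while socket I/O blocks."""

    def __init__(self, blocking: Any) -> None:
        self._blocking = blocking
        # Assumed design choice: serialize calls on one blocking object.
        # Construct the wrapper inside a running event loop.
        self._lock = asyncio.Lock()

    async def call(self, method: str, *args: Any, **kwargs: Any) -> Any:
        fn: Callable[..., Any] = getattr(self._blocking, method)
        async with self._lock:
            # The blocking call runs in the default thread-pool executor.
            return await asyncio.to_thread(fn, *args, **kwargs)
```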
Documentation
- `README.md`: overview and quick-start
- `docs/USAGE.md`: practical recipes and migration guide
- `docs/PROTOCOL_NOTES.md`: byte-level wire-format reference
- `docs/DECISION_LOG.md`: phase-by-phase architectural decisions, with the why preserved
- `docs/JDBC_NOTES.md`: index into the decompiled IBM JDBC reference
- `docs/CAPTURES/`: annotated socat hex-dump captures
Test coverage
232 tests total: 69 unit + 163 integration. Unit tests run with no external dependencies; integration tests run against the IBM Informix Developer Edition Docker image.
Known gaps (deferred)
- Full ROW/COLLECTION recursive parsing: Phase 12 ships type recognition + raw-bytes wrapper. Parsing the textual representation into typed Python tuples/sets/lists is deferred; most workloads can use SQL projections (`SELECT row_col.fieldname FROM tbl`) instead.
- UDT parameter encoding for fast-path: scalar params/returns work; passing a 72-byte BLOB locator as a UDT param requires extending the SQ_BIND encoder with the extended_owner/extended_name preamble for type > 18.
- Native async I/O: Phase 16 ships a thread-pool wrapper that's functionally equivalent for typical FastAPI workloads. Native async (asyncpg-style transport abstraction) would be Phase 17 if a real workload needs it.
2026.05.02 — Phase 1: connection lifecycle
Initial release. connect() / close() works end-to-end. Cursor / execute / fetch arrived in Phase 2 (subsequent commits within the same session).