Closes Hamilton audit Critical #2 (concurrency / wire lock) and
High #3 (async cancellation evicts cleanly). Phase 26 fixed what
gets returned to the pool; Phase 27 fixes what can interleave on
the wire while it's running.
What changed:
connections.py:
* Added Connection._wire_lock = threading.RLock(). Wrapped commit(),
rollback(), fast_path_call() under the lock.
* _ensure_transaction documents the lock as a precondition AND
asserts ownership at runtime (_wire_lock._is_owned()) so a future
caller adding a third call site fails loudly.
* close() tries to acquire wire lock with 0.5s timeout before
SQ_EXIT; skips polite exit and force-closes if busy.
cursors.py:
* execute() body extracted into _execute_under_wire_lock() and
called under the lock.
* executemany() body wrapped inline.
* _sfetch_at() wrapped - covers all scrollable fetch_* methods
that delegate to it.
* close() locks the CLOSE+RELEASE for scrollable cursors.
pool.py:
* release() acquires conn._wire_lock with 5s timeout before rollback.
On timeout: log WARNING, evict connection. Constant
_RELEASE_WIRE_LOCK_TIMEOUT for tunability.
aio.py:
* AsyncConnectionPool.connection() now catches CancelledError /
TimeoutError separately and routes to broken=True. Combined with
the wire lock, asyncio.wait_for around aio DB calls is now safe.
* Updated docstring; mirrored in docs/USAGE.md.
Margaret Hamilton review surfaced three actionable conditions, all
addressed before tagging:
* Cancellation test used contextlib.suppress - could pass without
exercising the cancellation path on a fast runner. Switched to
pytest.raises so the test fails if timeout doesn't fire.
* _ensure_transaction precondition documented but unchecked at
runtime. Added assert self._wire_lock._is_owned() guard.
* Connection.close() was unsynchronized. Now tries 0.5s acquire
before SQ_EXIT.
Two new regression tests in tests/test_pool.py:
* test_concurrent_threads_on_one_connection_dont_interleave_pdus
(without lock: garbled results / hangs)
* test_async_wait_for_cancellation_evicts_connection
(asserts pool size shrinks; cancellation actually fires)
72 unit + 228 integration + 28 benchmark = 328 tests; ruff clean.
Hamilton verdict: PRODUCTION READY WITH CAVEATS (was) -> CAVEATS
NARROWED FURTHER (now). 0 critical, 2 high remaining (cursor
finalizers + bare-except in error drain) - both Phase 28 scope.
Fixes the dirty-pool-checkout bug surfaced by Margaret Hamilton's
system-wide audit (Critical #1).
The bug: ConnectionPool.release() returned connections with open
server-side transactions still active. Request A's uncommitted
INSERTs would be inherited by Request B reusing the same connection -
B's commit would land A's writes permanently; B's rollback would
silently lose them. Same shape as psycopg2's pre-2.5 dirty-pool bug.
The fix: pool.release() now rolls back any open transaction before
returning the connection to the idle list. The rollback runs OUTSIDE
the pool lock since it's a wire round-trip - the connection is
already off the idle list and counted in _total, so no other thread
can grab it during the rollback window. If the rollback itself fails
(dead socket, etc.), the connection is evicted rather than recycled.
Async path covered automatically: AsyncConnectionPool.release()
delegates to the sync pool's release via _to_thread.
Margaret Hamilton review pass surfaced two findings, both addressed:
* Silent rollback failure: added a WARNING log via logging.getLogger
("informix_db.pool") so evictions are debuggable. First logger in
the project.
* Async cancellation race: the fix doesn't introduce the
asyncio.wait_for race (Critical #2, deferred to Phase 27), but it
adds a code path that can trigger it. Documented loudly in
pool.release() docstring, aio.py module docstring, and USAGE.md
async section. Recommendation: use read_timeout on the connection
instead of asyncio.wait_for until Phase 27 lands.
Two new regression tests in tests/test_pool.py:
* test_uncommitted_writes_invisible_to_next_acquirer (the bug)
* test_committed_writes_survive_pool_checkout (no over-correction)
Verified the regression test catches the bug: stashed the fix, ran
the test - it fails with "B sees 1 rows - leaked across pool
checkout boundary" - confirming it tests the real failure mode.
Total tests: 72 unit + 226 integration + 28 benchmark = 326.
Deferred to Phase 27 per Hamilton audit:
* Critical #2 (concurrency / per-connection wire lock)
* High #3 (async cancellation routes to broken=True)
* High #4 (bare except in _raise_sq_err drain)
* High #5 (no cursor finalizers - server-side resource leaks)
Third-pass optimization on parse_tuple_payload's hot loop. Previous
phases removed redundant work; this one removes correct-but-wasteful
work: the if/elif chain checked branches in implementation order, not
frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT - the most
common columns in real queries) sat at the bottom, paying ~7 frozenset
misses per column.
Changes (src/informix_db/_resultset.py):
* Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module
load.
* New fast-path branch at the TOP of parse_tuple_payload's loop body
that handles every _FIXED_WIDTH_TYPES column inline: one frozenset
check, one dict lookup, one decode, continue. Skips every other
branch.
* Cleaned up the bottom fall-through; it now genuinely only catches
unknown types.
Performance vs Phase 24 baseline:
* parse_tuple_5cols_iso8859: 1659 ns -> 1400 ns (-16%)
* parse_tuple_5cols_utf8: 1649 ns -> 1341 ns (-19%)
Cumulative vs Phase 21 baseline (before any optimization):
* parse_tuple_5cols: 2796 ns -> 1400 ns (-50%) - HALF the time
* decode_int: 230 ns -> 139 ns (-40%)
Margaret Hamilton review surfaced one HIGH finding addressed before
tagging:
* H: The fast-path optimization assumes every FIXED_WIDTHS key is
decodable WITHOUT qualifier inspection (encoded_length etc.). True
today, but a future contributor adding a fixed-width type that
needs qualifier bits (like DATETIME does) would silently get wrong
decode behavior - Lauren-Bug class failure.
Fix: added INVARIANT comment to FIXED_WIDTHS in converters.py AND
added tests/test_resultset_invariants.py with three CI tripwire
tests:
- _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch
- Every FIXED_WIDTHS key has a DECODERS entry
- DECODERS keys stay < 0x100 (Phase 24 collision-free guarantee)
The tests carry instructions: if one fires, don't update the test
to match - either restore the property or refactor the optimization.
Comments rot when nobody reads them; tests fail loudly.
baseline.json refreshed; 72 unit + 224 integration + 28 bench = 324
tests; ruff clean.
Second pass of hot-path optimization on parse_tuple_payload. Two changes
to converters.py:
1. Split decode() into public + internal. Added _decode_base(base_tc,
raw, encoding) that takes an already-base-typed code and skips the
redundant base_type() call. Public decode() is now a one-line
wrapper. parse_tuple_payload's 4 call sites swapped to use
_decode_base directly. _fastpath.py's external decode() caller is
unaffected.
2. Pre-compiled struct.Struct unpackers. The fixed-width integer/float
decoders (_decode_smallint, _decode_int, _decode_bigint,
_decode_smfloat, _decode_float, _decode_date) switched from per-call
struct.unpack(fmt, raw) to module-level bound methods like
_UNPACK_INT = struct.Struct("!i").unpack. Format-string parsed once
at module load. Measured 37% faster than per-call struct.unpack on
CPython 3.13 micro.
Performance vs Phase 23 baseline:
* decode_int: 173 ns -> 139 ns (-20%)
* decode_bigint: 188 ns -> 150 ns (-20%)
* parse_tuple_5cols: 2047 ns -> 1592 ns (-22%)
* 1k-row SELECT: 1255 us -> 989 us (-21%)
Cumulative vs original Phase 21 baseline:
* decode_int: 230 ns -> 139 ns (-40%)
* parse_tuple_5cols: 2796 ns -> 1592 ns (-43%)
* 1k-row SELECT: 1477 us -> 989 us (-33%)
Real-world fetch ceiling: 358K rows/sec -> ~620K rows/sec.
Margaret Hamilton review surfaced one HIGH-severity finding addressed
before tagging:
* H: The no-collision guarantee that makes _decode_base safe is
structural but undocumented (all DECODERS keys are ≤ 0xFF, all flag
bits are ≥ 0x100, so flagged inputs cannot coincidentally match).
Added load-bearing INVARIANT comment at DECODERS dict explaining
the constraint and what to do if violated. Cross-referenced from
_decode_base's docstring for bidirectional traceability.
baseline.json refreshed; all 224 integration tests pass; ruff clean.
The docs/USAGE.md predated Phases 17-21, so anyone landing on PyPI was
missing scrollable cursors, locale/Unicode, the autocommit cliff
finding, and the type-mapping reference.
Added sections to docs/USAGE.md:
* Locale and Unicode - client_locale, Connection.encoding, CLIENT_LOCALE
vs DB_LOCALE, when characters can't fit the codec
* Type mapping reference - full SQL <-> Python type table, NULL
sentinels subsection, IntervalYM
* Performance tips - 53x autocommit-cliff fix, 100x executemany win,
72x pool win, with the actual benchmark numbers from Phase 21.1
* Scrollable cursors - fetch_* API, in-memory vs server-side trade-off,
edge cases (past-end semantics, negative indexing, rownumber)
* Timeouts and keepalive subsection - production starting points
* Environment dictionary subsection - env={} parameter
* Known limitations - explicit table of what doesn't work (named
params, complex UDT bind, GSSAPI, XA) with workarounds; "things
that might surprise you" notes
README.md - added Documentation section linking to docs/USAGE.md
and tests/benchmarks/README.md.
Doc corrections caught during review:
* cursor.rownumber is 0-indexed (impl has always been correct; only
the original docstring wording was loose)
* fetch_* methods work on BOTH scrollable=True and default cursors;
the in-memory path supports them too
USAGE.md grew from 345 lines to 633.
Investigation of the Phase 21 baseline finding that executemany(N) cost
scaled linearly per-row (1.74 ms x N) regardless of batch size.
Root cause: every autocommit=True INSERT forces a server-side
transaction-log flush. Not a wire-protocol bug.
Numbers:
* executemany(1000) autocommit=True: 1.72 s (1.72 ms/row)
* executemany(1000) in single txn: 32 ms (32 us/row)
53x speedup from changing the transaction boundary, not the driver.
Pure protocol overhead is ~32 us/row -> ~31K rows/sec sustained
throughput on a single connection. Comparable to pg8000.
Added test_executemany_1000_rows_in_txn benchmark to make this
visible. Updated README headline numbers and added a "Performance
gotchas" section explaining when autocommit=False matters.
Decision: don't pipeline. The remaining 32 us is already excellent;
the autocommit gotcha is the real user-facing footgun. Docs > code.
If someone reports needing >31K rows/sec single-connection, that
becomes Phase 22.
Fills the highest-priority gap from the test-adequacy audit:
connection-failure recovery. 12 new integration tests using a
thread-based TCP proxy (ControlledProxy) that can be kill()'d at
any moment to simulate network drops or server crashes via TCP RST
(SO_LINGER=0).
Coverage:
* Network drop mid-SELECT — OperationalError, not hang
* Network drop after describe, before fetch
* Network drop during fetch (already-materialized rows still
readable; fresh execute fails)
* Local socket forced-close (kernel-level disconnect simulation)
* I/O error marks connection unusable post-failure
* Pool evicts connection that died mid-`with` block (size drops)
* Pool revives after all idle connections died (health check on
acquire mints fresh)
* Async cancellation via asyncio.wait_for — pool stays usable
* Cursor reusable after SQL error
* Connection survives cursor close after error
* Sustained pool load (50 acquire/release cycles, no leak)
* read_timeout fires on a hung connection within bounds
Catches the failure classes that bite production users:
* Hangs (waiting forever on dead socket)
* Silent corruption (EOF treated as valid tuple)
* Double-fault (cleanup raises after primary error)
* Pool poisoning (broken connection returned to pool)
* Stale cursor reuse across error boundaries
Helper:
* tests/_proxy.py — ControlledProxy: thread-based TCP forwarder
with kill() for fault injection. Two-thread pump model. SO_LINGER=0
for RST-on-close (mimics router drop).
Total: 69 unit + 203 integration = 272 tests.
Remaining gaps from the audit (UTF-8 multibyte locale, server-version
matrix, performance benchmarks) are real but lower-severity. Phase 19
addressed the one most likely to bite production deployments.
Opt-in via conn.cursor(scrollable=True). Opens the cursor with
SQ_SCROLL (24) before SQ_OPEN (6), keeps it open server-side, and
sends SQ_SFETCH (23) per scroll call instead of materializing the
result set up-front.
User-facing API is identical to Phase 17's in-memory scroll
(fetch_first/last/prior/absolute/relative, scroll, rownumber).
Only the internal mechanism differs:
| feature | default | scrollable=True
|-------------------|------------------|------------------
| memory | all rows | one row at a time
| round-trips/fetch | 0 (after NFETCH) | 1 per call
| cursor lifetime | closed after exec| open until close()
| best for | sequential iter | random access on
| huge result sets
Wire format (verified against JDBC ScrollProbe capture):
* SQ_SFETCH: [short SQ_ID=4][int 23][short scrolltype]
[int target][int bufSize=4096][short SQ_EOT]
scrolltype: 1=NEXT, 4=LAST, 6=ABSOLUTE
* SQ_SCROLL (24): emitted between CURNAME and SQ_OPEN
* SQ_TUPID (25): response tag with 1-indexed row position;
authoritative source for client-side position tracking
Position tracking uses the server's SQ_TUPID rather than client-
computed indexes. Total row count discovered lazily via SFETCH(LAST)
when negative absolute indexing requires it; cached in
_scroll_total_rows.
Trap on the way: initial SFETCH used SHORT for bufSize → server
hung silently. Same SHORT-vs-INT diagnostic pattern as Phase 4.x's
CURNAME+NFETCH. Captured JDBC trace, byte-diffed against ours,
found the mismatch (bufSize is INT in modern Informix per
isXPSVER8_40 / is2GBFetchBufferSupported).
Tests: 14 integration tests in test_scroll_cursor_server.py
covering lifecycle, sequential fetch, fetch_first/last/prior/
absolute/relative, negative indexing, scroll, empty result sets,
past-end, and random-access on a 100-row result set.
Total: 69 unit + 191 integration = 260 tests.
Version bump (2026.05.02 → 2026.05.04) reflects the library reaching
feature completeness across Phases 1-16.
Documentation:
* README.md — full rewrite. The previous README was from Phase 1
("cursor() / execute() / fetchone() arrive in Phase 2"). New
README covers: sync + async APIs, connection pool, TLS, full type
matrix, smart-LOBs, fast-path RPC, server-compatibility,
development workflow, and pointers to the protocol research docs.
* docs/USAGE.md — new practical recipe guide. Connecting, cursor
lifecycle, parameter binding, transactions (logged + unlogged),
executemany, smart-LOB read/write, connection pool, async,
TLS, error handling, fast-path RPC, server-side setup steps,
and a migration table from IfxPy / legacy informixdb.
* CHANGELOG.md — new file. Captures the v2026.05.04 release as the
Phase 1-16 completion milestone with a full feature inventory
and known-gap list. Future point-releases append here.
Classifiers updated:
* Development Status: 2 → 4 (Pre-Alpha → Beta)
* Added Framework :: AsyncIO
Keywords: added asyncio, async.
No code changes; tests still pass (69 unit + 163 integration = 232).
Ruff clean.
Ships AsyncConnection, AsyncCursor, and AsyncConnectionPool that
expose async/await versions of the sync API for use with FastAPI,
aiohttp, etc.
Strategy: thread-pool wrapping (aiopg pattern), not native async.
Each blocking I/O call is offloaded to a worker thread via
asyncio.to_thread. The event loop never blocks; queries run in
parallel up to the pool's max_size. Cost: ~250 lines, no changes
to the sync codebase. Native async (Phase 17) would require a
~2000-line transport abstraction refactor — deferred until a real
workload needs it.
For typical FastAPI/aiohttp workloads (request → one query → return),
this is functionally equivalent to native async. Each await yields
the loop while a worker thread does the I/O. Only differs for
hundreds-of-concurrent-connections workloads.
API mirrors the sync API one-to-one:
import asyncio
from informix_db import aio
async def main():
pool = await aio.create_pool(host=..., min_size=1, max_size=10)
async with pool.connection() as conn:
cur = await conn.cursor()
await cur.execute("SELECT id FROM users WHERE name = ?", (name,))
row = await cur.fetchone()
await pool.close()
The async pool preserves the sync pool's eviction policy: connection
errors evict, application errors retain.
Tests: 9 integration tests in test_aio.py covering open/close,
async-with, simple/parameterized SELECT, async-for cursor iteration,
pool acquire/release, 20-query concurrent gather (verifies parallelism
through max_size=5 pool), pool async context manager, commit/rollback.
Total: 69 unit + 163 integration = 232 tests.
Pyproject changes:
* Added pytest-asyncio>=1.3.0 as dev dep
* asyncio_mode = "auto" so async tests don't need decorators
Architectural completion: with Phase 16, every backlog item is
done. The Phase 0 ambition — first pure-Python Informix driver,
no native deps — is now genuinely complete.
This commit takes informix-db from documentation-only (Phase 0 spike)
to a functional connect() / close() against a real Informix server.
To our knowledge, this is the first pure-socket Informix client in any
language — no CSDK, no JVM, no native libraries.
Layered architecture per the plan, mirroring PyMySQL's shape:
src/informix_db/
__init__.py — PEP 249 surface (connect, exceptions, paramstyle="numeric")
exceptions.py — full PEP 249 hierarchy declared up front
_socket.py — raw socket I/O (read_exact, write_all, timeouts)
_protocol.py — IfxStreamReader / IfxStreamWriter framing primitives
(big-endian, 16-bit-aligned variable payloads,
length-prefixed nul-terminated strings)
_messages.py — SQ_* tags from IfxMessageTypes + ASF/login markers
_auth.py — pluggable auth handlers; plain-password is the
only Phase-1 implementation
connections.py — Connection class: builds the binary login PDU
(SLheader + PFheader byte-for-byte per
PROTOCOL_NOTES.md §3), sends it, parses the
server response, wires up close()
Phase 1 design decisions locked in DECISION_LOG.md:
- paramstyle = "numeric" (matches Informix ESQL/C convention)
- Python >= 3.10
- autocommit defaults to off (PEP 249 implicit)
- License: MIT
- Distribution name: informix-db (verified PyPI-available)
Test coverage: 34 unit tests (codec round-trips against synthetic byte
streams; observed login-PDU values from the spike captures asserted as
exact byte literals) + 6 integration tests (connect, idempotent close,
context manager, bad-password → OperationalError, bad-host →
OperationalError, cursor() raises NotImplementedError).
pytest — runs 34 unit tests, no Docker needed
pytest -m integration — runs 6 integration tests against the
Developer Edition container (pinned by digest
in tests/docker-compose.yml)
pytest -m "" — runs everything
ruff is clean across src/ and tests/.
One bug found during smoke testing: threading.get_ident() can exceed
signed 32-bit on some processes, overflowing struct.pack("!i"). Fixed
the same way the JDBC reference does — clamp to signed 32-bit, fall
back to 0 if out of range. The field is diagnostic only.
One protocol-level observation that AMENDED the JDBC source reading:
the "capability section" in the login PDU is three independently
negotiated 4-byte ints (Cap_1=1, Cap_2=0x3c000000, Cap_3=0), not one
int + 8 reserved zero bytes as my CFR decompile read suggested. The
server echoes them back identically. Trust the wire over the
decompiler.
Phase 1 verification matrix (from PROTOCOL_NOTES.md §12):
- Login byte layout: confirmed (server accepts our pure-Python PDU)
- Disconnection: confirmed (SQ_EXIT round-trip works)
- Framing primitives: confirmed (34 unit tests)
- Error path: bad password → OperationalError, bad host → OperationalError
Phase 2 (Cursor / SELECT / basic types) is the next phase. The hard
unknowns there — exact column-descriptor layout, statement-time error
format — were called out as bounded gaps in Phase 0 and have existing
captures (02-select-1.socat.log, 02-dml-cycle.socat.log) to characterize
against.