informix-db

Author	SHA1	Message	Date
Ryan Malloy	5825d5c55e	Extend scaling benches: 100-column case + 100k memory profile + 1M gating Adds three things to test_scaling_perf.py: 1. 100-column wide-row SELECT - codec stress test at extreme widths. 1k rows x 100 cols = 19.4 ms (~194 us/row, ~1.94 us/column-decode). Per-column cost continues to drop with width thanks to loop amortization (5 cols: 480 ns/col -> 100 cols: 194 ns/col). 2. 100k-row memory profile - samples RSS pre-execute, post-execute (materialization cost), and during iteration. Real numbers: pre-execute: 45.8 MB post-execute: 71.2 MB (+25.4 MB = ~259 bytes/row materialization) iteration: 0 KB extra (just walks the existing list) Documents the in-memory cursor's actual cost: 100k rows = 25 MB, 1M rows = ~250 MB. Fair regression baseline (tripped at 500 MB). 3. 1M-row scaling gated behind IFX_BENCH_1M=1 env var. Default off because the dev container's rootdbs runs out of space. For production-sized servers users can opt in. The implementation is linear-extrapolation-correct (executemany 100k -> 1M = ~15s, SELECT 100k -> 1M = ~3s). Note on the dev-container size limit: dev image's rootdbs is sized for typical developer workloads, not stress testing. A 1M-row INSERT exceeds the available pages and fails with -242 ISAM -113 (out of space). This is correct behavior - the limit is enforced at the storage layer. Switched RSS sampling from ru_maxrss (peak, monotonic) to /proc/self/status VmRSS (current). Earlier runs showed flat RSS because peak from earlier in the test session masked the fluctuation.	2026-05-05 13:10:32 -06:00
Ryan Malloy	270155d2de	Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9) Extends the IfxPy comparison bench script with scaling workloads (1k/10k/100k rows for both executemany and SELECT). Re-runs the full comparison with consistent measurement methodology and updates the README with the actually-correct numbers. Earlier comparison runs reported informix-db winning all 5 benchmarks. Re-running select_bench_table_all with consistent measurement gives 3.04 ms, not the 891 us I cited earlier - a 3.4x discrepancy attributable to noisy warmup + small-fixture artifacts. The "we win everything" framing was wrong. Corrected comparison reveals two clear stories: Bulk-insert: pure-Python wins 1.6x at scale. executemany(10k): IfxPy 259ms -> us 161ms (1.6x faster) executemany(100k): IfxPy 2376ms -> us 1487ms (1.6x faster) Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's per-call API can't pipeline. Large-fetch: IfxPy wins 2.3-2.4x at scale. SELECT 1k rows: IfxPy 1.2ms / us 2.7ms (IfxPy 2.3x) SELECT 10k rows: IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x) SELECT 100k rows: IfxPy 112ms / us 271ms (IfxPy 2.4x) Reason: C-level fetch_tuple at ~1.1us/row beats Python parse_tuple_payload at ~2.7us/row. Real C-vs-Python codec gap showing up at scale. For everyday workloads (single SELECT in a request, INSERT a handful of rows), drivers are within 5-25%. For workloads where the gap widens, direction depends on what you're doing - bulk- write favors us, bulk-read favors IfxPy. README's "Compared to IfxPy" section rewritten with the corrected numbers and an honest "when to prefer which" subsection. tests/benchmarks/compare/README.md mirror updated. Net narrative: a "faster at bulk-write, slower at bulk-read, comparable elsewhere" comparison story is more honest and more durable than a "we win everything" claim that would have collapsed the first time a user ran their own benchmark. Side note (lint): one ambiguous unicode `×` in cursors.py replaced with `x`. Phase 37 ticket: parse_tuple_payload is the bottleneck at scale. Closing the 1.6 us/row gap to IfxPy would make us competitive on bulk-fetch too. Possible approaches: Cython codec, deeper inlining, per-column dispatch pre-bake.	2026-05-05 12:44:52 -06:00
Ryan Malloy	1282893412	Phase 35: CRITICAL fix - NFETCH loop for large result sets (2026.05.05.7) DATA-LOSS BUG: cursor.fetchall() on result sets larger than ~200 rows was silently truncating to the first ~200 rows. The exact cap depended on row width and the server's per-NFETCH buffer (4096 bytes default). The bug: _execute_select sent NFETCH twice and stopped: self._conn._send_pdu(self._build_curname_nfetch_pdu(cursor_name)) self._read_fetch_response() self._conn._send_pdu(self._build_nfetch_pdu()) # comment: "DONE only" self._read_fetch_response() # then CLOSE+RELEASE — discarding remaining queued rows The "second fetch returns DONE only" comment was wrong. For any result set larger than the server's per-NFETCH batch, the second fetch returns more tuples AND there are still tuples queued server-side. The cursor closed and dropped them. Latent for 30 phases because every existing test used either a small result set (FIRST 10) or relied on row counts that fit naturally in 1-2 batches. Discovered by Phase 34's scaling benchmark when SELECT FIRST 100000 from a 100k-row table returned 200 rows. The fix: loop NFETCH until a response yields zero new tuples. self._conn._send_pdu(self._build_curname_nfetch_pdu(cursor_name)) rows_before = len(self._rows) self._read_fetch_response() rows_received = len(self._rows) - rows_before while rows_received > 0: self._conn._send_pdu(self._build_nfetch_pdu()) rows_before = len(self._rows) self._read_fetch_response() rows_received = len(self._rows) - rows_before 249 integration tests pass. The scaling benchmark suite (Phase 34, shipping next) is the regression test going forward. Workaround for users on older versions: use scrollable cursors (cursor(scrollable=True)) which use the SQ_SFETCH protocol path and don't have this bug. If you've been using this driver for queries returning large result sets, your queries may have been truncating silently. Re-run them against 2026.05.05.7+ to verify your data.	2026-05-05 12:37:22 -06:00
Ryan Malloy	a9e1f17bae	Phase 31: Head-to-head benchmark vs IfxPy (the C-bound PyPI driver) Adds a paired benchmark of informix-db (pure Python) against IfxPy 3.0.5 (IBM's C-bound driver via OneDB ODBC) on identical workloads against the same Informix dev container. Headline result: pure Python is competitive — and faster on 2/5 benchmarks where wire round-trip dominates over codec/marshaling. \| Benchmark \| IfxPy \| informix-db \| Result \| \|---\|---:\|---:\|---:\| \| select_one_row (single-row latency) \| 128 us \| 116 us \| us 9% faster \| \| select_systables_first_10 \| 126 us \| 184 us \| IfxPy 32% faster \| \| select_bench_table_all (1k rows) \| 969 us \| 855 us \| us 12% faster \| \| executemany(1000) in txn \| 21.5 ms \| 30.8 ms \| IfxPy 30% slower \| \| cold_connect_disconnect \| 11.0 ms \| 10.9 ms \| comparable \| Why the surprising wins: IfxPy's path is Python -> OneDB ODBC -> libifdmr -> wire. Ours is Python -> wire. When wire round-trip dominates (single-row, bulk fetch), the missing abstraction layer makes us faster. When per-row marshaling dominates (executemany), IfxPy's C-level execute(stmt, tuple) beats Python BIND-PDU build. Files added under tests/benchmarks/compare/: * Dockerfile.ifxpy — Ubuntu 20.04 base with IfxPy + OneDB drivers * ifxpy_bench.py — IfxPy benchmark workloads matching test__perf.py README.md — methodology, results, install gauntlet, reproduction The IfxPy install gauntlet itself is part of the comparison story: modern Python 3.11 (not 3.13), setuptools <58, permissive CFLAGS, manual download of 92MB OneDB ODBC tarball, four LD_LIBRARY_PATH directories, libcrypt.so.1 (deprecated 2018, missing on Arch / Fedora 35+ / RHEL 9). Versus our `pip install informix-db`. README.md (project root): added "Compared to IfxPy" section under Performance with the headline numbers and a pointer to the full methodology. .gitignore: keep Dockerfile/script/README under tests/benchmarks/ compare/, exclude the 92MB OneDB tarball and the local venv.	2026-05-05 11:41:47 -06:00
Ryan Malloy	0b13acb13d	Phase 30: Final hardening pass (2026.05.05.4) Closes the last 3 medium-severity items from Hamilton's system-wide audit. 0 critical, 0 high, 0 medium remaining. What changed: pool.py: * Pool acquire() growth path: restructured to remove _lock._is_owned() (CPython-private API) usage. Two explicit re-acquires (success path + exception path) replace the older try/finally + private check. connections.py: * _raise_from_rejection now extracts the server's human-readable error string from the rejection payload and surfaces it in the OperationalError. Wrong-password vs wrong-database now produce distinguishable errors. New helper _extract_server_error_text finds the longest printable-ASCII run (8-256 chars). Falls back to a hex preview when no string is found. * _send_exit: broadened catch from (OperationalError, InterfaceError, OSError, ProtocolError) to bare Exception. Best-effort by definition; the socket FD is freed by close()'s finally clause via _socket.IfxSocket.close (idempotent, never-raising). Prevents unexpected errors from escaping close() and leaving partial state. 5 new unit tests in test_protocol.py for _extract_server_error_text: finds-longest-run, picks-longest-of-multiple, too-short-returns-None, empty-handled, caps-at-256. 77 unit + 231 integration + 28 benchmark = 336 tests; ruff clean. Hamilton audit punch list final state: every actionable finding addressed. No CRITICAL, no HIGH, no MEDIUM remaining. Pre-Phase-26: 2 critical, 3 high, 5 medium Post-Phase-30: 0 critical, 0 high, 0 medium - PRODUCTION READY	2026-05-05 10:52:39 -06:00
Ryan Malloy	8e8b81fe8d	Phase 29: Deferred-cleanup queue (2026.05.05.3) Closes the unbounded-leak gap on long-lived pooled connections that Phase 28's cursor finalizer left as future work. When the finalizer can't acquire the wire lock (cross-thread GC during another thread's op), instead of leaking + logging, it enqueues the cleanup PDUs to a per-connection deferred queue. The next normal operation drains the queue under the wire lock, completing the cleanup atomically before the new op. What changed: connections.py: * Connection._pending_cleanup: list[bytes] + Connection._cleanup_lock (separate from _wire_lock - tiny critical section for list mutation only, allows enqueue without waiting for an in-flight wire op) * _enqueue_cleanup(pdus): thread-safe append, callable from any thread (including finalizers without lock ownership) * _drain_pending_cleanup(): pop-the-list + send-each-PDU. Caller must hold _wire_lock. Force-closes on wire desync (same doctrine as _raise_sq_err) * _send_pdu opportunistically drains the queue before sending. Cost is one length-check when queue is empty (the common case) cursors.py: * _finalize_cursor enqueues [_CLOSE_PDU, _RELEASE_PDU] instead of leaking when the lock is busy. WARNING demoted to DEBUG since leak no longer accumulates. Lock-order discipline: _cleanup_lock is held only for list extend/pop; _wire_lock is held for the actual wire I/O. Never grab _cleanup_lock while holding _wire_lock - the drain pops-and-clears under _cleanup_lock, then iterates under _wire_lock (which caller holds). Two new regression tests: * test_enqueue_cleanup_drains_on_next_send_pdu - verifies queue mechanism end-to-end * test_pending_cleanup_thread_safe_enqueue - 8x50 concurrent enqueues, no race-loss 72 unit + 231 integration + 28 benchmark = 331 tests; ruff clean. Hamilton audit punch list status: 0 critical, 0 high, 3 medium remaining (login errors, _send_exit cleanup, pool acquire re-entrance) - all Phase 30 scope.	2026-05-05 10:47:49 -06:00
Ryan Malloy	fdb9ba32d5	Phase 28: Resource leak hardening (2026.05.05.2) Closes Hamilton audit High #4 (bare-except in error drain) and High #5 (no cursor finalizers), plus 1 medium one-liner. After Phases 26-28, 0 CRITICAL and 0 HIGH audit findings remain. Driver is PRODUCTION READY. What changed: cursors.py: * Cursor finalizers via weakref.finalize. Mid-fetch raises (or any GC without explicit close()) now release server-side resources (CLOSE + RELEASE PDUs). Pre-built static PDU bytes at module load so finalizer can run on any thread without allocating or calling cursor methods. * Non-blocking lock acquire prevents cross-thread GC deadlock. WARNING log on lock-busy so leak accumulation is visible. * state=[False] list pattern keeps finalizer closure weak. GIL dependency of atomic single-element mutation documented. * _raise_sq_err near-token parse: (ProtocolError, OSError) only. * _raise_sq_err drain: force-close connection on same exceptions (wire unrecoverable after desync). connections.py: * _raise_sq_err drain: same hardening as cursor version. Force-close on (ProtocolError, OSError, OperationalError) - the latter from _drain_to_eot raising on unknown tags. Documented inline. * Added contextlib import for force-close suppression. cursors.py write_blob_column: * BLOB_PLACEHOLDER validation now requires EXACTLY ONE occurrence. Pre-Phase-28, str.replace silently substituted every occurrence - corrupting SQL containing the literal string in comments etc. Now raises ProgrammingError with workaround pointer. _resultset.py: * Investigated end-of-loop bounds check for parse_tuple_payload. Reverted: long-standing off-by-one in UDTVAR(lvarchar) trailing- pad logic produces benign over-reads (payload is a fully-extracted bytes object; over-reads return empty slices through unused branches). Real silent-corruption surfaces are length-prefix decoders, needing branch-local checks. Documented as deliberate non-fix. Margaret Hamilton review surfaced two blocking conditions: * Asymmetric failure handling: _raise_sq_err force-closed the connection on wire desync, but the cursor finalizer silently swallowed identical failures. "Same wire, same failure mode, same response" - finalizer now matches _raise_sq_err's discipline. * Leak visibility: wire-lock-busy log was DEBUG. Promoted to WARNING so leak accumulation on pooled connections is visible. Plus three documentation improvements (GIL dependency, OperationalError in desync taxonomy, parse_tuple non-fix rationale). One new regression test: * test_write_blob_column_rejects_multiple_placeholders 72 unit + 229 integration + 28 benchmark = 329 tests; ruff clean. Phase 29 ticket (Hamilton recommended): deferred-cleanup queue drained at next _send_pdu, closes unbounded-leak gap on long-lived pooled connections. Not blocking Phase 28. Hamilton audit verdict: Pre-26: 2 critical, 3 high, 5 medium Post-28: 0 critical, 0 high, 4 medium	2026-05-05 03:56:24 -06:00
Ryan Malloy	6afdbcabb3	Phase 27: Wire lock + async cancellation eviction (2026.05.05.1) Closes Hamilton audit Critical #2 (concurrency / wire lock) and High #3 (async cancellation evicts cleanly). Phase 26 fixed what gets returned to the pool; Phase 27 fixes what can interleave on the wire while it's running. What changed: connections.py: * Added Connection._wire_lock = threading.RLock(). Wrapped commit(), rollback(), fast_path_call() under the lock. * _ensure_transaction documents the lock as a precondition AND asserts ownership at runtime (_wire_lock._is_owned()) so a future caller adding a third call site fails loudly. * close() tries to acquire wire lock with 0.5s timeout before SQ_EXIT; skips polite exit and force-closes if busy. cursors.py: * execute() body extracted into _execute_under_wire_lock() and called under the lock. * executemany() body wrapped inline. * _sfetch_at() wrapped - covers all scrollable fetch_* methods that delegate to it. * close() locks the CLOSE+RELEASE for scrollable cursors. pool.py: * release() acquires conn._wire_lock with 5s timeout before rollback. On timeout: log WARNING, evict connection. Constant _RELEASE_WIRE_LOCK_TIMEOUT for tunability. aio.py: * AsyncConnectionPool.connection() now catches CancelledError / TimeoutError separately and routes to broken=True. Combined with the wire lock, asyncio.wait_for around aio DB calls is now safe. * Updated docstring; mirrored in docs/USAGE.md. Margaret Hamilton review surfaced three actionable conditions, all addressed before tagging: * Cancellation test used contextlib.suppress - could pass without exercising the cancellation path on a fast runner. Switched to pytest.raises so the test fails if timeout doesn't fire. * _ensure_transaction precondition documented but unchecked at runtime. Added assert self._wire_lock._is_owned() guard. * Connection.close() was unsynchronized. Now tries 0.5s acquire before SQ_EXIT. Two new regression tests in tests/test_pool.py: * test_concurrent_threads_on_one_connection_dont_interleave_pdus (without lock: garbled results / hangs) * test_async_wait_for_cancellation_evicts_connection (asserts pool size shrinks; cancellation actually fires) 72 unit + 228 integration + 28 benchmark = 328 tests; ruff clean. Hamilton verdict: PRODUCTION READY WITH CAVEATS (was) -> CAVEATS NARROWED FURTHER (now). 0 critical, 2 high remaining (cursor finalizers + bare-except in error drain) - both Phase 28 scope.	2026-05-05 03:40:39 -06:00
Ryan Malloy	5c4a7a57f1	Phase 26: Pool rollback-on-release - CRITICAL data-correctness fix (2026.05.05) Fixes the dirty-pool-checkout bug surfaced by Margaret Hamilton's system-wide audit (Critical #1). The bug: ConnectionPool.release() returned connections with open server-side transactions still active. Request A's uncommitted INSERTs would be inherited by Request B reusing the same connection - B's commit would land A's writes permanently; B's rollback would silently lose them. Same shape as psycopg2's pre-2.5 dirty-pool bug. The fix: pool.release() now rolls back any open transaction before returning the connection to the idle list. The rollback runs OUTSIDE the pool lock since it's a wire round-trip - the connection is already off the idle list and counted in _total, so no other thread can grab it during the rollback window. If the rollback itself fails (dead socket, etc.), the connection is evicted rather than recycled. Async path covered automatically: AsyncConnectionPool.release() delegates to the sync pool's release via _to_thread. Margaret Hamilton review pass surfaced two findings, both addressed: * Silent rollback failure: added a WARNING log via logging.getLogger ("informix_db.pool") so evictions are debuggable. First logger in the project. * Async cancellation race: the fix doesn't introduce the asyncio.wait_for race (Critical #2, deferred to Phase 27), but it adds a code path that can trigger it. Documented loudly in pool.release() docstring, aio.py module docstring, and USAGE.md async section. Recommendation: use read_timeout on the connection instead of asyncio.wait_for until Phase 27 lands. Two new regression tests in tests/test_pool.py: * test_uncommitted_writes_invisible_to_next_acquirer (the bug) * test_committed_writes_survive_pool_checkout (no over-correction) Verified the regression test catches the bug: stashed the fix, ran the test - it fails with "B sees 1 rows - leaked across pool checkout boundary" - confirming it tests the real failure mode. Total tests: 72 unit + 226 integration + 28 benchmark = 326. Deferred to Phase 27 per Hamilton audit: * Critical #2 (concurrency / per-connection wire lock) * High #3 (async cancellation routes to broken=True) * High #4 (bare except in _raise_sq_err drain) * High #5 (no cursor finalizers - server-side resource leaks)	2026-05-05 03:22:18 -06:00
Ryan Malloy	e9aed6ce59	Phase 25: Branch reorder + invariant tripwires (2026.05.04.10) Third-pass optimization on parse_tuple_payload's hot loop. Previous phases removed redundant work; this one removes correct-but-wasteful work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT - the most common columns in real queries) sat at the bottom, paying ~7 frozenset misses per column. Changes (src/informix_db/_resultset.py): * Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module load. * New fast-path branch at the TOP of parse_tuple_payload's loop body that handles every _FIXED_WIDTH_TYPES column inline: one frozenset check, one dict lookup, one decode, continue. Skips every other branch. * Cleaned up the bottom fall-through; it now genuinely only catches unknown types. Performance vs Phase 24 baseline: * parse_tuple_5cols_iso8859: 1659 ns -> 1400 ns (-16%) * parse_tuple_5cols_utf8: 1649 ns -> 1341 ns (-19%) Cumulative vs Phase 21 baseline (before any optimization): * parse_tuple_5cols: 2796 ns -> 1400 ns (-50%) - HALF the time * decode_int: 230 ns -> 139 ns (-40%) Margaret Hamilton review surfaced one HIGH finding addressed before tagging: * H: The fast-path optimization assumes every FIXED_WIDTHS key is decodable WITHOUT qualifier inspection (encoded_length etc.). True today, but a future contributor adding a fixed-width type that needs qualifier bits (like DATETIME does) would silently get wrong decode behavior - Lauren-Bug class failure. Fix: added INVARIANT comment to FIXED_WIDTHS in converters.py AND added tests/test_resultset_invariants.py with three CI tripwire tests: - _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch - Every FIXED_WIDTHS key has a DECODERS entry - DECODERS keys stay < 0x100 (Phase 24 collision-free guarantee) The tests carry instructions: if one fires, don't update the test to match - either restore the property or refactor the optimization. Comments rot when nobody reads them; tests fail loudly. baseline.json refreshed; 72 unit + 224 integration + 28 bench = 324 tests; ruff clean.	2026-05-04 23:34:05 -06:00
Ryan Malloy	dfa60ea501	Phase 24: Decoder dispatch split + struct precompilation (2026.05.04.9) Second pass of hot-path optimization on parse_tuple_payload. Two changes to converters.py: 1. Split decode() into public + internal. Added _decode_base(base_tc, raw, encoding) that takes an already-base-typed code and skips the redundant base_type() call. Public decode() is now a one-line wrapper. parse_tuple_payload's 4 call sites swapped to use _decode_base directly. _fastpath.py's external decode() caller is unaffected. 2. Pre-compiled struct.Struct unpackers. The fixed-width integer/float decoders (_decode_smallint, _decode_int, _decode_bigint, _decode_smfloat, _decode_float, _decode_date) switched from per-call struct.unpack(fmt, raw) to module-level bound methods like _UNPACK_INT = struct.Struct("!i").unpack. Format-string parsed once at module load. Measured 37% faster than per-call struct.unpack on CPython 3.13 micro. Performance vs Phase 23 baseline: * decode_int: 173 ns -> 139 ns (-20%) * decode_bigint: 188 ns -> 150 ns (-20%) * parse_tuple_5cols: 2047 ns -> 1592 ns (-22%) * 1k-row SELECT: 1255 us -> 989 us (-21%) Cumulative vs original Phase 21 baseline: * decode_int: 230 ns -> 139 ns (-40%) * parse_tuple_5cols: 2796 ns -> 1592 ns (-43%) * 1k-row SELECT: 1477 us -> 989 us (-33%) Real-world fetch ceiling: 358K rows/sec -> ~620K rows/sec. Margaret Hamilton review surfaced one HIGH-severity finding addressed before tagging: * H: The no-collision guarantee that makes _decode_base safe is structural but undocumented (all DECODERS keys are ≤ 0xFF, all flag bits are ≥ 0x100, so flagged inputs cannot coincidentally match). Added load-bearing INVARIANT comment at DECODERS dict explaining the constraint and what to do if violated. Cross-referenced from _decode_base's docstring for bidirectional traceability. baseline.json refreshed; all 224 integration tests pass; ruff clean.	2026-05-04 19:31:21 -06:00
Ryan Malloy	0e0dfcba26	Phase 22: User-facing documentation refresh (2026.05.04.7) The docs/USAGE.md predated Phases 17-21, so anyone landing on PyPI was missing scrollable cursors, locale/Unicode, the autocommit cliff finding, and the type-mapping reference. Added sections to docs/USAGE.md: * Locale and Unicode - client_locale, Connection.encoding, CLIENT_LOCALE vs DB_LOCALE, when characters can't fit the codec * Type mapping reference - full SQL <-> Python type table, NULL sentinels subsection, IntervalYM * Performance tips - 53x autocommit-cliff fix, 100x executemany win, 72x pool win, with the actual benchmark numbers from Phase 21.1 * Scrollable cursors - fetch_* API, in-memory vs server-side trade-off, edge cases (past-end semantics, negative indexing, rownumber) * Timeouts and keepalive subsection - production starting points * Environment dictionary subsection - env={} parameter * Known limitations - explicit table of what doesn't work (named params, complex UDT bind, GSSAPI, XA) with workarounds; "things that might surprise you" notes README.md - added Documentation section linking to docs/USAGE.md and tests/benchmarks/README.md. Doc corrections caught during review: * cursor.rownumber is 0-indexed (impl has always been correct; only the original docstring wording was loose) * fetch_* methods work on BOTH scrollable=True and default cursors; the in-memory path supports them too USAGE.md grew from 345 lines to 633.	2026-05-04 17:33:37 -06:00
Ryan Malloy	495128c679	Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6) Investigation of the Phase 21 baseline finding that executemany(N) cost scaled linearly per-row (1.74 ms x N) regardless of batch size. Root cause: every autocommit=True INSERT forces a server-side transaction-log flush. Not a wire-protocol bug. Numbers: * executemany(1000) autocommit=True: 1.72 s (1.72 ms/row) * executemany(1000) in single txn: 32 ms (32 us/row) 53x speedup from changing the transaction boundary, not the driver. Pure protocol overhead is ~32 us/row -> ~31K rows/sec sustained throughput on a single connection. Comparable to pg8000. Added test_executemany_1000_rows_in_txn benchmark to make this visible. Updated README headline numbers and added a "Performance gotchas" section explaining when autocommit=False matters. Decision: don't pipeline. The remaining 32 us is already excellent; the autocommit gotcha is the real user-facing footgun. Docs > code. If someone reports needing >31K rows/sec single-connection, that becomes Phase 22.	2026-05-04 17:26:16 -06:00
Ryan Malloy	90ce035a00	Phase 21: Performance benchmarks (2026.05.04.5) Adds tests/benchmarks/ with pytest-benchmark coverage of the hot codec paths and end-to-end SELECT/INSERT/pool/async round-trips. Establishes a committed baseline.json so PRs can be regression-checked at review via --benchmark-compare. * test_codec_perf.py (16): decode/encode_param/parse_tuple_payload micro-benchmarks - run without container, suitable for pre-merge CI. * test_select_perf.py (4): SELECT round-trips - 1-row latency floor, 10-row, 1k-row full fetch, parameterized. * test_insert_perf.py (3): single-row INSERT, executemany 100 / 1000. * test_pool_perf.py (3): cold connect, pool acquire/release, pool acquire + query + release. * test_async_perf.py (2): async round-trip overhead, 10x concurrent. * baseline.json: committed snapshot, 28 measurements. * benchmark pytest marker, gated off by default. * Makefile: bench / bench-codec / bench-save targets; test-integration excludes benchmarks for speed. Headline numbers (dev container loopback): * decode(int): 181 ns * parse_tuple 5 cols: 2.87 µs/row * SELECT 1 round-trip: 177 µs * Pool acquire+query+release: 295 µs * Cold connect: 11.2 ms (72x slower than pool) UTF-8 decode carries no measurable cost vs iso-8859-1 - confirms Phase 20 didn't regress anything. Total: 69 unit + 211 integration + 28 benchmark = 308 tests.	2026-05-04 17:21:12 -06:00
Ryan Malloy	bea1a1cd0c	Phase 20: UTF-8/multibyte locale support (2026.05.04.4) Thread CLIENT_LOCALE through to user-data string codecs. Driver previously hardcoded iso-8859-1 for all string conversions, which broke any locale outside Western European code points. * Connection.encoding property derived from client_locale via _python_encoding_from_locale (en_US.utf8 -> utf-8, en_US.8859-1 -> iso-8859-1, etc.) * encode_param / decode / parse_tuple_payload accept an encoding parameter; cursor and fast-path call sites forward conn.encoding * Smart-LOB CLOB encode/decode and TEXT decode honor connection encoding * DataError raised for non-representable chars; cursor releases the prepared statement before propagating so connection state stays clean Boundary discipline: protocol-level strings (cursor names, function signatures, SQ_FILE fnames, error near-tokens, SQL text) stay iso-8859-1 (always ASCII, never user-controlled). 9 new integration tests in tests/test_unicode.py covering ASCII round-trip, Latin-1 high-bit, full byte range, locale-mapping, encoding property, UTF-8 negotiation, multibyte (skipped without IFX_UTF8_DATABASE), DataError on non-representable, CLOB round-trip. Total: 69 unit + 212 integration = 281 tests.	2026-05-04 17:13:19 -06:00
Ryan Malloy	9703279bc8	Phase 19: resilience tests via fault injection (v2026.05.04.3) Fills the highest-priority gap from the test-adequacy audit: connection-failure recovery. 12 new integration tests using a thread-based TCP proxy (ControlledProxy) that can be kill()'d at any moment to simulate network drops or server crashes via TCP RST (SO_LINGER=0). Coverage: * Network drop mid-SELECT — OperationalError, not hang * Network drop after describe, before fetch * Network drop during fetch (already-materialized rows still readable; fresh execute fails) * Local socket forced-close (kernel-level disconnect simulation) * I/O error marks connection unusable post-failure * Pool evicts connection that died mid-`with` block (size drops) * Pool revives after all idle connections died (health check on acquire mints fresh) * Async cancellation via asyncio.wait_for — pool stays usable * Cursor reusable after SQL error * Connection survives cursor close after error * Sustained pool load (50 acquire/release cycles, no leak) * read_timeout fires on a hung connection within bounds Catches the failure classes that bite production users: * Hangs (waiting forever on dead socket) * Silent corruption (EOF treated as valid tuple) * Double-fault (cleanup raises after primary error) * Pool poisoning (broken connection returned to pool) * Stale cursor reuse across error boundaries Helper: * tests/_proxy.py — ControlledProxy: thread-based TCP forwarder with kill() for fault injection. Two-thread pump model. SO_LINGER=0 for RST-on-close (mimics router drop). Total: 69 unit + 203 integration = 272 tests. Remaining gaps from the audit (UTF-8 multibyte locale, server-version matrix, performance benchmarks) are real but lower-severity. Phase 19 addressed the one most likely to bite production deployments.	2026-05-04 16:57:06 -06:00
Ryan Malloy	a42dc5c5de	Phase 18: server-side scrollable cursors via SQ_SFETCH (v2026.05.04.2) Opt-in via conn.cursor(scrollable=True). Opens the cursor with SQ_SCROLL (24) before SQ_OPEN (6), keeps it open server-side, and sends SQ_SFETCH (23) per scroll call instead of materializing the result set up-front. User-facing API is identical to Phase 17's in-memory scroll (fetch_first/last/prior/absolute/relative, scroll, rownumber). Only the internal mechanism differs: \| feature \| default \| scrollable=True \|-------------------\|------------------\|------------------ \| memory \| all rows \| one row at a time \| round-trips/fetch \| 0 (after NFETCH) \| 1 per call \| cursor lifetime \| closed after exec\| open until close() \| best for \| sequential iter \| random access on \| huge result sets Wire format (verified against JDBC ScrollProbe capture): * SQ_SFETCH: [short SQ_ID=4][int 23][short scrolltype] [int target][int bufSize=4096][short SQ_EOT] scrolltype: 1=NEXT, 4=LAST, 6=ABSOLUTE * SQ_SCROLL (24): emitted between CURNAME and SQ_OPEN * SQ_TUPID (25): response tag with 1-indexed row position; authoritative source for client-side position tracking Position tracking uses the server's SQ_TUPID rather than client- computed indexes. Total row count discovered lazily via SFETCH(LAST) when negative absolute indexing requires it; cached in _scroll_total_rows. Trap on the way: initial SFETCH used SHORT for bufSize → server hung silently. Same SHORT-vs-INT diagnostic pattern as Phase 4.x's CURNAME+NFETCH. Captured JDBC trace, byte-diffed against ours, found the mismatch (bufSize is INT in modern Informix per isXPSVER8_40 / is2GBFetchBufferSupported). Tests: 14 integration tests in test_scroll_cursor_server.py covering lifecycle, sequential fetch, fetch_first/last/prior/ absolute/relative, negative indexing, scroll, empty result sets, past-end, and random-access on a 100-row result set. Total: 69 unit + 191 integration = 260 tests.	2026-05-04 16:41:25 -06:00
Ryan Malloy	0c856372a6	v2026.05.04: bump CalVer + polish docs Version bump (2026.05.02 → 2026.05.04) reflects the library reaching feature completeness across Phases 1-16. Documentation: * README.md — full rewrite. The previous README was from Phase 1 ("cursor() / execute() / fetchone() arrive in Phase 2"). New README covers: sync + async APIs, connection pool, TLS, full type matrix, smart-LOBs, fast-path RPC, server-compatibility, development workflow, and pointers to the protocol research docs. * docs/USAGE.md — new practical recipe guide. Connecting, cursor lifecycle, parameter binding, transactions (logged + unlogged), executemany, smart-LOB read/write, connection pool, async, TLS, error handling, fast-path RPC, server-side setup steps, and a migration table from IfxPy / legacy informixdb. * CHANGELOG.md — new file. Captures the v2026.05.04 release as the Phase 1-16 completion milestone with a full feature inventory and known-gap list. Future point-releases append here. Classifiers updated: * Development Status: 2 → 4 (Pre-Alpha → Beta) * Added Framework :: AsyncIO Keywords: added asyncio, async. No code changes; tests still pass (69 unit + 163 integration = 232). Ruff clean.	2026-05-04 15:38:09 -06:00
Ryan Malloy	300e1bf7b4	Phase 16: async API (informix_db.aio) Ships AsyncConnection, AsyncCursor, and AsyncConnectionPool that expose async/await versions of the sync API for use with FastAPI, aiohttp, etc. Strategy: thread-pool wrapping (aiopg pattern), not native async. Each blocking I/O call is offloaded to a worker thread via asyncio.to_thread. The event loop never blocks; queries run in parallel up to the pool's max_size. Cost: ~250 lines, no changes to the sync codebase. Native async (Phase 17) would require a ~2000-line transport abstraction refactor — deferred until a real workload needs it. For typical FastAPI/aiohttp workloads (request → one query → return), this is functionally equivalent to native async. Each await yields the loop while a worker thread does the I/O. Only differs for hundreds-of-concurrent-connections workloads. API mirrors the sync API one-to-one: import asyncio from informix_db import aio async def main(): pool = await aio.create_pool(host=..., min_size=1, max_size=10) async with pool.connection() as conn: cur = await conn.cursor() await cur.execute("SELECT id FROM users WHERE name = ?", (name,)) row = await cur.fetchone() await pool.close() The async pool preserves the sync pool's eviction policy: connection errors evict, application errors retain. Tests: 9 integration tests in test_aio.py covering open/close, async-with, simple/parameterized SELECT, async-for cursor iteration, pool acquire/release, 20-query concurrent gather (verifies parallelism through max_size=5 pool), pool async context manager, commit/rollback. Total: 69 unit + 163 integration = 232 tests. Pyproject changes: * Added pytest-asyncio>=1.3.0 as dev dep * asyncio_mode = "auto" so async tests don't need decorators Architectural completion: with Phase 16, every backlog item is done. The Phase 0 ambition — first pure-Python Informix driver, no native deps — is now genuinely complete.	2026-05-04 14:58:19 -06:00
Ryan Malloy	9b1fd8af2c	Phase 1: pure-Python SQLI login works end-to-end This commit takes informix-db from documentation-only (Phase 0 spike) to a functional connect() / close() against a real Informix server. To our knowledge, this is the first pure-socket Informix client in any language — no CSDK, no JVM, no native libraries. Layered architecture per the plan, mirroring PyMySQL's shape: src/informix_db/ __init__.py — PEP 249 surface (connect, exceptions, paramstyle="numeric") exceptions.py — full PEP 249 hierarchy declared up front _socket.py — raw socket I/O (read_exact, write_all, timeouts) _protocol.py — IfxStreamReader / IfxStreamWriter framing primitives (big-endian, 16-bit-aligned variable payloads, length-prefixed nul-terminated strings) _messages.py — SQ_* tags from IfxMessageTypes + ASF/login markers _auth.py — pluggable auth handlers; plain-password is the only Phase-1 implementation connections.py — Connection class: builds the binary login PDU (SLheader + PFheader byte-for-byte per PROTOCOL_NOTES.md §3), sends it, parses the server response, wires up close() Phase 1 design decisions locked in DECISION_LOG.md: - paramstyle = "numeric" (matches Informix ESQL/C convention) - Python >= 3.10 - autocommit defaults to off (PEP 249 implicit) - License: MIT - Distribution name: informix-db (verified PyPI-available) Test coverage: 34 unit tests (codec round-trips against synthetic byte streams; observed login-PDU values from the spike captures asserted as exact byte literals) + 6 integration tests (connect, idempotent close, context manager, bad-password → OperationalError, bad-host → OperationalError, cursor() raises NotImplementedError). pytest — runs 34 unit tests, no Docker needed pytest -m integration — runs 6 integration tests against the Developer Edition container (pinned by digest in tests/docker-compose.yml) pytest -m "" — runs everything ruff is clean across src/ and tests/. One bug found during smoke testing: threading.get_ident() can exceed signed 32-bit on some processes, overflowing struct.pack("!i"). Fixed the same way the JDBC reference does — clamp to signed 32-bit, fall back to 0 if out of range. The field is diagnostic only. One protocol-level observation that AMENDED the JDBC source reading: the "capability section" in the login PDU is three independently negotiated 4-byte ints (Cap_1=1, Cap_2=0x3c000000, Cap_3=0), not one int + 8 reserved zero bytes as my CFR decompile read suggested. The server echoes them back identically. Trust the wire over the decompiler. Phase 1 verification matrix (from PROTOCOL_NOTES.md §12): - Login byte layout: confirmed (server accepts our pure-Python PDU) - Disconnection: confirmed (SQ_EXIT round-trip works) - Framing primitives: confirmed (34 unit tests) - Error path: bad password → OperationalError, bad host → OperationalError Phase 2 (Cursor / SELECT / basic types) is the next phase. The hard unknowns there — exact column-descriptor layout, statement-time error format — were called out as bounded gaps in Phase 0 and have existing captures (02-select-1.socat.log, 02-dml-cycle.socat.log) to characterize against.	2026-05-02 19:10:24 -06:00

20 Commits