# Changelog

All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.

## 2026.05.05.1 — Wire lock + async cancellation eviction (Phase 27)

Closes Hamilton audit findings **Critical #2** (concurrency / wire lock) and **High #3** (async cancellation evicts cleanly). Phase 26 fixed *what gets returned* to the pool; Phase 27 fixes *what can interleave* on the wire while it's running.

### What changed

**1. Per-connection wire lock** (`src/informix_db/connections.py`):

- Added `Connection._wire_lock = threading.RLock()`. Wrapped `commit()`, `rollback()`, and `fast_path_call()` in `with self._wire_lock:`.
- `_ensure_transaction()` documents the lock as a precondition and **asserts ownership** (`self._wire_lock._is_owned()`) — a future caller adding a third call site fails loudly in tests rather than corrupting wire state in production.
- `Connection.close()` now tries to acquire the wire lock with a 0.5s timeout before sending `SQ_EXIT`. If another thread is mid-operation, skip the polite exit and force-close the socket; the in-flight thread observes EOF on its next read.
- RLock (not Lock) because `pool.release()` holds the lock under a timeout-acquire and then calls `conn.rollback()`, which acquires it again on the same thread — a plain `Lock` would deadlock here.
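In sketch form — hypothetical and stripped of real wire I/O (`log` stands in for the socket); only the lock discipline and the `_is_owned()` assertion mirror this release:

```python
import threading


class Connection:
    """Minimal sketch of the Phase 27 wire-lock discipline."""

    def __init__(self):
        # RLock, not Lock: pool.release() holds the lock, then calls
        # rollback(), which re-acquires it on the same thread.
        self._wire_lock = threading.RLock()
        self.log = []  # stand-in for actual wire round-trips

    def commit(self):
        with self._wire_lock:
            self._ensure_transaction()
            self.log.append("SQ_COMMIT")

    def rollback(self):
        with self._wire_lock:
            self._ensure_transaction()
            self.log.append("SQ_ROLLBACK")

    def _ensure_transaction(self):
        # Precondition: caller holds the wire lock. Assert it at runtime so
        # a future third call site fails loudly instead of corrupting state.
        assert self._wire_lock._is_owned()


conn = Connection()
conn.commit()
conn.rollback()
print(conn.log)  # ['SQ_COMMIT', 'SQ_ROLLBACK']
```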

**2. Cursor wire methods locked** (`src/informix_db/cursors.py`):

- `Cursor.execute()` body extracted into `_execute_under_wire_lock()` and called under the lock.
- `Cursor.executemany()` body wrapped inline.
- `Cursor._sfetch_at()` (the SQ_SFETCH primitive used by every scrollable fetch_* method) wrapped — every scrollable cursor op gets the lock for free.
- `Cursor.close()` acquires the lock for the CLOSE+RELEASE on scrollable cursors.
- `read_blob_column` and `write_blob_column` inherit through their internal `self.execute()` calls.

**3. Pool release with timeout-acquire** (`src/informix_db/pool.py`):

- `release()` now acquires `conn._wire_lock` with a `_RELEASE_WIRE_LOCK_TIMEOUT = 5.0` budget before rolling back. If a still-running worker thread holds the lock past 5s, the connection is evicted instead of recycled. Logged at WARNING level via the Phase 26 logger.

**4. Async cancellation → eviction** (`src/informix_db/aio.py`):

- `AsyncConnectionPool.connection()` now catches `(asyncio.CancelledError, asyncio.TimeoutError)` separately and routes them to `broken=True`. Combined with the wire lock, this means `asyncio.wait_for` around `aio` DB calls is now safe — the connection is either successfully released (worker finished in time) or evicted (worker exceeded the timeout); never returned to the pool in a poisoned state.
- Removed the Phase 26 cancellation warning from the docstring; now describes the new safety guarantee explicitly.
- Mirrored in `docs/USAGE.md` async section.
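The routing can be sketched with a toy async pool (all names hypothetical; the real `AsyncConnectionPool` acquires real connections):

```python
import asyncio
from contextlib import asynccontextmanager


class AsyncPoolSketch:
    """Toy model: cancellation or timeout inside the block marks the
    connection broken, which stands in for eviction on release."""

    def __init__(self):
        self.evicted = 0
        self.released = 0

    @asynccontextmanager
    async def connection(self):
        conn = object()  # stand-in for a real acquire
        broken = False
        try:
            yield conn
        except (asyncio.CancelledError, asyncio.TimeoutError):
            broken = True
            raise  # never swallow cancellation
        finally:
            if broken:
                self.evicted += 1
            else:
                self.released += 1


async def main():
    pool = AsyncPoolSketch()
    async with pool.connection():
        pass  # fast path: released normally
    try:
        async with pool.connection():
            # slow "query" exceeding its wait_for budget
            await asyncio.wait_for(asyncio.sleep(10), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    return pool.evicted, pool.released


print(asyncio.run(main()))  # (1, 1)
```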

### Margaret Hamilton review pass

Two passes (Phase 26 review + system-wide audit) had already shaped this design. The Phase 27 review surfaced three actionable conditions:

- **Test reliability**: the cancellation regression test used `contextlib.suppress(asyncio.TimeoutError)` — silently passing if the timeout never fired (e.g., on a fast CI runner where the query completes within 1ms). **Fixed**: switched to `pytest.raises(asyncio.TimeoutError)` so the test fails if the cancellation path isn't actually exercised.
- **Defensive guard for `_ensure_transaction`**: documented "caller must hold the wire lock" as a precondition, but no runtime check. **Fixed**: added `assert self._wire_lock._is_owned()` so a future caller forgetting to lock fails loudly in tests.
- **Symmetry in `Connection.close()`**: the polite SQ_EXIT was unsynchronized — could interleave with another thread's PDU. **Fixed**: try-acquire with 0.5s timeout; if busy, skip SQ_EXIT and force-close.
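The test-reliability point generalizes: `contextlib.suppress(asyncio.TimeoutError)` passes whether or not the timeout fires, which is exactly the failure mode on a fast runner. A toy illustration (hypothetical query coroutines, no pytest dependency):

```python
import asyncio


async def fast_query():
    return "done"            # finishes well inside the timeout


async def slow_query():
    await asyncio.sleep(10)  # will be cancelled by wait_for


async def timeout_fired(coro_fn, timeout):
    try:
        await asyncio.wait_for(coro_fn(), timeout)
    except asyncio.TimeoutError:
        return True   # cancellation path actually exercised
    # Query won the race. The old suppress()-style test would still
    # "pass" here without having tested anything.
    return False


print(asyncio.run(timeout_fired(fast_query, 1.0)))   # False
print(asyncio.run(timeout_fired(slow_query, 0.01)))  # True
```

`pytest.raises(asyncio.TimeoutError)` is the moral equivalent of asserting `timeout_fired(...) is True`: the test now fails loudly when the query wins the race.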

Plus one cross-phase note: Phase 27 makes Hamilton's High #5 (cursor finalizers) more visible, because cross-thread `__del__` invocation could deadlock on the wire lock. Tracked for Phase 28; Phase 27 doesn't introduce the underlying hazard.

### Tests

Two new regression tests in `tests/test_pool.py`:

- **`test_concurrent_threads_on_one_connection_dont_interleave_pdus`** — two threads each running 20 distinct queries on a shared `Connection`. Without the wire lock, PDU interleaving causes wrong results, ProtocolError, or hangs. With the lock, both threads complete with correct results.
- **`test_async_wait_for_cancellation_evicts_connection`** — spawns a slow query under `asyncio.wait_for(timeout=0.001)`, asserts (via `pytest.raises`) that the timeout actually fires, then verifies pool size shrinks (connection evicted, not returned to idle).

Total: 72 unit + 228 integration + 28 benchmark = **328 tests**.

### Hamilton verdict trajectory

| Audit pass | Verdict |
|---|---|
| Phase 21 era | (no audit yet) |
| System-wide audit (pre-Phase 26) | PRODUCTION READY WITH CAVEATS — 2 critical, 3 high |
| Post-Phase 26 | CAVEATS NARROWED — 1 critical, 3 high |
| **Post-Phase 27** | **CAVEATS NARROWED FURTHER — 0 critical, 2 high** |

Remaining audit items (deferred to Phase 28):

- High #4: bare `except: pass` in `_raise_sq_err` drain
- High #5: no cursor finalizers (server-side resource leak on mid-fetch raise)
- Plus 5 medium one-liners

## 2026.05.05 — Pool rollback-on-release (Phase 26): CRITICAL data-correctness fix

Fixes the dirty-pool-checkout bug surfaced by Margaret Hamilton's system-wide audit. **This is the most important fix in the project's history so far** — it eliminates a class of silent data-correctness failures that affect any application using both the connection pool and non-autocommit transactions.

### The bug

Before this fix, when a request released a connection to the pool with an open server-side transaction (no explicit `commit()` or `rollback()`), the connection went back to the idle list with the transaction still active. The next request to acquire that connection would inherit it. Specifically:

- **Request A** does `INSERT` (no commit) → returns 200 to user → context manager calls `pool.release()` → connection rejoins `_idle` *with A's open transaction*.
- **Request B** acquires the same connection → its first DML runs *inside A's transaction*. If B commits, A's writes land **permanently**. If B errors before commit, A's writes are silently rolled back.
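A toy model of the failure mode (all names hypothetical; `uncommitted` stands in for server-side transaction state):

```python
class Conn:
    """Toy connection: `uncommitted` models server-side txn state."""

    def __init__(self):
        self.in_txn = False
        self.uncommitted = []

    def execute(self, row):
        self.in_txn = True          # first DML implicitly opens a txn
        self.uncommitted.append(row)

    def commit(self):
        self.in_txn = False
        landed, self.uncommitted = self.uncommitted, []
        return landed

    def rollback(self):
        self.in_txn = False
        self.uncommitted = []


def release_before_fix(conn, idle):
    idle.append(conn)               # dirty: the open txn rides along


def release_after_fix(conn, idle):
    if conn.in_txn:
        conn.rollback()             # the fix, minus eviction handling
    idle.append(conn)


idle = []
a = Conn()
a.execute("A's row")                # request A: INSERT, no commit
release_before_fix(a, idle)
b = idle.pop()                      # request B checks out the same conn
print(b.commit())                   # ["A's row"] — A's write lands under B
```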

This is the same shape as the bug psycopg2 fixed years ago. Hamilton ranked it Critical #1 in the audit.

### The fix

`ConnectionPool.release()` now rolls back any uncommitted transaction *before* the connection rejoins the idle list. If the rollback itself fails (dead socket, etc.), the connection is evicted instead of recycled — half-state never returns to the pool.

The rollback runs *outside* the pool lock (it's a wire round-trip; we don't want to block other pool operations). The connection is safe to operate on alone because it's not yet in `_idle` and `_total` already counts it as owned.

### Async path covered automatically

`AsyncConnectionPool.release()` delegates to the sync pool's release via `_to_thread`, so async users get the fix transparently.

### Margaret Hamilton review pass

The fix went through a focused review. Two findings addressed before tagging:

- **Silent failure on rollback exception**: Hamilton flagged that `except Exception:` swallowed rollback failures invisibly. Added a `WARNING`-level log via `logging.getLogger("informix_db.pool")` so evictions are debuggable. (No existing logging convention in the codebase before now; this is the first.)
- **Async cancellation interaction**: the fix doesn't *introduce* the `asyncio.wait_for` race (that's Critical #2 from the audit, deferred to Phase 27), but it adds a code path that can trigger it. Documented loudly in `pool.release()`'s docstring, in `aio.py`'s module docstring, and in `USAGE.md`'s async section. **The recommendation: use `read_timeout` on the connection instead of `asyncio.wait_for` until Phase 27 lands the per-connection wire lock.**

### Tests

Two new integration tests in `tests/test_pool.py`:

- **`test_uncommitted_writes_invisible_to_next_acquirer`** — the regression test. Pool with `max_size=1` forces A and B to share the connection. A inserts without committing, A's context exits, B acquires (gets the same connection), B verifies no rows visible. **Verified** by stashing the fix and confirming the test then fails with "B sees 1 rows — leaked across pool checkout boundary" — the test catches the actual bug, not a false positive.

- **`test_committed_writes_survive_pool_checkout`** — counterpart that proves the fix doesn't over-correct (committed writes still persist).

Total tests: 72 unit + 226 integration + 28 benchmark = **326**.

### What this fix does NOT cover (deferred to Phase 27)

- **Concurrency / per-connection wire lock** (Hamilton Critical #2). `Connection._lock` is currently held only by `close()`; every wire-touching method runs unsynchronized. Two threads on the same connection (or async cancellation leaving a worker thread still running) can interleave PDU bytes. The Phase 26 fix doesn't make this worse than baseline, but it adds a second path that can trigger it.
- **No finalizers on cursors** (Hamilton High #5). Mid-fetch raises still leak server-side prepared statements / scrollable cursors.
- **Bare `except: pass` in `_raise_sq_err` drain** (Hamilton High #4). Masks ProtocolError during error decode.
- **Async cancellation evicts cleanly** (Hamilton High #3). Currently `TimeoutError`/`CancelledError` aren't routed to `broken=True` — connection rejoins pool. Phase 26 mitigates the data corruption from this case (rollback runs on release), but the wire-desync race remains until Phase 27.

These are real and known. Documented in the audit punch list.

## 2026.05.04.10 — Branch reorder by frequency + invariant tripwires (Phase 25)

Third-pass optimization on `parse_tuple_payload`. Previous phases removed redundant work; this one removes *correct-but-wasteful* work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT — by far the most common columns in real queries) sat at the *bottom* of the chain, paying ~7 frozenset/equality misses per column.

### What changed

- **Added `_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())`** at module load in `_resultset.py`.
- **New fast-path branch at the TOP of `parse_tuple_payload`'s loop body** that handles every `_FIXED_WIDTH_TYPES` column inline (slice + `_decode_base` + advance). For an INT column we now hit one frozenset check, one dict lookup, one decode call — and skip every other branch.
- **Cleaned up the bottom fall-through** since FIXED_WIDTHS-keyed types no longer reach it. The fall-through now genuinely only catches unknown/unhandled types; comment updated.
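The reordered dispatch, reduced to a toy (type codes, widths, and decoders here are stand-ins; the real tables live in `converters.py` / `_resultset.py`):

```python
import struct

# Stand-in tables: type code -> byte width, and type code -> unpacker.
FIXED_WIDTHS = {2: 4, 3: 8}  # e.g. hypothetical INT -> 4, FLOAT -> 8
_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS)  # computed once at module load
_DECODE = {
    2: struct.Struct("!i").unpack,
    3: struct.Struct("!d").unpack,
}


def parse_row(payload, type_codes):
    """One frozenset check + one dict hit + one decode per fixed-width col."""
    out, pos = [], 0
    for tc in type_codes:
        if tc in _FIXED_WIDTH_TYPES:   # fast path FIRST: the common case
            width = FIXED_WIDTHS[tc]
            out.append(_DECODE[tc](payload[pos:pos + width])[0])
            pos += width
        else:
            # genuine fall-through: only unknown/unhandled types reach here
            raise ValueError(f"unhandled type code {tc}")
    return out


row = struct.pack("!i", 7) + struct.pack("!d", 2.5)
print(parse_row(row, [2, 3]))  # [7, 2.5]
```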

### Margaret Hamilton review pass — invariant tripwires added

The third Hamilton review of this hot path produced one HIGH-severity finding addressed before tagging. The pattern was the same as Phases 23 and 24: an optimization is correct *because of* a property of an external table (here: `FIXED_WIDTHS` keys are decodable without qualifier inspection), but the property is implicit. The finding's recommendation went beyond a comment:

- **Added `tests/test_resultset_invariants.py`** — three CI tripwire tests that turn the structural invariants from comments into executable checks:
  1. `_FIXED_WIDTH_TYPES` is disjoint from every other dispatch branch's type set.
  2. Every `FIXED_WIDTHS` key has a decoder in `DECODERS`.
  3. All `DECODERS` keys are < 0x100 (the Phase 24 collision-free guarantee).
- **Added INVARIANT comment to `FIXED_WIDTHS`** in `converters.py` explaining the qualifier-free constraint and pointing to the tripwire tests.
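The tripwires themselves are plain assertions over the module tables — sketched here against stand-in tables with hypothetical type codes:

```python
# Tripwire sketch: structural invariants as executable checks rather than
# comments. Tables are stand-ins for the real ones in converters.py.
FIXED_WIDTHS = {2: 4, 3: 8}
_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS)
_LENGTH_PREFIXED_TYPES = frozenset({13, 16})     # hypothetical VARCHAR-ish
DECODERS = {2: int, 3: float, 13: str, 16: str}  # hypothetical decoder table


def check_invariants():
    # 1. The fast-path set is disjoint from every other dispatch branch.
    assert _FIXED_WIDTH_TYPES.isdisjoint(_LENGTH_PREFIXED_TYPES)
    # 2. Every fixed-width type has a decoder.
    assert set(FIXED_WIDTHS) <= set(DECODERS)
    # 3. All decoder keys stay below the flag-bit range (Phase 24 guarantee).
    assert all(tc < 0x100 for tc in DECODERS)
    return "ok"


print(check_invariants())  # ok
```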

The tests follow a simple discipline: if one fires, **don't update the test to match the new state** — read the docstring and either restore the property or refactor the optimization to no longer depend on it. Comments rot when nobody reads them; tests fail loudly when someone violates them.

### Performance summary (Phase 25)

| Benchmark | Phase 24 baseline | NOW | Δ |
|---|---:|---:|---:|
| `parse_tuple_5cols_iso8859` | 1659 ns | **1400 ns** | **-16%** |
| `parse_tuple_5cols_utf8` | 1649 ns | **1341 ns** | **-19%** |

End-to-end SELECT numbers fluctuate ±10% run-to-run on sub-millisecond loopback round-trips; the codec micro-benchmark is the durable measurement.

### Cumulative improvement (vs. original Phase 21 baseline, before any optimization)

| Metric | Original | NOW | Total Δ |
|---|---:|---:|---:|
| `parse_tuple_5cols` | 2796 ns | **1400 ns** | **-50%** |
| `decode_int` | 230 ns | 139 ns | -40% |
| `select_bench_table_all` (1k rows, where measurable) | 1477 µs | ~990 µs | ≈-33% |

The per-row decode hot path is **half the time it took at start of optimization work**. Real-world fetch ceiling: 358K rows/sec → ~715K rows/sec on a single connection.

### Tests

3 new unit tests (the invariant tripwires). Total: **72 unit + 224 integration + 28 benchmark = 324 tests**.

### Baseline refreshed

`tests/benchmarks/baseline.json` updated. All tests pass; ruff clean.

## 2026.05.04.9 — Decoder dispatch + struct precompilation (Phase 24)

Second pass of hot-path optimization. Phase 23 lifted IfxType conversions out of the loop body in `_resultset.py` (-26% on `parse_tuple_5cols`). Phase 24 goes deeper into the codec layer.

### What changed

**1. Split `decode()` into public + internal in `src/informix_db/converters.py`.**

- New `_decode_base(base_tc, raw, encoding)` takes an *already-base-typed* type code and skips the `base_type()` flag strip. Documented INVARIANT: caller's responsibility to base-type the input.
- Public `decode()` is now a one-line wrapper: `return _decode_base(base_type(type_code), raw, encoding)`. Same external semantics, same backward-compat — `_fastpath.py:171` is unaffected.
- `parse_tuple_payload` (4 call sites) now imports and calls `_decode_base` directly. Saves ~100 ns × N columns per row by skipping the redundant flag strip.
**2. Pre-compiled `struct.Struct` unpackers.** The fixed-width integer/float decoders (`_decode_smallint`, `_decode_int`, `_decode_bigint`, `_decode_smfloat`, `_decode_float`, `_decode_date`) switched from per-call `struct.unpack(fmt, raw)` to module-level bound methods like `_UNPACK_INT = struct.Struct("!i").unpack`. Format-string parsing happens once at module load instead of per call — measured 37% faster than per-call `struct.unpack` on a CPython 3.13 microbenchmark.
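The mechanical change, isolated (the 37% figure is this changelog's measurement; the sketch just shows the two shapes):

```python
import struct


# Per-call: the format string is looked up / parsed on every invocation.
def decode_int_slow(raw):
    return struct.unpack("!i", raw)[0]


# Precompiled: parse "!i" once at module load, keep the bound method.
_UNPACK_INT = struct.Struct("!i").unpack


def decode_int_fast(raw):
    return _UNPACK_INT(raw)[0]


raw = struct.pack("!i", 1234)
print(decode_int_slow(raw), decode_int_fast(raw))  # 1234 1234
```

(CPython does cache recently used format strings inside `struct`, so the win comes from skipping that cache lookup and the module-level function dispatch, not from re-parsing.)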

### Margaret Hamilton review pass

The optimization went through a second failure-mode review. One HIGH-severity finding addressed:

- **H (high)**: The no-collision guarantee that makes `_decode_base` safe is *structural but undocumented*. Specifically: all DECODERS keys are ≤ 0xFF; all flag bits in `_types.py` are ≥ 0x100; therefore a flagged input *cannot* coincidentally match a DECODERS key. This guarantee is correct today but fragile — adding a decoder for a type code that uses bits ≥ 0x100 would silently weaken it. **Fixed**: added a load-bearing INVARIANT comment at the `DECODERS` dict declaration explaining the constraint and what to do if it's violated. Cross-referenced from `_decode_base`'s docstring so the contract is bidirectionally traceable.

### Performance summary (Phase 24)

| Benchmark | Phase 23 baseline | NOW | Δ this phase |
|---|---:|---:|---:|
| `decode_int` | 173 ns | **139 ns** | **-20%** |
| `decode_bigint` | 188 ns | **150 ns** | **-20%** |
| `decode_smallint` | 169 ns | **137 ns** | **-19%** |
| `decode_date` | 521 ns | **435 ns** | **-17%** |
| `parse_tuple_5cols_iso8859` | 2047 ns | **1592 ns** | **-22%** |
| `select_bench_table_all` (1k rows) | 1255 µs | **989 µs** | **-21%** |
| `select_with_param` | 977 µs | 860 µs | -12% |

### Cumulative improvement (vs. original Phase 21 baseline, before any optimization)

| Metric | Original | NOW | Total Δ |
|---|---:|---:|---:|
| `decode_int` | 230 ns | **139 ns** | **-40%** |
| `parse_tuple_5cols` | 2796 ns | **1592 ns** | **-43%** |
| `select_bench_table_all` (1k rows) | 1477 µs | **989 µs** | **-33%** |

Real-world fetch ceiling: 358K rows/sec → ~620K rows/sec on a single connection.

### Baseline refreshed

`tests/benchmarks/baseline.json` updated. All 224 integration tests pass; ruff clean.

## 2026.05.04.8 — Hot-path optimization (Phase 23)

Optimized `parse_tuple_payload` — the per-row decode function hit by every SELECT result set. **The 1k-row fetch wall-clock improved 19%** (1477 µs → 1198 µs). Bench micro-target (`parse_tuple_5cols`) improved 27% (2796 ns → 2030 ns). All 224 integration tests still pass; ruff clean.

### What changed (`src/informix_db/_resultset.py`)

- **Removed redundant `base_type()` call from the hot loop.** `ColumnInfo.type_code` is already base-typed by `parse_describe` at construction — calling `base_type(col.type_code)` again per column per row was pure waste. This was the single largest savings.
- **Lifted `int(IfxType.X)` to module-level constants** (`_TC_CHAR`, `_TC_VARCHAR`, etc.). Original code did the IntFlag→int conversion inline ~10 times per loop iteration; now done once at module import.
- **Moved lazy imports to module top** (`_decode_datetime`, `_decode_interval`, `BlobLocator`, `ClobLocator`, `RowValue`, `CollectionValue`). Saves a per-call attribute lookup; verified no circular import risk.
- **Three precomputed frozensets** (`_LENGTH_PREFIXED_SHORT_TYPES`, `_COMPOSITE_UDT_TYPES`, `_NUMERIC_TYPES`) replace inline tuple-membership checks.
- **`_COLLECTION_KIND_MAP` wrapped in `MappingProxyType`** — actually frozen against accidental mutation, not just nominally.

### Margaret Hamilton review pass

The optimization went through a rigorous failure-mode review. Findings addressed before tagging:

- **H1 (high)**: `cursor._dereference_blob_columns` (lines 304-310) was doing the same redundant `base_type()` call. Stripped for consistency — otherwise the next reader would write a "fix" to one site or the other based on which they noticed.
- **M1 (medium)**: documented the load-bearing invariant at its single producer site. `parse_describe` now has a comment naming readers that depend on `ColumnInfo.type_code` being base-typed, so a future contributor adding a new construct site has a grep-able warning.
- **M2 (medium)**: `_COLLECTION_KIND_MAP` is now `MappingProxyType` (was a plain dict).
- **L1 (low)**: stale "(line 151)" comment reference replaced with a pointer to the named INVARIANT comment.

### Performance summary

| Benchmark | Pre | Post | Delta |
|---|---:|---:|---:|
| `parse_tuple_5cols_iso8859` | 2796 ns | 2030 ns | **-27%** |
| `parse_tuple_5cols_utf8` | 2791 ns | 2041 ns | **-27%** |
| `select_bench_table_all` (1k rows) | 1477 µs | 1198 µs | **-19%** |
| `select_with_param` (~50 rows) | 1069 µs | 994 µs | -7% |
| Codec micro-benchmarks (`decode_int`, etc.) | unchanged ±noise | | |
| `cold_connect_disconnect` | unchanged | | |
| `executemany` series | unchanged | | |

Real-world fetch ceiling on a single connection: 350K rows/sec → 490K rows/sec.

### Baseline refreshed

`tests/benchmarks/baseline.json` updated with the new (faster) numbers. Future regressions will be measured against this floor.

## 2026.05.04.7 — User-facing documentation refresh (Phase 22)

The `docs/USAGE.md` predated Phases 17-21, so anyone landing on PyPI was missing scrollable cursors, locale/Unicode, the autocommit cliff finding, and the type-mapping reference. This release closes that gap.

### Added (in `docs/USAGE.md`)

- **Locale and Unicode** — full section on `client_locale`, `Connection.encoding`, the CLIENT_LOCALE vs DB_LOCALE distinction, what happens when characters can't fit the codec, how to create a UTF-8 database. Bridges the gap between Phase 20's plumbing and a user's first multibyte INSERT.
- **Type mapping reference** — full SQL ↔ Python type table covering integer widths, DECIMAL, all string types, DATE/DATETIME/INTERVAL, BYTE/TEXT, BLOB/CLOB, ROW/COLLECTION, and `NULL`. Plus subsections on NULL sentinels and `IntervalYM`.
- **Performance tips** — three numbered patterns: wrap bulk INSERTs in a transaction (53× speedup), use `executemany` not a loop (≈100× speedup), use a connection pool (72× speedup over cold connect). Quotes the actual benchmark numbers from Phase 21.1.
- **Scrollable cursors** — `fetch_first` / `fetch_last` / `fetch_prior` / `fetch_absolute` / `fetch_relative` / `scroll()` API; in-memory vs `cursor(scrollable=True)` server-side trade-offs; edge cases (past-end semantics, negative indexing, `rownumber` indexing).
- **Timeouts and keepalive** subsection — `connect_timeout` / `read_timeout` / `keepalive` semantics with a "reasonable production starting point" recommendation.
- **Environment dictionary** subsection — the `env={}` parameter, with examples (OPT_GOAL, OPTOFC, IFX_AUTOFREE).
- **Known limitations** — explicit table of what doesn't work yet (named parameters, complex UDT bind, GSSAPI, XA, listener failover, etc.) with workarounds where they exist. Plus "things that work but might surprise you" (autocommit default, no-op commit on unlogged DB, SERIAL retrieval).

### Changed

- **`README.md`** — added a "Documentation" section linking to `docs/USAGE.md` and `tests/benchmarks/README.md`. Bumped phase count.

### Doc corrections caught during review

- `cursor.rownumber` is **0-indexed**, not 1-indexed (the implementation has been correct; only the original docstring wording was loose).
- `fetch_*` methods work on **both** `scrollable=True` and the default (in-memory) cursor — the original Phase 17 docs implied `scrollable=True` was required, but the in-memory path supports them too.

## 2026.05.04.6 — `executemany` perf finding: it was the autocommit cliff

Investigation of the Phase 21 finding that `executemany(N)` cost scaled linearly per-row (1.74 ms × N) regardless of batch size. **Root cause: every autocommit-True INSERT forces a server-side transaction-log flush.** Not a wire-protocol bug.

### Added

- **`test_executemany_1000_rows_in_txn`** benchmark — same workload, but inside a single transaction with one COMMIT at the end. Isolates pure protocol cost from server-storage cost.
- New module-scoped `txn_conn` fixture in `tests/benchmarks/test_insert_perf.py` for autocommit-False benchmarks.

### Findings

| Mode | Total | Per row |
|-|-:|-:|
| `executemany(1000)` autocommit=True | 1.72 s | 1.72 ms |
| `executemany(1000)` in single txn | 32 ms | **32 µs** |

**53× speedup from changing the transaction boundary, not the driver.** Pure protocol overhead is ~32 µs/row → ~31,000 rows/sec sustained throughput on a single connection. Comparable to mature pure-Python drivers (pg8000).
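The boundary change is the generic DB-API pattern, not anything Informix-specific. Sketched here on stdlib `sqlite3` as a stand-in driver (the 53× / 32 µs figures above are Informix measurements and don't transfer):

```python
import sqlite3

# In-memory DB, isolation_level=None ~ autocommit-style statement execution.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (n INTEGER)")

rows = [(i,) for i in range(1000)]

# Bulk load inside ONE explicit transaction: a single durability point at
# COMMIT instead of one per row.
conn.execute("BEGIN")
conn.executemany("INSERT INTO t VALUES (?)", rows)
conn.execute("COMMIT")

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1000
```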

### Changed

- **`tests/benchmarks/README.md`** — updated headline numbers to show both modes, added a "Performance gotchas" section explaining when to use `autocommit=False` for bulk loads.
- **`tests/benchmarks/baseline.json`** — refreshed to include the new txn-mode measurement (now 29 entries, was 28).

### Decision: don't pipeline

Pipelining BIND+EXECUTE PDUs (writing N without waiting for responses between them) could potentially halve the 32 µs/row figure on loopback. Decided against:

- The remaining 32 µs is already excellent — single-connection bulk-load performance is not where users hit limits.
- Pipelining adds complexity around TCP send-buffer management, partial-failure semantics, and error reporting (which row failed when 50 are in flight).
- The autocommit gotcha is the *real* user-facing footgun. Better docs > more code.

If someone reports needing >31K rows/sec single-connection, this becomes Phase 22 work.

## 2026.05.04.5 — Performance benchmarks (Phase 21)

Adds `tests/benchmarks/` — a `pytest-benchmark` driven suite covering codec micro-benchmarks (no server required) and end-to-end SELECT/INSERT/pool/async benchmarks. Establishes a committed `baseline.json` so future PRs can be compared against the floor and regressions caught at review.

### Added

- **`tests/benchmarks/test_codec_perf.py`** — 16 micro-benchmarks for the hot codec paths (`decode`, `encode_param`, `parse_tuple_payload`). Run without an Informix container; suitable for pre-merge CI.
- **`tests/benchmarks/test_select_perf.py`** — 4 SELECT round-trip benchmarks: 1-row latency floor, ~10 rows, full 1k-row table, parameterized.
- **`tests/benchmarks/test_insert_perf.py`** — 3 INSERT benchmarks: single-row, `executemany(100)`, `executemany(1000)`.
- **`tests/benchmarks/test_pool_perf.py`** — 3 pool benchmarks: cold connect (login handshake cost), pool acquire/release, pool acquire + tiny query + release.
- **`tests/benchmarks/test_async_perf.py`** — 2 async benchmarks: single async round-trip overhead, 10 concurrent SELECTs through an async pool.
- **`tests/benchmarks/conftest.py`** — `bench_conn` (long-lived autocommit connection) and `bench_table` (pre-populated 1k-row table) fixtures, both session-scoped.
- **`tests/benchmarks/baseline.json`** — committed baseline (28 measurements) for `--benchmark-compare` regression checks.
- **`tests/benchmarks/README.md`** — headline numbers, regression policy, how to update baseline, what each benchmark measures.
- **`make bench` / `make bench-codec` / `make bench-save`** Makefile targets.
- **`benchmark` pytest marker** — gated, off by default. `pytest -m benchmark` to opt in.

### Changed

- **`make test-integration`** now uses `-m "integration and not benchmark"` so the integration suite stays fast (~6s) — benchmarks (~27s) are gated behind `make bench`.
- **`pytest`** default `-m` now excludes both `integration` and `benchmark`. Default run is unit-only.

### Headline numbers (dev container, x86_64 Linux, loopback)

| Operation | Mean |
|-|-:|
| `decode(int)` (per cell) | 181 ns |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs |
| `SELECT 1` round-trip | 177 µs |
| Pool acquire + tiny query + release | 295 µs |
| **Cold connect + close** | **11.2 ms** |

**Pool-vs-cold delta is 72×.** UTF-8 decode carries no measurable cost over iso-8859-1 (Phase 20 didn't slow anything down).

### Tests

28 new benchmark tests. Total: **69 unit + 211 integration + 28 benchmark = 308**.

## 2026.05.04.4 — UTF-8 / multibyte locale support

Threads the connection's `CLIENT_LOCALE` through to user-data string codecs so multibyte locales (UTF-8, etc.) round-trip correctly. The driver previously hardcoded `iso-8859-1` for every string conversion — fine for Western European text, broken-by-design for CJK, Cyrillic, Arabic, emoji.

### Added

- **`Connection.encoding`** property — reports the Python codec name derived from `CLIENT_LOCALE` (e.g., `iso-8859-1`, `utf-8`, `iso-8859-15`). Default for a connection without `client_locale=` is `iso-8859-1` (compatible with the legacy default).
- **`informix_db.connections._python_encoding_from_locale(locale: str)`** — maps Informix locale strings (`en_US.utf8`, `en_US.8859-1`, `en_US.819`) to Python codec names. Falls back to `iso-8859-1` for unknown / unsuffixed forms.

### Changed

- **`encode_param(value, encoding=...)`** and `_encode_str(value, encoding=...)` honor the connection's encoding instead of hardcoded `iso-8859-1`. Cursor's `_emit_bind_params` forwards `self._conn.encoding` per parameter.
- **`decode(type_code, raw, encoding=...)`** and `parse_tuple_payload(reader, columns, encoding=...)` thread the encoding to string column decoders (CHAR, VARCHAR, NCHAR, NVCHAR, LVARCHAR). Cursor's `_read_fetch_response` forwards `self._conn.encoding`.
- **Smart-LOB CLOB encode/decode** (`write_blob_column`, simple-LOB TEXT fetch) honor `self._conn.encoding`.
- **Fast-path RPC** (`Connection.fast_path_call`) honors `self._encoding` for its bound parameters.

### Boundary discipline

Protocol-level strings stay `iso-8859-1` (always ASCII, never user-controlled): cursor names, function signatures, server-fabricated SQ_FILE virtual filenames, error "near tokens", SQL keywords/identifiers. Only user-data strings (column values, parameter binds) follow `CLIENT_LOCALE`.

### Error handling

Encoding-can't-represent-this-value (e.g., `"你好"` on an `8859-1` connection) now raises `informix_db.DataError` instead of letting Python's `UnicodeEncodeError` leak. The cursor releases the prepared statement before propagating, so the connection survives cleanly for the next query.
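
The wrapping can be sketched like this (illustrative: `DataError` stands in for `informix_db.DataError`, and the statement-release step is elided):

```python
class DataError(Exception):
    """Stand-in for informix_db.DataError in this sketch."""

def encode_user_string(value: str, encoding: str = "iso-8859-1") -> bytes:
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        # Surface a driver-level error instead of leaking the codec error.
        # The real cursor also releases the prepared statement before
        # propagating, so the connection stays usable.
        raise DataError(f"value not representable in {encoding}: {exc}") from exc
```
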
### Tests

9 new integration tests in `tests/test_unicode.py`:

- ASCII round-trip (regression)
- Latin-1 high-bit chars round-trip on default locale
- Full byte range 0x20-0xFE round-trip via VARCHAR
- Locale → Python codec mapping for common forms
- `Connection.encoding` exposes the resolved codec
- UTF-8 locale negotiation (server transcodes for ASCII even with 8859-1 DB)
- UTF-8 multibyte round-trip (skipped without `IFX_UTF8_DATABASE` env var pointing to a UTF-8 database)
- Non-representable char raises `DataError` cleanly; connection survives
- CLOB column round-trips Latin-1 text honoring connection encoding

Total: **69 unit + 212 integration = 281 tests**.

### Limitations

- Multibyte UTF-8 storage requires both `client_locale='en_US.utf8'` AND a database whose `DB_LOCALE` is UTF-8. The dev container's `testdb` is `8859-1`, so storing CJK chars there will continue to fail server-side regardless of the client codec. The `test_utf8_multibyte_round_trip` test is gated on the `IFX_UTF8_DATABASE` env var pointing to a UTF-8 database.
## 2026.05.04.3 — Resilience tests (fault injection)

### Added

- **`tests/_proxy.py`** — `ControlledProxy` helper: a thread-based TCP forwarder between the test client and Informix, with a `kill()` method that sends TCP RST (via `SO_LINGER=0`) to simulate a network drop or server crash. Used as a context manager.
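
The RST trick can be sketched as a minimal standalone function (not the proxy itself):

```python
import socket
import struct

def rst_close(sock: socket.socket) -> None:
    """Close a TCP socket so the peer sees RST instead of an orderly FIN.

    SO_LINGER with l_onoff=1, l_linger=0 makes close() drop unsent data
    and send RST -- the same mechanism ControlledProxy.kill() relies on
    to simulate a server crash.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()
```

On Linux, the peer's next `recv()` then typically raises `ConnectionResetError`, which is what the resilience tests assert against.
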
- **`tests/test_resilience.py`** — 12 integration tests filling the resilience gap identified in the test-coverage audit:
  - Network drop mid-SELECT raises `OperationalError` cleanly (not hang)
  - Network drop after describe but before fetch
  - Network drop during fetch iteration (already-materialized rows still readable, fresh execute fails)
  - Local socket close (yank-the-rug from client side)
  - I/O error marks connection unusable
  - Pool evicts a connection that died mid-`with` block
  - Pool revives after all idle connections died (health-check on acquire mints fresh)
  - Async cancellation via `asyncio.wait_for` — pool stays usable for subsequent queries
  - Cursor reusable after SQL error
  - Connection survives cursor close after error
  - Pool sustained-load smoke (50 acquire/release cycles, no leak)
  - `read_timeout` fires on a hung connection
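
The cancellation case boils down to this pattern (a self-contained sketch; `hung_query` stands in for a DB call stuck on a dead socket):

```python
import asyncio

async def hung_query() -> None:
    # Stand-in for a driver call blocked on a socket that will never answer.
    await asyncio.sleep(3600)

async def cancellation_fires() -> bool:
    try:
        await asyncio.wait_for(hung_query(), timeout=0.05)
    except asyncio.TimeoutError:
        return True  # the real test then asserts the pool is still usable
    return False

assert asyncio.run(cancellation_fires())
```
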
### What this catches

- **Hangs** (waiting forever on a dead socket)
- **Silent data corruption** (treating EOF as a valid tuple)
- **Double-fault** (one error → cleanup raises a different error)
- **Pool poisoning** (returning a broken connection to the pool)
- **Stale cursor reuse** (same cursor reused across an error boundary)

### Tests

12 new integration tests. Total: **69 unit + 203 integration = 272 tests**.

The Phase 19 work fills the highest-priority gap from the test-adequacy audit. Remaining gaps from that audit (UTF-8 locale, server-version matrix, performance benchmarks) are real but lower-severity.
## 2026.05.04.2 — Server-side scrollable cursors

### Added

- **Server-side scrollable cursors** (Phase 18): opt in via `conn.cursor(scrollable=True)`. The cursor opens with `SQ_SCROLL` (24) before `SQ_OPEN` (6), the result set stays materialized server-side, and each scroll method sends `SQ_SFETCH` (23) to fetch one row at a time. Use this for huge result sets where in-memory materialization would be wasteful.

The user-facing API is identical to Phase 17's in-memory scroll (`fetch_first`, `fetch_last`, `fetch_prior`, `fetch_absolute`, `fetch_relative`, `scroll`, `rownumber`); only the internal mechanism differs:
| | Default cursor | `scrollable=True` |
|---|---|---|
| Memory | All rows materialized | One row at a time |
| Network round-trips per fetch | 0 (after initial NFETCH) | 1 (one SFETCH per call) |
| Cursor lifetime | Closed after `execute()` | Open until `close()` |
| Best for | Moderate result sets, sequential iteration | Huge result sets, random access |

Implementation discovers total row count lazily via SFETCH(LAST=4) when negative absolute indexing requires it; result is cached in `_scroll_total_rows`. Position tracking is authoritative from the server's `SQ_TUPID` (25) tag, not client-computed.
### Wire-protocol details

- `SQ_SFETCH` (23): `[short SQ_ID=4][int 23][short scrolltype][int target][int bufSize=4096][short SQ_EOT]`. scrolltype values: 1=NEXT, 4=LAST, 6=ABSOLUTE.
- `SQ_SCROLL` (24): emitted between CURNAME and SQ_OPEN to mark the cursor as scrollable.
- `SQ_TUPID` (25): server response carrying the 1-indexed row position the server just delivered. `[short 25][int rowID]`.
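
For concreteness, the SQ_SFETCH layout packs like this (a sketch: big-endian network byte order is assumed, and the numeric value of `SQ_EOT` is a placeholder; the real constant lives in the driver's protocol tables):

```python
import struct

SQ_ID = 4
SQ_SFETCH = 23
SQ_EOT = 12  # placeholder value -- an assumption, not from this changelog

def pack_sfetch(scrolltype: int, target: int, buf_size: int = 4096) -> bytes:
    # [short SQ_ID][int 23][short scrolltype][int target][int bufSize][short SQ_EOT]
    return struct.pack(">hihiih", SQ_ID, SQ_SFETCH, scrolltype,
                       target, buf_size, SQ_EOT)
```

So `pack_sfetch(6, 10)` would build an ABSOLUTE fetch of row 10: 18 bytes on the wire.
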
The trap on the way: I initially used SHORT for `bufSize` and the server hung silently — same SHORT-vs-INT diagnostic pattern as Phase 4.x's CURNAME+NFETCH. Captured a JDBC trace, byte-diffed against ours, found the mismatch.

### Tests

14 new integration tests in `test_scroll_cursor_server.py`. Total: **69 unit + 191 integration = 260 tests**.
## 2026.05.04.1 — Scroll cursors

### Added

- **Scroll cursor API** on `Cursor` (Phase 17):
  - `cur.scroll(value, mode='relative'|'absolute')` — PEP 249 compatible
  - `cur.fetch_first()` / `cur.fetch_last()` — jump to ends
  - `cur.fetch_prior()` — backward step (SQL-standard semantics: from past-end yields the last row)
  - `cur.fetch_absolute(n)` — 0-indexed jump; negative `n` indexes from the end
  - `cur.fetch_relative(n)` — n-step from current position
  - `cur.rownumber` — current 0-indexed position (None if before-first or no result set)

In-memory implementation — no new wire protocol; the existing materialized result set in `cur._rows` is now indexed rather than iterated. For server-side scroll over huge result sets, `SQ_SFETCH` (tag 23) would be needed — Phase 18 if anyone hits the in-memory ceiling.
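
The indexing scheme reduces to something like this (names are illustrative; only the negative-index and position semantics mirror the API described above):

```python
class ScrollWindow:
    """Index a materialized row list -- sketch of the in-memory scheme."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rownumber = None  # None == positioned before the first row

    def fetch_absolute(self, n):
        # 0-indexed jump; negative n counts from the end.
        if n < 0:
            n += len(self._rows)
        if not 0 <= n < len(self._rows):
            return None  # out of range: position unchanged
        self.rownumber = n
        return self._rows[n]

    def fetch_relative(self, n):
        # n-step from the current position (before-first counts as -1).
        base = -1 if self.rownumber is None else self.rownumber
        target = base + n
        if target < 0:
            return None  # moved before the first row
        return self.fetch_absolute(target)
```
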
### Tests

14 new integration tests in `test_scroll_cursor.py`. Total: **69 unit + 177 integration = 246 tests**.

## 2026.05.04 — Library completion

The Phase 0 ambition — first pure-Python Informix SQLI driver — reaches feature completeness. Adds async, TLS, connection pool, smart-LOBs, fast-path RPC, composite UDTs.

### Added

- **Async API** (`informix_db.aio`) — `AsyncConnection`, `AsyncCursor`, `AsyncConnectionPool` for FastAPI / aiohttp / asyncio. Each blocking I/O call is offloaded to a worker thread via `asyncio.to_thread`; the event loop never blocks.
- **Connection pool** (`informix_db.create_pool`) — thread-safe with min/max sizing, lazy growth, health-check on acquire, error-aware eviction.
- **TLS** — `tls=True` for self-signed dev servers, `tls=ssl.SSLContext` for production. Wrapping happens in `IfxSocket` so the rest of the protocol layer is unaware.
- **Smart-LOBs** (BLOB / CLOB) — full read/write end-to-end via `cursor.read_blob_column()` / `cursor.write_blob_column()` using the server's `lotofile` / `filetoblob` SQL functions intercepted at the `SQ_FILE` (98) protocol level.
- **Legacy in-row blobs** (BYTE / TEXT) — bind + read via the `SQ_BBIND` / `SQ_BLOB` / `SQ_FETCHBLOB` protocol family.
- **Fast-path RPC** (`Connection.fast_path_call`) — direct stored-procedure invocation bypassing PREPARE/EXECUTE; routine handles cached per-connection.
- **Composite UDT recognition** — `ROW`, `SET`, `MULTISET`, `LIST` columns return typed `RowValue` / `CollectionValue` wrappers exposing schema and raw bytes.
- **Type codecs** — `INTERVAL` (both DAY-TO-FRACTION and YEAR-TO-MONTH families), `DATETIME` (all qualifier ranges), `DECIMAL` / `MONEY` (BCD with sign+exp head byte and asymmetric base-100 complement for negatives), `DATE`, `BOOL`, all integer / float widths, `CHAR` / `VARCHAR` / `LVARCHAR`.
- **Transactions** — implicit `SQ_BEGIN` before each transaction in non-ANSI logged DBs; transparent no-ops on unlogged DBs.
- **PEP 249 exception hierarchy** — server `SQLCODE` mapped to the right exception class (`IntegrityError` for duplicate-key violations, `ProgrammingError` for syntax errors, etc.).
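
The offload pattern behind the async API reduces to this (a sketch under the stated design; `blocking_fetch` stands in for a synchronous driver call):

```python
import asyncio

def blocking_fetch(query: str) -> list:
    # Imagine a synchronous cursor.execute() + fetchall() here; anything
    # that would block a thread on socket I/O belongs behind to_thread.
    return [query.upper()]

async def async_fetch(query: str) -> list:
    # Offload to a worker thread so the event loop keeps servicing
    # other tasks while the driver waits on the wire.
    return await asyncio.to_thread(blocking_fetch, query)
```
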
### Documentation

- [`README.md`](README.md) — overview and quick-start
- [`docs/USAGE.md`](docs/USAGE.md) — practical recipes and migration guide
- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) — byte-level wire-format reference
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) — phase-by-phase architectural decisions, with the *why* preserved
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) — index into the decompiled IBM JDBC reference
- [`docs/CAPTURES/`](docs/CAPTURES/) — annotated socat hex-dump captures

### Test coverage

232 tests total: **69 unit + 163 integration**. Unit tests run with no external dependencies; integration tests run against the IBM Informix Developer Edition Docker image.

### Known gaps (deferred)

- **Full ROW/COLLECTION recursive parsing**: Phase 12 ships type recognition + raw-bytes wrapper. Parsing the textual representation into typed Python tuples/sets/lists is deferred — most workloads can use SQL projections (`SELECT row_col.fieldname FROM tbl`) instead.
- **UDT parameter encoding for fast-path**: scalar params/returns work; passing a 72-byte BLOB locator as a UDT param requires extending the SQ_BIND encoder with the extended_owner/extended_name preamble for type > 18.
- **Native async I/O**: Phase 16 ships a thread-pool wrapper that's functionally equivalent for typical FastAPI workloads. Native async (asyncpg-style transport abstraction) would be Phase 17 if a real workload needs it.

## 2026.05.02 — Phase 1: connection lifecycle

Initial release. `connect()` / `close()` works end-to-end. Cursor / execute / fetch arrived in Phase 2 (subsequent commits within the same session).