informix-db

Author	SHA1	Message	Date
Ryan Malloy	8e8b81fe8d	Phase 29: Deferred-cleanup queue (2026.05.05.3)
Closes the unbounded-leak gap on long-lived pooled connections that Phase 28's cursor finalizer left as future work. When the finalizer can't acquire the wire lock (cross-thread GC during another thread's op), instead of leaking + logging, it enqueues the cleanup PDUs to a per-connection deferred queue. The next normal operation drains the queue under the wire lock, completing the cleanup atomically before the new op.
What changed:
connections.py:
* Connection._pending_cleanup: list[bytes] + Connection._cleanup_lock (separate from _wire_lock - tiny critical section for list mutation only, allows enqueue without waiting for an in-flight wire op)
* _enqueue_cleanup(pdus): thread-safe append, callable from any thread (including finalizers without lock ownership)
* _drain_pending_cleanup(): pop-the-list + send-each-PDU. Caller must hold _wire_lock. Force-closes on wire desync (same doctrine as _raise_sq_err)
* _send_pdu opportunistically drains the queue before sending. Cost is one length-check when the queue is empty (the common case)
cursors.py:
* _finalize_cursor enqueues [_CLOSE_PDU, _RELEASE_PDU] instead of leaking when the lock is busy. WARNING demoted to DEBUG since the leak no longer accumulates.
Lock-order discipline: _cleanup_lock is held only for list extend/pop; _wire_lock is held for the actual wire I/O. Never grab _wire_lock while holding _cleanup_lock - the drain pops-and-clears under _cleanup_lock, then iterates under _wire_lock (which the caller holds).
Two new regression tests:
* test_enqueue_cleanup_drains_on_next_send_pdu - verifies queue mechanism end-to-end
* test_pending_cleanup_thread_safe_enqueue - 8x50 concurrent enqueues, no race-loss
72 unit + 231 integration + 28 benchmark = 331 tests; ruff clean.
Hamilton audit punch list status: 0 critical, 0 high, 3 medium remaining (login errors, _send_exit cleanup, pool acquire re-entrance) - all Phase 30 scope.	2026-05-05 10:47:49 -06:00
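A minimal Python sketch of the deferred-cleanup pattern described in the Phase 29 message, assuming an in-memory list (`sent`) stands in for the socket. The class `SketchConnection`, the `sent` list, and the `finalize_cursor` helper are hypothetical; only the `_wire_lock`, `_cleanup_lock`, `_pending_cleanup`, `_enqueue_cleanup`, `_drain_pending_cleanup`, and `_send_pdu` names come from the commit, and the bodies are illustrative, not informix_db's actual code. The finalizer also treats "this thread already owns the wire lock" as busy, because an `RLock` would otherwise let a same-thread GC callback slip PDUs into the middle of that thread's own operation. That check uses CPython's private `RLock._is_owned()`, the same method the Phase 27 commit asserts on.

```python
import threading


class SketchConnection:
    """Hypothetical model of the deferred-cleanup queue; not informix_db's Connection."""

    def __init__(self) -> None:
        self._wire_lock = threading.RLock()    # held for actual wire I/O
        self._cleanup_lock = threading.Lock()  # held only for list mutation
        self._pending_cleanup: list[bytes] = []
        self.sent: list[bytes] = []            # stand-in for the socket (assumption)

    def _enqueue_cleanup(self, pdus: list[bytes]) -> None:
        # Callable from any thread, including a finalizer that does not own _wire_lock.
        with self._cleanup_lock:
            self._pending_cleanup.extend(pdus)

    def _drain_pending_cleanup(self) -> None:
        # Caller must hold _wire_lock. Pop-and-clear under _cleanup_lock, then
        # send after releasing it, so _cleanup_lock is never held across I/O.
        with self._cleanup_lock:
            pending, self._pending_cleanup = self._pending_cleanup, []
        for pdu in pending:
            self.sent.append(pdu)

    def _send_pdu(self, pdu: bytes) -> None:
        with self._wire_lock:
            if self._pending_cleanup:  # one cheap check when the queue is empty
                self._drain_pending_cleanup()
            self.sent.append(pdu)

    def finalize_cursor(self, close_pdu: bytes, release_pdu: bytes) -> None:
        # Never block in a finalizer. If another thread holds the wire lock, or
        # this thread is already mid-operation (the RLock would re-enter), defer.
        if not self._wire_lock._is_owned() and self._wire_lock.acquire(blocking=False):
            try:
                self.sent.extend([close_pdu, release_pdu])
            finally:
                self._wire_lock.release()
        else:
            self._enqueue_cleanup([close_pdu, release_pdu])
```

Swapping the whole list out under `_cleanup_lock` keeps that lock's critical section to a reference swap, so a finalizer's enqueue never waits behind network I/O. The unlocked truthiness check in `_send_pdu` can miss an enqueue that races with it; that cleanup is simply picked up by the next send.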
Ryan Malloy	6afdbcabb3	Phase 27: Wire lock + async cancellation eviction (2026.05.05.1)
Closes Hamilton audit Critical #2 (concurrency / wire lock) and High #3 (async cancellation evicts cleanly). Phase 26 fixed what gets returned to the pool; Phase 27 fixes what can interleave on the wire while it's running.
What changed:
connections.py:
* Added Connection._wire_lock = threading.RLock(). Wrapped commit(), rollback(), fast_path_call() under the lock.
* _ensure_transaction documents the lock as a precondition AND asserts ownership at runtime (_wire_lock._is_owned()) so a future caller adding a third call site fails loudly.
* close() tries to acquire the wire lock with a 0.5s timeout before SQ_EXIT; skips the polite exit and force-closes if busy.
cursors.py:
* execute() body extracted into _execute_under_wire_lock() and called under the lock.
* executemany() body wrapped inline.
* _sfetch_at() wrapped - covers all scrollable fetch_* methods that delegate to it.
* close() locks the CLOSE+RELEASE for scrollable cursors.
pool.py:
* release() acquires conn._wire_lock with a 5s timeout before rollback. On timeout: log WARNING, evict connection. Constant _RELEASE_WIRE_LOCK_TIMEOUT for tunability.
aio.py:
* AsyncConnectionPool.connection() now catches CancelledError / TimeoutError separately and routes to broken=True. Combined with the wire lock, asyncio.wait_for around aio DB calls is now safe.
* Updated docstring; mirrored in docs/USAGE.md.
Margaret Hamilton review surfaced three actionable conditions, all addressed before tagging:
* Cancellation test used contextlib.suppress - could pass without exercising the cancellation path on a fast runner. Switched to pytest.raises so the test fails if the timeout doesn't fire.
* _ensure_transaction precondition documented but unchecked at runtime. Added assert self._wire_lock._is_owned() guard.
* Connection.close() was unsynchronized. Now tries a 0.5s acquire before SQ_EXIT.
Two new regression tests in tests/test_pool.py:
* test_concurrent_threads_on_one_connection_dont_interleave_pdus (without lock: garbled results / hangs)
* test_async_wait_for_cancellation_evicts_connection (asserts pool size shrinks; cancellation actually fires)
72 unit + 228 integration + 28 benchmark = 328 tests; ruff clean.
Hamilton verdict: PRODUCTION READY WITH CAVEATS (was) -> CAVEATS NARROWED FURTHER (now). 0 critical, 2 high remaining (cursor finalizers + bare-except in error drain) - both Phase 28 scope.	2026-05-05 03:40:39 -06:00
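The aio.py change can be sketched in plain asyncio. `asyncio.CancelledError` has derived from `BaseException` since Python 3.8, so a generic `except Exception` in the pool's context manager never sees a cancellation; it has to be named explicitly. `SketchSyncPool` and `SketchAsyncPool` below are hypothetical stand-ins (the real `AsyncConnectionPool` API may differ), and `asyncio.to_thread` is assumed as the sync bridge in place of the project's own helper.

```python
import asyncio
import contextlib


class SketchSyncPool:
    """Hypothetical synchronous pool: tracks size and evictions only."""

    def __init__(self) -> None:
        self.size = 1
        self.evicted = 0

    def acquire(self) -> object:
        return object()  # placeholder connection

    def release(self, conn: object, broken: bool = False) -> None:
        if broken:
            self.size -= 1  # evict instead of recycling
            self.evicted += 1


class SketchAsyncPool:
    def __init__(self, pool: SketchSyncPool) -> None:
        self._pool = pool

    @contextlib.asynccontextmanager
    async def connection(self):
        conn = await asyncio.to_thread(self._pool.acquire)
        broken = False
        try:
            yield conn
        except (asyncio.CancelledError, asyncio.TimeoutError):
            # CancelledError is a BaseException subclass, so it is caught by name.
            broken = True
            raise
        finally:
            await asyncio.to_thread(self._pool.release, conn, broken)
```

A cancelled call may still be running its wire op in a worker thread, so the connection's protocol state is unknown; evicting it rather than recycling it is the conservative choice.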
Ryan Malloy	5c4a7a57f1	Phase 26: Pool rollback-on-release - CRITICAL data-correctness fix (2026.05.05)
Fixes the dirty-pool-checkout bug surfaced by Margaret Hamilton's system-wide audit (Critical #1).
The bug: ConnectionPool.release() returned connections with open server-side transactions still active. Request A's uncommitted INSERTs would be inherited by Request B reusing the same connection - B's commit would land A's writes permanently; B's rollback would silently lose them. Same shape as psycopg2's pre-2.5 dirty-pool bug.
The fix: pool.release() now rolls back any open transaction before returning the connection to the idle list. The rollback runs OUTSIDE the pool lock since it's a wire round-trip - the connection is already off the idle list and counted in _total, so no other thread can grab it during the rollback window. If the rollback itself fails (dead socket, etc.), the connection is evicted rather than recycled.
Async path covered automatically: AsyncConnectionPool.release() delegates to the sync pool's release via _to_thread.
Margaret Hamilton review pass surfaced two findings, both addressed:
* Silent rollback failure: added a WARNING log via logging.getLogger("informix_db.pool") so evictions are debuggable. First logger in the project.
* Async cancellation race: the fix doesn't introduce the asyncio.wait_for race (Critical #2, deferred to Phase 27), but it adds a code path that can trigger it. Documented loudly in the pool.release() docstring, the aio.py module docstring, and the USAGE.md async section. Recommendation: use read_timeout on the connection instead of asyncio.wait_for until Phase 27 lands.
Two new regression tests in tests/test_pool.py:
* test_uncommitted_writes_invisible_to_next_acquirer (the bug)
* test_committed_writes_survive_pool_checkout (no over-correction)
Verified the regression test catches the bug: stashed the fix, ran the test - it fails with "B sees 1 rows - leaked across pool checkout boundary" - confirming it tests the real failure mode.
Total tests: 72 unit + 226 integration + 28 benchmark = 326.
Deferred to Phase 27 per Hamilton audit:
* Critical #2 (concurrency / per-connection wire lock)
* High #3 (async cancellation routes to broken=True)
* High #4 (bare except in _raise_sq_err drain)
* High #5 (no cursor finalizers - server-side resource leaks)	2026-05-05 03:22:18 -06:00
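A minimal sketch of rollback-on-release, assuming a toy connection that records uncommitted rows in memory. `SketchConn`, its `in_transaction` flag, and `SketchPool` are hypothetical; `_idle`, `_total`, and the `informix_db.pool` logger name come from the Phase 26 message. The rollback is issued outside `self._lock`, and a failed rollback evicts the connection (closes it and decrements `_total`) instead of recycling it.

```python
import logging
import threading

log = logging.getLogger("informix_db.pool")  # logger name from the commit message


class SketchConn:
    """Hypothetical connection that tracks an open transaction's uncommitted rows."""

    def __init__(self, fail_rollback: bool = False) -> None:
        self.in_transaction = False
        self.uncommitted: list[str] = []
        self.closed = False
        self._fail_rollback = fail_rollback

    def insert(self, row: str) -> None:
        self.in_transaction = True
        self.uncommitted.append(row)

    def rollback(self) -> None:
        if self._fail_rollback:
            raise OSError("socket is dead")  # simulated wire failure
        self.uncommitted.clear()
        self.in_transaction = False

    def close(self) -> None:
        self.closed = True


class SketchPool:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle: list[SketchConn] = []
        self._total = 0

    def release(self, conn: SketchConn) -> None:
        # conn is off the idle list but still counted in _total, so no other
        # thread can check it out while the rollback round-trip runs unlocked.
        if conn.in_transaction:
            try:
                conn.rollback()
            except Exception:
                log.warning("rollback on release failed; evicting connection", exc_info=True)
                conn.close()
                with self._lock:
                    self._total -= 1
                return
        with self._lock:
            self._idle.append(conn)
```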
Ryan Malloy	5e26b34564	Phase 15: connection pool
Thread-safe connection pool with min/max sizing, lazy growth, idle recycling, and per-acquire health-check.
API:
    pool = informix_db.create_pool(host=..., min_size=1, max_size=10)
    with pool.connection() as conn:
        ...
    pool.close()
Design choices:
* Lazy growth from min_size - pre-opens min_size on construction, grows to max_size on demand. Pay-nothing startup with burst capacity.
* Health-check on acquire, not release. Sends a trivial SELECT 1 round-trip before yielding. Dead idle connections (server-side timeout, network drop) are silently replaced. The cost is ~1ms per acquire, the price paid so that users never see a stale-connection error. Check-on-release is wrong because idle time is when connections actually die.
* Eviction on OperationalError/InterfaceError only. The "with pool.connection()" context manager retains the connection on application-level errors (ValueError, IntegrityError, etc.). Avoids the "every constraint violation evicts a healthy connection" pitfall.
* Releases the pool lock during connect() - the slow handshake (50-100ms) doesn't serialize other threads' acquires.
Tests: 15 integration tests in test_pool.py covering:
* API & lifecycle (pre-open, lazy growth, context-manager, LIFO)
* Exhaustion (timeout when full, per-acquire override, unblock-on-release)
* Eviction (explicit broken, auto on OperationalError, retain on application errors)
* Health-check (dead idle silently replaced)
* Shutdown (close drains, idempotent, context-manager)
* Multi-thread safety (8 workers × 3 queries each, no leaks)
Total: 69 unit + 154 integration = 223 tests.
With Phase 14 (TLS) and Phase 15 (pool), the project covers the three things a typical Python web/API workload needs from a database driver: PEP 249 surface, TLS transport, connection pool. Only async (informix_db.aio) remains in the backlog.	2026-05-04 14:50:27 -06:00
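The Phase 15 design choices can be sketched with a toy pool. Everything here is hypothetical: the exception classes are local stand-ins named after the PEP 249 hierarchy, `ping()` stands in for the `SELECT 1` health-check, and exhaustion raises immediately instead of waiting with a timeout as the real pool does. The sketch shows LIFO reuse, health-check on acquire with silent replacement of dead idle connections, connecting outside the pool lock, and evicting only on `OperationalError`/`InterfaceError`.

```python
import contextlib
import threading


# Local stand-ins named after the PEP 249 exception hierarchy.
class Error(Exception): ...
class InterfaceError(Error): ...
class DatabaseError(Error): ...
class OperationalError(DatabaseError): ...
class IntegrityError(DatabaseError): ...


class SketchConn:
    def __init__(self, alive: bool = True) -> None:
        self.alive = alive

    def ping(self) -> None:
        # Stand-in for the "SELECT 1" round-trip.
        if not self.alive:
            raise OperationalError("connection died while idle")


class SketchPool:
    def __init__(self, connect, min_size: int = 1, max_size: int = 10) -> None:
        self._connect = connect
        self._max = max_size
        self._lock = threading.Lock()
        self._idle = [connect() for _ in range(min_size)]  # pre-open min_size
        self._total = min_size

    def acquire(self):
        while True:
            with self._lock:
                if self._idle:
                    conn = self._idle.pop()  # LIFO: most recently released first
                elif self._total < self._max:
                    self._total += 1  # reserve a slot, then connect outside the lock
                    conn = None
                else:
                    raise TimeoutError("pool exhausted")  # simplification: no waiting
            if conn is None:
                try:
                    return self._connect()  # slow handshake; pool lock not held
                except BaseException:
                    with self._lock:
                        self._total -= 1  # give the reserved slot back
                    raise
            try:
                conn.ping()  # health-check on acquire, not on release
                return conn
            except (OperationalError, InterfaceError):
                with self._lock:
                    self._total -= 1  # drop the dead idle conn; loop to replace it

    def release(self, conn, broken: bool = False) -> None:
        with self._lock:
            if broken:
                self._total -= 1
            else:
                self._idle.append(conn)

    @contextlib.contextmanager
    def connection(self):
        conn = self.acquire()
        broken = False
        try:
            yield conn
        except (OperationalError, InterfaceError):
            broken = True  # transport-level failure: evict
            raise
        finally:
            # Application errors (IntegrityError, ValueError, ...) keep the conn.
            self.release(conn, broken=broken)
```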

4 Commits