Extends the IfxPy comparison bench script with scaling workloads
(1k/10k/100k rows for both executemany and SELECT). Re-runs the
full comparison with a consistent measurement methodology and updates
the README with the corrected numbers.
Earlier comparison runs reported informix-db winning all 5
benchmarks. Re-running select_bench_table_all with consistent
measurement gives 3.04 ms, not the 891 us I cited earlier - a
3.4x discrepancy attributable to noisy warmup + small-fixture
artifacts. The "we win everything" framing was wrong.
Corrected comparison reveals two clear stories:
Bulk-insert: pure-Python wins 1.6x at scale.
executemany(10k): IfxPy 259ms / us 161ms (us 1.6x faster)
executemany(100k): IfxPy 2376ms / us 1487ms (us 1.6x faster)
Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's
per-call API can't pipeline.
Large-fetch: IfxPy wins 2.3-2.4x at scale.
SELECT 1k rows: IfxPy 1.2ms / us 2.7ms (IfxPy 2.3x faster)
SELECT 10k rows: IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x faster)
SELECT 100k rows: IfxPy 112ms / us 271ms (IfxPy 2.4x faster)
Reason: C-level fetch_tuple at ~1.1us/row beats Python
parse_tuple_payload at ~2.7us/row. Real C-vs-Python codec gap
showing up at scale.
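(The per-row figures fall straight out of the 100k runs:
271 ms / 100,000 rows ≈ 2.7 us/row for us; 112 ms / 100,000 ≈ 1.1
us/row for IfxPy.)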
For everyday workloads (single SELECT in a request, INSERT a
handful of rows), drivers are within 5-25%. For workloads where
the gap widens, direction depends on what you're doing - bulk-
write favors us, bulk-read favors IfxPy.
README's "Compared to IfxPy" section rewritten with the corrected
numbers and an honest "when to prefer which" subsection.
tests/benchmarks/compare/README.md mirror updated.
Net narrative: a "faster at bulk-write, slower at bulk-read,
comparable elsewhere" comparison story is more honest and more
durable than a "we win everything" claim that would have collapsed
the first time a user ran their own benchmark.
Side note (lint): one ambiguous unicode `×` in cursors.py replaced
with `x`.
Phase 37 ticket: parse_tuple_payload is the bottleneck at scale.
Closing the 1.6 us/row gap to IfxPy would make us competitive on
bulk-fetch too. Possible approaches: Cython codec, deeper inlining,
per-column dispatch pre-bake.
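For a sense of what the per-column dispatch pre-bake could look like
(a sketch only; `decoder_for` and the column-descriptor shape are
hypothetical, not the current internals):

```python
def bake_row_decoder(columns):
    # Resolve each column's decoder once per statement, so the
    # per-row loop indexes a prebuilt list instead of re-dispatching
    # on the type code for every value.
    decoders = [decoder_for(col.type_code) for col in columns]

    def decode_row(fields):
        # fields: one raw byte slice per column, already split out
        # of the tuple payload.
        return tuple(dec(raw) for dec, raw in zip(decoders, fields))

    return decode_row
```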
The serial-loop executemany paid one wire round-trip per row (~30us/
row on loopback). It was the one benchmark where IfxPy beat us in
the comparison work - 10% slower at executemany(1000) in txn.
Phase 33 pipelines the BIND+EXECUTE PDUs: build all N PDUs, send
them back-to-back, then drain all N responses. Eliminates per-row
RTT entirely.
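The shape of the change, reduced to a sketch (helper names like
`_build_bind_execute_pdu` and `_read_row_response` are illustrative,
not the actual internals):

```python
def _executemany_pipelined(sock, stmt_id, rows):
    # 1. Build all N BIND+EXECUTE PDUs up front -- no reads yet.
    pdus = [_build_bind_execute_pdu(stmt_id, row) for row in rows]

    # 2. Send them back-to-back; the server starts executing row 0
    #    while rows 1..N-1 are still in flight.
    sock.sendall(b"".join(pdus))

    # 3. Drain exactly N responses. This relies on Informix
    #    answering every pipelined PDU even after a row-level
    #    failure (the property the Hamilton C1 tests verify below).
    for i, _ in enumerate(rows):
        resp = _read_row_response(sock)
        if resp.is_error:
            _drain_remaining(sock, len(rows) - i - 1)
            raise resp.to_exception(prefix=f"row {i}: ")
```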
Performance impact:
* executemany(1000) in txn: 31.3 ms -> 11.0 ms (2.85x faster)
* executemany(100) autocommit: 173 ms -> 154 ms (11% faster)
* executemany(1000) autocommit: 1740 ms -> 1590 ms (9% faster)
(Autocommit gets smaller wins because server-side log flushes
dominate - Phase 21.1's "autocommit cliff".)
IfxPy comparison flipped: us 10% slower -> us 2.05x faster on bulk
inserts. We now win all 5 head-to-head benchmarks against the C-bound
driver.
Margaret Hamilton review surfaced one CRITICAL concern (C1) - the
pipeline assumes Informix sends N responses for N pipelined PDUs
even when one fails. If the server cut the stream short, the drain
loop would deadlock on the next read.
Verified by 3 new integration tests in tests/test_executemany_pipeline.py:
* test_pipelined_executemany_mid_batch_constraint_violation (row 500/1000)
* test_pipelined_executemany_first_row_fails (row 0/100)
* test_pipelined_executemany_last_row_fails (row 99/100)
All confirm Informix sends N responses; wire stays aligned; connection
is usable after.
Plus 4 lower-priority fixes Hamilton recommended:
* H1: documented _raise_sq_err self-drains-SQ_EOT invariant + tripwire
* H2: docstring warning about O(N) lock duration; chunk for huge
  batches (sketch after this list)
* M1: prepend row-index to exception message rather than reformat
* M2: documented sendall-no-timeout caveat on hostile networks
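The H2 mitigation from the caller's side is plain chunking, roughly
(chunk size is illustrative):

```python
def executemany_chunked(cur, sql, rows, chunk_size=10_000):
    # Caps how many rows are pipelined per call, bounding both the
    # wire-lock hold time and the peak PDU buffer.
    for start in range(0, len(rows), chunk_size):
        cur.executemany(sql, rows[start:start + chunk_size])
```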
77 unit + 239 integration + 33 benchmark = 349 tests; ruff clean.
Note: Phase 32 (Tier 1+2 benchmarks) was tagged without bumping
pyproject.toml's version string. .5 was git-tag-only; .6 is the next
published version increment.
Tier 1 — make existing benchmarks reliable:
* Bumped slow-bench rounds: cold_connect_disconnect 5->15, executemany
series 3->10. Single-round outliers no longer dominate.
* Switched bench reporting to median + IQR. Mean was being moved by
individual GC pauses / scheduler hiccups (IfxPy executemany IQR
was 8.2 ms on a 28 ms median - 29% spread - mean was unreliable).
* Updated ifxpy_bench.py to also report median + IQR alongside mean
for cross-comparable numbers.
* Makefile bench targets now show median, iqr, mean, stddev, ops, rounds.
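The reporting change itself is small; in essence (exact code in the
bench harness may differ):

```python
import statistics

def robust_stats(samples):
    # quantiles(n=4) yields [Q1, median, Q3]; IQR = Q3 - Q1 resists
    # the GC-pause / scheduler outliers that were skewing the mean.
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {
        "median": median,
        "iqr": q3 - q1,
        "mean": statistics.fmean(samples),
        "stddev": statistics.stdev(samples),
    }
```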
The robust statistics flipped the comparison story:
Old (mean, 3 rounds): mixed - us 9% faster on single-row, IfxPy ~30%
faster on 2 of 5.
New (median, 10+ rounds): us faster on 4 of 5 benchmarks:
| Benchmark | IfxPy | informix-db | Δ |
|---|---|---|---|
| select_one_row | 170us | 119us | us 30% faster |
| select_systables_first_10 | 186us | 142us | us 24% faster |
| select_bench_table_all 1k | 980us | 832us | us 15% faster |
| executemany 1k in txn | 28.3ms | 31.3ms | us 10% slower |
| cold_connect_disconnect | 12.0ms | 10.7ms | us 11% faster |
Tier 2 — add benchmarks for claims we make but don't verify:
tests/benchmarks/test_observability_perf.py:
* test_streaming_fetch_memory_profile — RSS sampling during a
cursor iteration. Documents memory growth shape; regression
wall at 100 MB / 1k rows. Currently flat (in-memory cursor
doesn't grow detectably for 278 rows).
* test_select_1_latency_percentiles — 1000-query distribution with
  p50/p90/p95/p99/max (measurement sketch after this list). Result:
  p99/p50 = 1.42x (tight tail); p50=108us, p99=153us.
* test_concurrent_pool_throughput[2,4,8] — N worker threads
through pool, measures aggregate QPS + per-thread fairness.
Plateaus at ~6K QPS (server-bound); per-thread latency scales
~linearly with N (server serialization expected).
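The percentile benchmark boils down to this pattern (`run_query`
stands in for the actual SELECT-1 round-trip):

```python
import statistics
import time

def latency_percentiles(run_query, n=1000):
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - t0)
    # quantiles(n=100) returns the 99 percentile cut points;
    # index k-1 is the k-th percentile.
    pts = statistics.quantiles(samples, n=100)
    return {"p50": pts[49], "p90": pts[89], "p95": pts[94],
            "p99": pts[98], "max": max(samples)}
```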
README.md (project root): updated Compared-to-IfxPy table with
the median-based numbers + IQR awareness note.
tests/benchmarks/compare/README.md: added "Statistical robustness"
section explaining why median over mean for fair comparison.
236 integration tests pass; ruff clean.
Adds a paired benchmark of informix-db (pure Python) against IfxPy
3.0.5 (IBM's C-bound driver via OneDB ODBC) on identical workloads
against the same Informix dev container.
Headline result: pure Python is competitive — and faster on 2/5
benchmarks where wire round-trip dominates over codec/marshaling.
| Benchmark | IfxPy | informix-db | Result |
|---|---:|---:|---:|
| select_one_row (single-row latency) | 128 us | 116 us | us 9% faster |
| select_systables_first_10 | 126 us | 184 us | IfxPy 32% faster |
| select_bench_table_all (1k rows) | 969 us | 855 us | us 12% faster |
| executemany(1000) in txn | 21.5 ms | 30.8 ms | IfxPy 30% faster |
| cold_connect_disconnect | 11.0 ms | 10.9 ms | comparable |
Why the surprising wins: IfxPy's path is Python -> OneDB ODBC ->
libifdmr -> wire. Ours is Python -> wire. When wire round-trip
dominates (single-row, bulk fetch), the missing abstraction layer
makes us faster. When per-row marshaling dominates (executemany),
IfxPy's C-level execute(stmt, tuple) beats Python BIND-PDU build.
Files added under tests/benchmarks/compare/:
* Dockerfile.ifxpy — Ubuntu 20.04 base with IfxPy + OneDB drivers
* ifxpy_bench.py — IfxPy benchmark workloads matching test_*_perf.py
* README.md — methodology, results, install gauntlet, reproduction
The IfxPy install gauntlet itself is part of the comparison story:
an older Python (3.11, not 3.13), setuptools <58, permissive CFLAGS,
manual download of 92MB OneDB ODBC tarball, four LD_LIBRARY_PATH
directories, libcrypt.so.1 (deprecated 2018, missing on Arch /
Fedora 35+ / RHEL 9). Versus our `pip install informix-db`.
README.md (project root): added "Compared to IfxPy" section under
Performance with the headline numbers and a pointer to the full
methodology.
.gitignore: keep Dockerfile/script/README under tests/benchmarks/
compare/, exclude the 92MB OneDB tarball and the local venv.
PyPI users landing on the README need to know quickly:
- What this is (already strong)
- Whether it's safe to use in production (was missing)
- Performance expectations (was missing)
- Python version requirement (was only in pyproject.toml metadata)
Updates:
* Added "Status" section with the Hamilton audit findings table -
every critical/high/medium addressed, 0 remaining. Names the
Hamilton-style review process explicitly as the credibility signal.
* Added Python ≥ 3.10 requirement under the install command.
* Added "Performance" section with single-connection benchmarks and
the 53x autocommit-cliff gotcha (most important perf pitfall).
* Updated "Standards & guarantees" to mention Phase 27's wire lock
alongside the PEP 249 Threadsafety=1 declaration - accurate context
for sophisticated readers.
* Tightened "Development" to PyPI-appropriate brevity (short Makefile
target list instead of full uv invocations).
* Updated stale phase count (22+ → 30) and test counts (69 → 77 unit,
163 → 231 integration). Added "300+ tests" rough number in the
Status section to reduce future staleness churn.
* Fixed typo: "no thread of native machinery" → "no native machinery
anywhere in the thread of execution".
* Bumped pyproject.toml classifier from "Development Status :: 4 -
Beta" to "5 - Production/Stable" - earned by the audit work.
No code changes.
The docs/USAGE.md predated Phases 17-21, so anyone landing on PyPI was
missing scrollable cursors, locale/Unicode, the autocommit cliff
finding, and the type-mapping reference.
Added sections to docs/USAGE.md:
* Locale and Unicode - client_locale, Connection.encoding, CLIENT_LOCALE
vs DB_LOCALE, when characters can't fit the codec
* Type mapping reference - full SQL <-> Python type table, NULL
sentinels subsection, IntervalYM
* Performance tips - 53x autocommit-cliff fix, 100x executemany win,
72x pool win, with the actual benchmark numbers from Phase 21.1
* Scrollable cursors - fetch_* API, in-memory vs server-side trade-off,
edge cases (past-end semantics, negative indexing, rownumber)
* Timeouts and keepalive subsection - production starting points
* Environment dictionary subsection - env={} parameter (illustrative
  call after this list)
* Known limitations - explicit table of what doesn't work (named
params, complex UDT bind, GSSAPI, XA) with workarounds; "things
that might surprise you" notes
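The env knob in one illustrative call (keyword names other than `env`
are placeholders here; see docs/USAGE.md for the real signature):

```python
import informix_db

# Hypothetical connection parameters -- only env={} is the point.
conn = informix_db.connect(
    host="localhost",
    port=9088,
    user="informix",
    password="...",
    database="testdb",
    env={"CLIENT_LOCALE": "en_US.utf8", "DB_LOCALE": "en_US.utf8"},
)
```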
README.md - added Documentation section linking to docs/USAGE.md
and tests/benchmarks/README.md.
Doc corrections caught during review:
* cursor.rownumber is 0-indexed (impl has always been correct; only
the original docstring wording was loose)
* fetch_* methods work on BOTH scrollable=True and default cursors;
the in-memory path supports them too
USAGE.md grew from 345 lines to 633.
Version bump (2026.05.02 → 2026.05.04) reflects the library reaching
feature completeness across Phases 1-16.
Documentation:
* README.md — full rewrite. The previous README was from Phase 1
("cursor() / execute() / fetchone() arrive in Phase 2"). New
README covers: sync + async APIs, connection pool, TLS, full type
matrix, smart-LOBs, fast-path RPC, server-compatibility,
development workflow, and pointers to the protocol research docs.
* docs/USAGE.md — new practical recipe guide. Connecting, cursor
lifecycle, parameter binding, transactions (logged + unlogged),
executemany, smart-LOB read/write, connection pool, async,
TLS, error handling, fast-path RPC, server-side setup steps,
and a migration table from IfxPy / legacy informixdb.
* CHANGELOG.md — new file. Captures the v2026.05.04 release as the
Phase 1-16 completion milestone with a full feature inventory
and known-gap list. Future point-releases append here.
Classifiers updated:
* Development Status: 2 → 4 (Pre-Alpha → Beta)
* Added Framework :: AsyncIO
Keywords: added asyncio, async.
No code changes; tests still pass (69 unit + 163 integration = 232).
Ruff clean.
This commit takes informix-db from documentation-only (Phase 0 spike)
to a functional connect() / close() against a real Informix server.
To our knowledge, this is the first pure-socket Informix client in any
language — no CSDK, no JVM, no native libraries.
Layered architecture per the plan, mirroring PyMySQL's shape:
src/informix_db/
__init__.py — PEP 249 surface (connect, exceptions, paramstyle="numeric")
exceptions.py — full PEP 249 hierarchy declared up front
_socket.py — raw socket I/O (read_exact, write_all, timeouts)
_protocol.py — IfxStreamReader / IfxStreamWriter framing primitives
(big-endian, 16-bit-aligned variable payloads,
length-prefixed nul-terminated strings)
_messages.py — SQ_* tags from IfxMessageTypes + ASF/login markers
_auth.py — pluggable auth handlers; plain-password is the
only Phase-1 implementation
connections.py — Connection class: builds the binary login PDU
(SLheader + PFheader byte-for-byte per
PROTOCOL_NOTES.md §3), sends it, parses the
server response, wires up close()
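To make the framing conventions concrete, here is roughly what the
length-prefixed string primitive does (a sketch of the rules above,
not IfxStreamWriter's verbatim code):

```python
import struct

def pack_ifx_string(s: str, encoding: str = "ascii") -> bytes:
    # Big-endian length prefix, nul terminator, payload padded out
    # to a 16-bit boundary per the framing rules above. (Whether the
    # prefix counts the pad byte is a detail of the real codec.)
    raw = s.encode(encoding) + b"\x00"
    if len(raw) % 2:
        raw += b"\x00"
    return struct.pack("!h", len(raw)) + raw
```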
Phase 1 design decisions locked in DECISION_LOG.md:
- paramstyle = "numeric" (matches Informix ESQL/C convention)
- Python >= 3.10
- autocommit defaults to off (PEP 249 implicit)
- License: MIT
- Distribution name: informix-db (verified PyPI-available)
Test coverage: 34 unit tests (codec round-trips against synthetic byte
streams; observed login-PDU values from the spike captures asserted as
exact byte literals) + 6 integration tests (connect, idempotent close,
context manager, bad-password → OperationalError, bad-host →
OperationalError, cursor() raises NotImplementedError).
Invocations:
* pytest — runs 34 unit tests, no Docker needed
* pytest -m integration — runs 6 integration tests against the
  Developer Edition container (pinned by digest in
  tests/docker-compose.yml)
* pytest -m "" — runs everything
ruff is clean across src/ and tests/.
One bug found during smoke testing: threading.get_ident() can exceed
signed 32-bit on some processes, overflowing struct.pack("!i"). Fixed
the same way the JDBC reference does — clamp to signed 32-bit, fall
back to 0 if out of range. The field is diagnostic only.
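The fix, in essence (the real code lives in the login-PDU builder):

```python
import struct
import threading

def _login_thread_id() -> bytes:
    # struct.pack("!i") takes a signed 32-bit int; get_ident() can
    # exceed that range. Out-of-range values fall back to 0 -- the
    # field is diagnostic only, mirroring the JDBC reference driver.
    tid = threading.get_ident()
    if not -(2**31) <= tid < 2**31:
        tid = 0
    return struct.pack("!i", tid)
```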
One protocol-level observation that AMENDED the JDBC source reading:
the "capability section" in the login PDU is three independently
negotiated 4-byte ints (Cap_1=1, Cap_2=0x3c000000, Cap_3=0), not one
int + 8 reserved zero bytes as my CFR decompile read suggested. The
server echoes them back identically. Trust the wire over the
decompiler.
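In pack terms (values as observed on the wire):

```python
import struct

# Three independently negotiated 4-byte ints, echoed back verbatim:
caps = struct.pack("!III", 1, 0x3C000000, 0)

# The decompile's reading -- one int plus 8 reserved zero bytes --
# would leave the middle word zero; the nonzero Cap_2 on the wire is
# what disproved it.
```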
Phase 1 verification matrix (from PROTOCOL_NOTES.md §12):
- Login byte layout: confirmed (server accepts our pure-Python PDU)
- Disconnection: confirmed (SQ_EXIT round-trip works)
- Framing primitives: confirmed (34 unit tests)
- Error path: bad password → OperationalError, bad host → OperationalError
Phase 2 (Cursor / SELECT / basic types) is next. The hard
unknowns there — exact column-descriptor layout, statement-time error
format — were called out as bounded gaps in Phase 0 and have existing
captures (02-select-1.socat.log, 02-dml-cycle.socat.log) to characterize
against.
Project goal: pure-Python implementation of the Informix SQLI wire
protocol. No CSDK, no JVM, no native deps. Targets icr.io/informix
/informix-developer-database (port 9088) as the dev/test instance.
Phase 0 is a documentation-only spike that gates all implementation
work. The four scaffolds:
- README.md: project status and Phase 0 deliverable index
- docs/PROTOCOL_NOTES.md: byte-level wire-format reference (TBD)
- docs/JDBC_NOTES.md: reverse-lookup index into the decompiled IBM
JDBC driver (4.50.4.1), populated from build/jdbc-src/ once the
decompile lands
- docs/DECISION_LOG.md: running rationale, with the Phase-1 paramstyle
/Python-floor/autocommit decisions pre-locked so they don't churn
later
CLAUDE.md is gitignored — operator-private context, public-PyPI repo.