informix-db/README.md
Ryan Malloy 362ecb3d63 Phase 33: Pipelined executemany - 2.85x faster bulk insert (2026.05.05.6)
The serial-loop executemany paid one wire round-trip per row
(~30 µs/row on loopback). It was the one benchmark where IfxPy beat
us in the comparison work: 10% slower at executemany(1000) in txn.

Phase 33 pipelines the BIND+EXECUTE PDUs: build all N PDUs, send
them back-to-back, then drain all N responses. Eliminates per-row
RTT entirely.

Performance impact:
* executemany(1000) in txn:   31.3 ms -> 11.0 ms (2.85x faster)
* executemany(100) autocommit: 173 ms -> 154 ms (11% faster)
* executemany(1000) autocommit: 1740 ms -> 1590 ms (9% faster)

(Autocommit gets smaller wins because server-side log flushes
dominate - Phase 21.1's "autocommit cliff".)

IfxPy comparison flipped: us 10% slower -> us 2.05x faster on bulk
inserts. We now win all 5 head-to-head benchmarks against the C-bound
driver.

Margaret Hamilton review surfaced one CRITICAL concern (C1) - the
pipeline assumes Informix sends N responses for N pipelined PDUs
even when one fails. If the server cut the stream short, the drain
loop would deadlock on the next read.

Verified by 3 new integration tests in tests/test_executemany_pipeline.py:
* test_pipelined_executemany_mid_batch_constraint_violation (row 500/1000)
* test_pipelined_executemany_first_row_fails (row 0/100)
* test_pipelined_executemany_last_row_fails (row 99/100)

All confirm Informix sends N responses; wire stays aligned; connection
is usable after.

Plus 4 lower-priority fixes Hamilton recommended:
* H1: documented _raise_sq_err self-drains-SQ_EOT invariant + tripwire
* H2: docstring warning about O(N) lock duration; chunk for huge batches
* M1: prepend row-index to exception message rather than reformat
* M2: documented sendall-no-timeout caveat on hostile networks

77 unit + 239 integration + 33 benchmark = 349 tests; ruff clean.

Note: Phase 32 (Tier 1+2 benchmarks) was tagged without bumping
pyproject.toml's version string. .5 was git-tag-only; .6 is the next
published version increment.
2026-05-05 12:26:15 -06:00

# informix-db
Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.** PEP 249 compliant; sync + async APIs; built-in connection pool; TLS support.

To our knowledge this is the **first pure-socket Informix driver in any language** — every other Informix driver (`IfxPy`, the legacy `informixdb`, ODBC bridges, JPype/JDBC, Perl `DBD::Informix`) wraps either IBM's CSDK or the JDBC JAR.
```bash
pip install informix-db
```
Requires Python ≥ 3.10.
## Status
**Production ready.** Every finding from a system-wide failure-mode audit (data correctness, wire safety, resource leaks, concurrency, async cancellation) has been addressed:

| Severity | Finding | Status |
|---|---|---|
| Critical | Pool returns connections with open transactions | Fixed (Phase 26) |
| Critical | Unsynchronized wire path → PDU interleaving | Fixed (Phase 27) — per-connection wire lock |
| High | Async cancellation leaks running workers onto recycled connections | Fixed (Phase 27) |
| High | `_raise_sq_err` bare-except masks wire desync | Fixed (Phase 28) |
| High | Cursor finalizers — server-side resources leak on mid-fetch raise | Fixed (Phase 28+29) |
| Medium | 5 hardening items | Fixed (Phase 28+30) |

**0 critical, 0 high, 0 medium audit findings remain.** Every architectural change went through a Margaret Hamilton-style review focused on silent-failure modes, recovery paths, and documented invariants. Each documented invariant is paired with either a runtime guard or a CI tripwire test.

**Test coverage:** 300+ tests across unit / integration / benchmark suites. Integration tests run against the official IBM Informix Developer Edition Docker image (15.0.1.0.3DE).
## Quick start
```python
import informix_db

with informix_db.connect(
    host="db.example.com", port=9088,
    user="informix", password="...",
    database="mydb", server="informix",
) as conn:
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
    user_id, name = cur.fetchone()
```
## Async (FastAPI / aiohttp / asyncio)
```python
import asyncio

from informix_db import aio


async def main():
    pool = await aio.create_pool(
        host="db.example.com", user="informix", password="...",
        database="mydb",
        min_size=1, max_size=10,
    )
    async with pool.connection() as conn:
        cur = await conn.cursor()
        await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
        row = await cur.fetchone()
    await pool.close()


asyncio.run(main())
```
## Connection pool (sync)
```python
import informix_db

pool = informix_db.create_pool(
    host="db.example.com", user="informix", password="...",
    database="mydb",
    min_size=1, max_size=10, acquire_timeout=5.0,
)
with pool.connection() as conn:
    cur = conn.cursor()
    cur.execute("...")
pool.close()
```
## TLS
```python
import ssl
# Production: bring your own context
ctx = ssl.create_default_context(cafile="/path/to/ca.pem")
informix_db.connect(host="...", port=9089, ..., tls=ctx)
# Dev / self-signed: tls=True disables verification
informix_db.connect(host="127.0.0.1", port=9089, ..., tls=True)
```
Informix uses dedicated TLS-enabled listener ports (configured server-side in `sqlhosts`) rather than STARTTLS upgrade — point `port` at the TLS listener (often `9089`) when `tls` is enabled.
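For reference, a server-side `sqlhosts` that pairs a plaintext listener with a TLS one might look like this (server names, host, and ports are illustrative; `onsocssl` is the SSL-over-TCP protocol value):

```
# sqlhosts — illustrative entries, one listener per line
# dbservername   nettype    hostname   servicename/port
informix         onsoctcp   dbhost     9088
informix_ssl     onsocssl   dbhost     9089
```

Point the driver's `port` at whichever listener matches your `tls` setting.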
## Type support
| SQL type | Python type |
|---|---|
| `SMALLINT` / `INT` / `BIGINT` / `SERIAL` | `int` |
| `FLOAT` / `SMALLFLOAT` | `float` |
| `DECIMAL(p,s)` / `MONEY` | `decimal.Decimal` |
| `CHAR` / `VARCHAR` / `NCHAR` / `NVARCHAR` / `LVARCHAR` | `str` |
| `BOOLEAN` | `bool` |
| `DATE` | `datetime.date` |
| `DATETIME YEAR TO ...` | `datetime.datetime` / `datetime.time` / `datetime.date` |
| `INTERVAL DAY TO FRACTION` | `datetime.timedelta` |
| `INTERVAL YEAR TO MONTH` | `informix_db.IntervalYM` |
| `BYTE` / `TEXT` (legacy in-row blobs) | `bytes` / `str` |
| `BLOB` / `CLOB` (smart-LOBs) | `informix_db.BlobLocator` / `informix_db.ClobLocator` (read via `cursor.read_blob_column`, write via `cursor.write_blob_column`) |
| `ROW(...)` | `informix_db.RowValue` |
| `SET(...)` / `MULTISET(...)` / `LIST(...)` | `informix_db.CollectionValue` |
| `NULL` | `None` |
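The `DECIMAL(p,s)` / `MONEY` → `decimal.Decimal` mapping is deliberate: binary floats cannot represent most decimal fractions exactly, which matters for money columns. A quick plain-Python illustration of the difference:

```python
from decimal import Decimal

# float arithmetic drifts on decimal fractions...
print(0.1 + 0.2)                        # 0.30000000000000004
# ...while Decimal preserves exact decimal values, matching DECIMAL(p,s) semantics
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```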
## Smart-LOB (BLOB / CLOB) read & write
```python
# Read: returns the actual bytes
data = cur.read_blob_column(
    "SELECT data FROM photos WHERE id = ?", (42,)
)

# Write: BLOB_PLACEHOLDER token marks where the BLOB goes
cur.write_blob_column(
    "INSERT INTO photos VALUES (?, BLOB_PLACEHOLDER)",
    blob_data=jpeg_bytes,
    params=(42,),
)
```
Both work end-to-end in pure Python via the `lotofile` / `filetoblob` server functions intercepted at the `SQ_FILE` (98) wire-protocol level — no native code anywhere in the execution path. See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10/11 for the architecture pivot that made this possible.
## Direct stored-procedure invocation (fast-path)
```python
# Cleanly close a smart-LOB descriptor opened via SQL
result = conn.fast_path_call(
    "function informix.ifx_lo_close(integer)", lofd
)
# result == [0] on success
```
The fast-path RPC (`SQ_FPROUTINE` / `SQ_EXFPROUTINE`) bypasses PREPARE → EXECUTE → FETCH for direct UDF/SPL calls. Routine handles are cached per-connection, so repeated calls to the same function take a single round-trip.
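The per-connection handle cache amounts to memoizing the routine lookup. A minimal sketch of the idea (class and names are illustrative, not the driver's internals):

```python
class RoutineHandleCache:
    """Illustrative per-connection cache: look a routine up once, reuse the handle."""

    def __init__(self, lookup):
        self._lookup = lookup   # stands in for the SQ_FPROUTINE lookup round-trip
        self._handles = {}      # routine signature -> server-side handle

    def handle_for(self, signature):
        if signature not in self._handles:
            self._handles[signature] = self._lookup(signature)
        return self._handles[signature]


lookups = []

def fake_lookup(sig):
    lookups.append(sig)
    return 42  # pretend server handle

cache = RoutineHandleCache(fake_lookup)
cache.handle_for("function informix.ifx_lo_close(integer)")
cache.handle_for("function informix.ifx_lo_close(integer)")
print(len(lookups))  # 1 — the second call skipped the lookup round-trip
```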
## Server compatibility
Tested against IBM Informix Dynamic Server **15.0.1.0.3DE** (the official `icr.io/informix/informix-developer-database` Docker image). The wire protocol is stable across modern Informix versions; it should work against 12.10+ unmodified.

For features that need server-side configuration (smart-LOBs, logged transactions), see [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md):
- Phase 7 — logged-DB transactions
- Phase 8 — BYTE/TEXT (needs blobspace)
- Phase 10/11 — BLOB/CLOB (needs sbspace + `SBSPACENAME` config + level-0 archive)
## Performance
Single-connection benchmarks against the dev container on loopback:

| Operation | Mean | Throughput |
|---|---:|---:|
| `decode(int)` per cell | 139 ns | 7.2M ops/sec |
| `parse_tuple_payload` per row (5 cols) | 1.4 µs | 715K rows/sec |
| `SELECT 1` round-trip | ~140 µs | ~7K queries/sec |
| 1000-row SELECT | ~1.0 ms | ~990K rows/sec sustained |
| `executemany(1000)` in transaction | 11 ms | **~91,000 rows/sec** |
| Pool acquire + query + release | 295 µs | ~3.4K queries/sec |
| Cold connect (login handshake) | 11 ms | ~90 connections/sec |

**Performance gotcha**: `executemany(...)` under `autocommit=True` is **~145× slower** than the same call inside a single transaction (the server flushes the transaction log once per row). For bulk loads, use `autocommit=False` (the default) and `conn.commit()` at the end. See [`docs/USAGE.md`](docs/USAGE.md) for the full performance-tips section.
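Per the Phase 33 review note (H2), a very large batch holds the wire lock for the whole pipeline, so it can be worth splitting a huge bulk load into fixed-size `executemany` chunks inside one transaction. A generic, driver-agnostic chunking helper:

```python
def chunked(rows, size):
    """Yield fixed-size slices of a row list, e.g. for repeated executemany calls."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [(i, f"name-{i}") for i in range(10)]
print([len(batch) for batch in chunked(rows, 4)])  # [4, 4, 2]
```

With a real connection you would call `cur.executemany(sql, batch)` per chunk and `conn.commit()` once at the end.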
### Compared to IfxPy (the C-bound PyPI driver)
Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on identical workloads, same Informix server, matched conditions. Results use **median + IQR over 10+ rounds** to resist outlier-round noise:

| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 118 µs | **114 µs** | **`informix-db` 3% faster** |
| ~10-row server-side query | 164 µs | **159 µs** | **`informix-db` 3% faster** |
| 1000-row SELECT (full fetch) | 984 µs | **891 µs** | **`informix-db` 9% faster** |
| **`executemany(1000)` in transaction** | 21.4 ms | **10.4 ms** | **`informix-db` 2.05× faster** |
| Cold connect (login handshake) | 11.0 ms | **10.4 ms** | **`informix-db` 5% faster** |

**`informix-db` wins on all 5 benchmarks against the C-bound driver, including a 2× win on bulk inserts.**

**Why pure-Python wins the round-trip-bound work:** IfxPy's code path is `Python → OneDB ODBC driver → libifdmr.so → wire`. Ours is `Python → wire`. The abstraction-layer overhead IfxPy carries on every call costs more than the C-vs-Python codec gap saves.

**Why we win bulk inserts dramatically:** `executemany` pipelines all N BIND+EXECUTE PDUs to the wire before draining responses (Phase 33), eliminating the per-row round-trip that the older serial loop incurred. IfxPy still does one synchronous round-trip per row.
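The arithmetic behind that win is simple: at ~30 µs per loopback round-trip (the figure from the Phase 33 notes), a 1000-row serial loop spends roughly 30 ms just waiting on the wire:

```python
rtt_us = 30   # ~30 µs loopback round-trip per row (Phase 33 notes)
rows = 1000

serial_wait_ms = rows * rtt_us / 1000
print(serial_wait_ms)  # 30.0 — most of the 31.3 ms serial-loop time was pure RTT
```

Pipelining collapses that per-row wait into a single drain pass, which is why the in-transaction time fell to ~11 ms.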
Full methodology, IQR caveats, install gauntlet, and reproduction in [`tests/benchmarks/compare/README.md`](tests/benchmarks/compare/README.md).

A note on IfxPy's install gauntlet: getting it to run on a modern system requires Python ≤ 3.11, setuptools <58, permissive CFLAGS, manual download of a 92 MB ODBC tarball, four `LD_LIBRARY_PATH` directories, and `libcrypt.so.1` (deprecated 2018, missing on Arch / Fedora 35+ / RHEL 9). `informix-db`'s install: `pip install informix-db`.
## Standards & guarantees
* **PEP 249** (DB-API 2.0): `connect()`, `Connection`, `Cursor`, `description`, `rowcount`, exception hierarchy
* **`paramstyle = "numeric"`** (Informix's native ESQL/C convention; `?` and `:1` both work)
* **Threadsafety = 1**: threads may share the module but not connections; the pool gives per-thread connection access. Phase 27 added a per-connection wire lock that makes accidental sharing safe (interleaved PDUs serialize correctly), but PEP 249 advice still holds: give each thread its own connection.
* **CalVer versioning**: `YYYY.MM.DD` releases. PEP 440 post-releases (`.1`, `.2`) for same-day fixes.
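The Phase 27 wire-lock discipline can be sketched in a few lines (illustrative classes, not the driver's actual internals): every send/receive pair happens under one per-connection lock, so concurrent callers serialize at PDU-exchange granularity.

```python
import threading


class LockedWire:
    """Illustrative per-connection wire lock: one PDU exchange at a time."""

    def __init__(self, sock):
        self._sock = sock
        self._wire_lock = threading.Lock()

    def round_trip(self, pdu: bytes) -> bytes:
        # Holding the lock across send + receive means two threads sharing
        # this connection can never interleave their PDUs on the wire.
        with self._wire_lock:
            self._sock.send(pdu)
            return self._sock.recv()


class FakeSocket:
    """Stand-in transport that echoes back the last PDU it was sent."""

    def __init__(self):
        self._last = b""

    def send(self, data):
        self._last = data

    def recv(self):
        return self._last


wire = LockedWire(FakeSocket())
print(wire.round_trip(b"SQ_PREPARE"))  # b'SQ_PREPARE'
```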
## Development
The full test + lint workflow is in the [Makefile](Makefile). Quick summary:
```bash
make test # 77 unit tests (no Docker)
make ifx-up && make test-integration # 239 integration tests
make bench # benchmark suite
make lint # ruff
```
For the smart-LOB tests specifically, the dev container needs additional one-time setup (blobspace + sbspace + level-0 archive). See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10 for the `onspaces` / `onmode` / `ontape` commands.
## Documentation
- [**`docs/USAGE.md`**](docs/USAGE.md) practical recipes: connections, parameter binding, type mapping, transactions, performance tips, scrollable cursors, BLOBs, async, TLS, locale/Unicode, error handling, known limitations
- [`tests/benchmarks/README.md`](tests/benchmarks/README.md) performance baselines, headline numbers, how to run regressions
- `CHANGELOG.md` phase-by-phase release notes
## Project history & design rationale
This driver was built incrementally across 30+ phases, each with a focused scope and decision log. The reasoning trail lives in:
- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) byte-level SQLI wire-format reference
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) index into the decompiled IBM JDBC driver, used as a clean-room reference
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) phase-by-phase architectural decisions, with the *why* preserved
- [`docs/CAPTURES/`](docs/CAPTURES/) annotated socat hex-dump captures

Notable architectural pivots documented in the decision log:
- **Phase 10/11** (smart-LOB read/write): used `lotofile`/`filetoblob` SQL functions + `SQ_FILE` protocol intercept instead of the heavier `SQ_FPROUTINE` + `SQ_LODATA` stack; the result came in ~3× smaller than originally projected
- **Phase 7** (logged-DB transactions): discovered that Informix requires an explicit `SQ_BEGIN` before each transaction in non-ANSI mode, and that `SQ_RBWORK` needs a short savepoint payload
- **Phase 16** (async): shipped thread-pool wrapping (~250 lines) instead of full I/O abstraction refactor (~2000 lines); functionally equivalent for typical FastAPI workloads
## License
MIT.