informix-db/README.md
Ryan Malloy a9e1f17bae Phase 31: Head-to-head benchmark vs IfxPy (the C-bound PyPI driver)
Adds a paired benchmark of informix-db (pure Python) against IfxPy
3.0.5 (IBM's C-bound driver via OneDB ODBC) on identical workloads
against the same Informix dev container.

Headline result: pure Python is competitive — and faster on 2/5
benchmarks where wire round-trip dominates over codec/marshaling.

| Benchmark | IfxPy | informix-db | Result |
|---|---:|---:|---:|
| select_one_row (single-row latency) | 128 us | 116 us | us 9% faster |
| select_systables_first_10 | 126 us | 184 us | IfxPy 32% faster |
| select_bench_table_all (1k rows) | 969 us | 855 us | us 12% faster |
| executemany(1000) in txn | 21.5 ms | 30.8 ms | IfxPy 30% slower |
| cold_connect_disconnect | 11.0 ms | 10.9 ms | comparable |

Why the surprising wins: IfxPy's path is Python -> OneDB ODBC ->
libifdmr -> wire. Ours is Python -> wire. When wire round-trip
dominates (single-row, bulk fetch), the missing abstraction layer
makes us faster. When per-row marshaling dominates (executemany),
IfxPy's C-level execute(stmt, tuple) beats Python BIND-PDU build.

Files added under tests/benchmarks/compare/:
* Dockerfile.ifxpy — Ubuntu 20.04 base with IfxPy + OneDB drivers
* ifxpy_bench.py — IfxPy benchmark workloads matching test_*_perf.py
* README.md — methodology, results, install gauntlet, reproduction

The IfxPy install gauntlet itself is part of the comparison story:
modern Python 3.11 (not 3.13), setuptools <58, permissive CFLAGS,
manual download of 92MB OneDB ODBC tarball, four LD_LIBRARY_PATH
directories, libcrypt.so.1 (deprecated 2018, missing on Arch /
Fedora 35+ / RHEL 9). Versus our `pip install informix-db`.

README.md (project root): added "Compared to IfxPy" section under
Performance with the headline numbers and a pointer to the full
methodology.

.gitignore: keep Dockerfile/script/README under tests/benchmarks/
compare/, exclude the 92MB OneDB tarball and the local venv.
2026-05-05 11:41:47 -06:00

233 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# informix-db
Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.** PEP 249 compliant; sync + async APIs; built-in connection pool; TLS support.
To our knowledge this is the **first pure-socket Informix driver in any language** — every other Informix driver (`IfxPy`, the legacy `informixdb`, ODBC bridges, JPype/JDBC, Perl `DBD::Informix`) wraps either IBM's CSDK or the JDBC JAR.
```bash
pip install informix-db
```
Requires Python ≥ 3.10.
## Status
**Production ready.** Every finding from a system-wide failure-mode audit (data correctness, wire safety, resource leaks, concurrency, async cancellation) has been addressed:
| Severity | Finding | Status |
|---|---|---|
| Critical | Pool returns connections with open transactions | Fixed (Phase 26) |
| Critical | Unsynchronized wire path → PDU interleaving | Fixed (Phase 27) — per-connection wire lock |
| High | Async cancellation leaks running workers onto recycled connections | Fixed (Phase 27) |
| High | `_raise_sq_err` bare-except masks wire desync | Fixed (Phase 28) |
| High | Cursor finalizers — server-side resources leak on mid-fetch raise | Fixed (Phase 28+29) |
| Medium | 5 hardening items | Fixed (Phase 28+30) |
**0 critical, 0 high, 0 medium audit findings remain.** Every architectural change went through a Margaret Hamilton-style review focused on silent-failure modes, recovery paths, and documented invariants. Each documented invariant is paired with either a runtime guard or a CI tripwire test.
**Test coverage:** 300+ tests across unit / integration / benchmark suites. Integration tests run against the official IBM Informix Developer Edition Docker image (15.0.1.0.3DE).
## Quick start
```python
import informix_db
with informix_db.connect(
host="db.example.com", port=9088,
user="informix", password="...",
database="mydb", server="informix",
) as conn:
cur = conn.cursor()
cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
user_id, name = cur.fetchone()
```
## Async (FastAPI / aiohttp / asyncio)
```python
import asyncio
from informix_db import aio
async def main():
pool = await aio.create_pool(
host="db.example.com", user="informix", password="...",
database="mydb",
min_size=1, max_size=10,
)
async with pool.connection() as conn:
cur = await conn.cursor()
await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
row = await cur.fetchone()
await pool.close()
asyncio.run(main())
```
## Connection pool (sync)
```python
import informix_db
pool = informix_db.create_pool(
host="db.example.com", user="informix", password="...",
database="mydb",
min_size=1, max_size=10, acquire_timeout=5.0,
)
with pool.connection() as conn:
cur = conn.cursor()
cur.execute("...")
pool.close()
```
## TLS
```python
import ssl
# Production: bring your own context
ctx = ssl.create_default_context(cafile="/path/to/ca.pem")
informix_db.connect(host="...", port=9089, ..., tls=ctx)
# Dev / self-signed: tls=True disables verification
informix_db.connect(host="127.0.0.1", port=9089, ..., tls=True)
```
Informix uses dedicated TLS-enabled listener ports (configured server-side in `sqlhosts`) rather than STARTTLS upgrade — point `port` at the TLS listener (often `9089`) when `tls` is enabled.
## Type support
| SQL type | Python type |
|---|---|
| `SMALLINT` / `INT` / `BIGINT` / `SERIAL` | `int` |
| `FLOAT` / `SMALLFLOAT` | `float` |
| `DECIMAL(p,s)` / `MONEY` | `decimal.Decimal` |
| `CHAR` / `VARCHAR` / `NCHAR` / `NVCHAR` / `LVARCHAR` | `str` |
| `BOOLEAN` | `bool` |
| `DATE` | `datetime.date` |
| `DATETIME YEAR TO ...` | `datetime.datetime` / `datetime.time` / `datetime.date` |
| `INTERVAL DAY TO FRACTION` | `datetime.timedelta` |
| `INTERVAL YEAR TO MONTH` | `informix_db.IntervalYM` |
| `BYTE` / `TEXT` (legacy in-row blobs) | `bytes` / `str` |
| `BLOB` / `CLOB` (smart-LOBs) | `informix_db.BlobLocator` / `informix_db.ClobLocator` (read via `cursor.read_blob_column`, write via `cursor.write_blob_column`) |
| `ROW(...)` | `informix_db.RowValue` |
| `SET(...)` / `MULTISET(...)` / `LIST(...)` | `informix_db.CollectionValue` |
| `NULL` | `None` |
## Smart-LOB (BLOB / CLOB) read & write
```python
# Read: returns the actual bytes
data = cur.read_blob_column(
"SELECT data FROM photos WHERE id = ?", (42,)
)
# Write: BLOB_PLACEHOLDER token marks where the BLOB goes
cur.write_blob_column(
"INSERT INTO photos VALUES (?, BLOB_PLACEHOLDER)",
blob_data=jpeg_bytes,
params=(42,),
)
```
Both work end-to-end in pure Python via the `lotofile` / `filetoblob` server functions intercepted at the `SQ_FILE` (98) wire-protocol level — no native machinery anywhere in the thread of execution. See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §1011 for the architecture pivot that made this possible.
## Direct stored-procedure invocation (fast-path)
```python
# Cleanly close a smart-LOB descriptor opened via SQL
result = conn.fast_path_call(
"function informix.ifx_lo_close(integer)", lofd
)
# result == [0] on success
```
The fast-path RPC (`SQ_FPROUTINE` / `SQ_EXFPROUTINE`) bypasses PREPARE → EXECUTE → FETCH for direct UDF/SPL calls. Routine handles are cached per-connection, so repeated calls to the same function take a single round-trip.
## Server compatibility
Tested against IBM Informix Dynamic Server **15.0.1.0.3DE** (the official `icr.io/informix/informix-developer-database` Docker image). The wire protocol is stable across modern Informix versions; should work against 12.10+ unmodified.
For features that need server-side configuration (smart-LOBs, logged transactions), see [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md):
- Phase 7 — logged-DB transactions
- Phase 8 — BYTE/TEXT (needs blobspace)
- Phase 10/11 — BLOB/CLOB (needs sbspace + `SBSPACENAME` config + level-0 archive)
## Performance
Single-connection benchmarks against the dev container on loopback:
| Operation | Mean | Throughput |
|---|---:|---:|
| `decode(int)` per cell | 139 ns | 7.2M ops/sec |
| `parse_tuple_payload` per row (5 cols) | 1.4 µs | 715K rows/sec |
| `SELECT 1` round-trip | ~140 µs | ~7K queries/sec |
| 1000-row SELECT | ~1.0 ms | ~990K rows/sec sustained |
| `executemany(1000)` in transaction | 32 ms | **~31,000 rows/sec** |
| Pool acquire + query + release | 295 µs | ~3.4K queries/sec |
| Cold connect (login handshake) | 11 ms | ~90 connections/sec |
**Performance gotcha**: `executemany(...)` under `autocommit=True` is **53× slower** than the same call inside a single transaction (server flushes the transaction log per row). For bulk loads, `autocommit=False` (default) + `conn.commit()` at the end. See [`docs/USAGE.md`](docs/USAGE.md) for the full performance tips section.
### Compared to IfxPy (the C-bound PyPI driver)
Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on identical workloads, same Informix server, matched conditions:
| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 128 µs | **116 µs** | **9% faster** |
| 1000-row SELECT (full fetch) | 969 µs | **855 µs** | **12% faster** |
| `executemany(1000)` in transaction | 21.5 ms | 30.8 ms | 30% slower |
| Cold connect (login handshake) | 11.0 ms | 10.9 ms | comparable |
`informix-db` wins where the wire round-trip dominates (IfxPy's ODBC abstraction layer adds overhead), and loses where per-row marshaling dominates (IfxPy's C-level `execute(stmt, tuple)` beats our Python BIND-PDU build). Within the same order of magnitude on every workload.
**Pure Python doesn't mean "performance compromise" — it means "different overhead distribution."** Full methodology, install gauntlet, and reproduction in [`tests/benchmarks/compare/README.md`](tests/benchmarks/compare/README.md).
A note on IfxPy's install gauntlet: getting it to run on a modern system requires Python ≤ 3.11, setuptools <58, permissive CFLAGS, manual download of a 92 MB ODBC tarball, four `LD_LIBRARY_PATH` directories, and `libcrypt.so.1` (deprecated 2018, missing on Arch / Fedora 35+ / RHEL 9). `informix-db`'s install: `pip install informix-db`.
## Standards & guarantees
* **PEP 249** (DB-API 2.0): `connect()`, `Connection`, `Cursor`, `description`, `rowcount`, exception hierarchy
* **`paramstyle = "numeric"`** (Informix's native ESQL/C convention; `?` and `:1` both work)
* **Threadsafety = 1**: threads may share the module but not connections; the pool gives per-thread connection access. Phase 27 added a per-connection wire lock that makes accidental sharing safe (interleaved PDUs serialize correctly), but PEP 249 advice still holds give each thread its own connection.
* **CalVer versioning**: `YYYY.MM.DD` releases. PEP 440 post-releases (`.1`, `.2`) for same-day fixes.
## Development
The full test + lint workflow is in the [Makefile](Makefile). Quick summary:
```bash
make test # 77 unit tests (no Docker)
make ifx-up && make test-integration # 231 integration tests
make bench # benchmark suite
make lint # ruff
```
For the smart-LOB tests specifically, the dev container needs additional one-time setup (blobspace + sbspace + level-0 archive). See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10 for the `onspaces` / `onmode` / `ontape` commands.
## Documentation
- [**`docs/USAGE.md`**](docs/USAGE.md) practical recipes: connections, parameter binding, type mapping, transactions, performance tips, scrollable cursors, BLOBs, async, TLS, locale/Unicode, error handling, known limitations
- [`tests/benchmarks/README.md`](tests/benchmarks/README.md) performance baselines, headline numbers, how to run regressions
- `CHANGELOG.md` phase-by-phase release notes
## Project history & design rationale
This driver was built incrementally across 30 phases, each with a focused scope and decision log. The reasoning trail lives in:
- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) byte-level SQLI wire-format reference
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) index into the decompiled IBM JDBC driver, used as a clean-room reference
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) phase-by-phase architectural decisions, with the *why* preserved
- [`docs/CAPTURES/`](docs/CAPTURES/) annotated socat hex-dump captures
Notable architectural pivots documented in the decision log:
- **Phase 10/11** (smart-LOB read/write): used `lotofile`/`filetoblob` SQL functions + `SQ_FILE` protocol intercept instead of the heavier `SQ_FPROUTINE` + `SQ_LODATA` stack ~3x smaller than originally projected
- **Phase 7** (logged-DB transactions): discovered Informix requires explicit `SQ_BEGIN` before each transaction in non-ANSI mode, plus `SQ_RBWORK` needs a savepoint short payload
- **Phase 16** (async): shipped thread-pool wrapping (~250 lines) instead of full I/O abstraction refactor (~2000 lines); functionally equivalent for typical FastAPI workloads
## License
MIT.