Tier 1 — make existing benchmarks reliable:

* Bumped slow-bench rounds: cold_connect_disconnect 5→15, executemany series 3→10. Single-round outliers no longer dominate.
* Switched bench reporting to median + IQR. The mean was being moved by individual GC pauses / scheduler hiccups (IfxPy executemany IQR was 8.2 ms on a 28 ms median — a 29% spread — so the mean was unreliable).
* Updated ifxpy_bench.py to also report median + IQR alongside the mean, for cross-comparable numbers.
* Makefile bench targets now show median, IQR, mean, stddev, ops, rounds.

The robust statistics flipped the comparison story:

* Old (mean, 3 rounds): us 9% faster / IfxPy 30% faster on 2 of 5
* New (median, 10+ rounds): us faster on 4 of 5 benchmarks

| Benchmark | IfxPy | informix-db | Δ |
|---|---|---|---|
| select_one_row | 170 µs | 119 µs | us 30% faster |
| select_systables_first_10 | 186 µs | 142 µs | us 24% faster |
| select_bench_table_all 1k | 980 µs | 832 µs | us 15% faster |
| executemany 1k in txn | 28.3 ms | 31.3 ms | us 10% slower |
| cold_connect_disconnect | 12.0 ms | 10.7 ms | us 11% faster |

Tier 2 — add benchmarks for claims we make but don't verify (tests/benchmarks/test_observability_perf.py):

* test_streaming_fetch_memory_profile — RSS sampling during a cursor iteration. Documents memory-growth shape; regression wall at 100 MB / 1k rows. Currently flat (the in-memory cursor doesn't grow detectably for 278 rows).
* test_select_1_latency_percentiles — 1000-query distribution with p50/p90/p95/p99/max. Result: p99/p50 = 1.42× (tight tail); p50 = 108 µs, p99 = 153 µs.
* test_concurrent_pool_throughput[2,4,8] — N worker threads through the pool; measures aggregate QPS + per-thread fairness. Plateaus at ~6K QPS (server-bound); per-thread latency scales ~linearly with N (server serialization, as expected).

README.md (project root): updated the Compared-to-IfxPy table with the median-based numbers + an IQR-awareness note.

tests/benchmarks/compare/README.md: added a "Statistical robustness" section explaining why median over mean gives a fair comparison.

236 integration tests pass; ruff clean.
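The median + IQR reporting described above can be sketched with the stdlib `statistics` module (the actual bench code may differ; this is a minimal illustration):

```python
import statistics

def median_iqr(samples):
    """Median and interquartile range (Q3 - Q1) of a list of timing samples."""
    med = statistics.median(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartile cut points
    return med, q3 - q1

# A single outlier round (e.g. a GC pause) barely moves median or IQR,
# while it would drag the mean of the same samples from 100 to 190:
median_iqr([100] * 9 + [1000])  # -> (100.0, 0.0)
```

Reporting median ± IQR is what keeps one pathological round from moving the headline number the way it moves a mean.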
# informix-db

Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.** PEP 249 compliant; sync + async APIs; built-in connection pool; TLS support.

To our knowledge this is the **first pure-socket Informix driver in any language** — every other Informix driver (`IfxPy`, the legacy `informixdb`, ODBC bridges, JPype/JDBC, Perl `DBD::Informix`) wraps either IBM's CSDK or the JDBC JAR.

```bash
pip install informix-db
```

Requires Python ≥ 3.10.

## Status

**Production ready.** Every finding from a system-wide failure-mode audit (data correctness, wire safety, resource leaks, concurrency, async cancellation) has been addressed:

| Severity | Finding | Status |
|---|---|---|
| Critical | Pool returns connections with open transactions | Fixed (Phase 26) |
| Critical | Unsynchronized wire path → PDU interleaving | Fixed (Phase 27) — per-connection wire lock |
| High | Async cancellation leaks running workers onto recycled connections | Fixed (Phase 27) |
| High | `_raise_sq_err` bare-except masks wire desync | Fixed (Phase 28) |
| High | Cursor finalizers — server-side resources leak on mid-fetch raise | Fixed (Phase 28+29) |
| Medium | 5 hardening items | Fixed (Phase 28+30) |

**0 critical, 0 high, 0 medium audit findings remain.** Every architectural change went through a Margaret Hamilton-style review focused on silent-failure modes, recovery paths, and documented invariants. Each documented invariant is paired with either a runtime guard or a CI tripwire test.

**Test coverage:** 300+ tests across unit / integration / benchmark suites. Integration tests run against the official IBM Informix Developer Edition Docker image (15.0.1.0.3DE).

## Quick start

```python
import informix_db

with informix_db.connect(
    host="db.example.com", port=9088,
    user="informix", password="...",
    database="mydb", server="informix",
) as conn:
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
    user_id, name = cur.fetchone()
```

## Async (FastAPI / aiohttp / asyncio)

```python
import asyncio

from informix_db import aio

async def main():
    pool = await aio.create_pool(
        host="db.example.com", user="informix", password="...",
        database="mydb",
        min_size=1, max_size=10,
    )
    async with pool.connection() as conn:
        cur = await conn.cursor()
        await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
        row = await cur.fetchone()
    await pool.close()

asyncio.run(main())
```

## Connection pool (sync)

```python
import informix_db

pool = informix_db.create_pool(
    host="db.example.com", user="informix", password="...",
    database="mydb",
    min_size=1, max_size=10, acquire_timeout=5.0,
)

with pool.connection() as conn:
    cur = conn.cursor()
    cur.execute("...")

pool.close()
```

## TLS

```python
import ssl

# Production: bring your own context
ctx = ssl.create_default_context(cafile="/path/to/ca.pem")
informix_db.connect(host="...", port=9089, ..., tls=ctx)

# Dev / self-signed: tls=True disables verification
informix_db.connect(host="127.0.0.1", port=9089, ..., tls=True)
```

Informix uses dedicated TLS-enabled listener ports (configured server-side in `sqlhosts`) rather than a STARTTLS upgrade — point `port` at the TLS listener (often `9089`) when `tls` is enabled.

## Type support

| SQL type | Python type |
|---|---|
| `SMALLINT` / `INT` / `BIGINT` / `SERIAL` | `int` |
| `FLOAT` / `SMALLFLOAT` | `float` |
| `DECIMAL(p,s)` / `MONEY` | `decimal.Decimal` |
| `CHAR` / `VARCHAR` / `NCHAR` / `NVARCHAR` / `LVARCHAR` | `str` |
| `BOOLEAN` | `bool` |
| `DATE` | `datetime.date` |
| `DATETIME YEAR TO ...` | `datetime.datetime` / `datetime.time` / `datetime.date` |
| `INTERVAL DAY TO FRACTION` | `datetime.timedelta` |
| `INTERVAL YEAR TO MONTH` | `informix_db.IntervalYM` |
| `BYTE` / `TEXT` (legacy in-row blobs) | `bytes` / `str` |
| `BLOB` / `CLOB` (smart-LOBs) | `informix_db.BlobLocator` / `informix_db.ClobLocator` (read via `cursor.read_blob_column`, write via `cursor.write_blob_column`) |
| `ROW(...)` | `informix_db.RowValue` |
| `SET(...)` / `MULTISET(...)` / `LIST(...)` | `informix_db.CollectionValue` |
| `NULL` | `None` |
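One practical consequence of the mapping: `DECIMAL` / `MONEY` values decode to `decimal.Decimal`, so exact arithmetic survives the round-trip, and parameters destined for those columns should be bound as `Decimal`, not `float`. A small illustration (the table and column names in the comment are made up):

```python
from decimal import Decimal

# Binary floats drift; Decimal does not. This is why DECIMAL/MONEY
# columns decode to decimal.Decimal rather than float.
assert 0.1 + 0.2 != 0.3
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

# When binding a parameter for a DECIMAL(10,2)/MONEY column, construct
# the Decimal from a string to avoid importing float artifacts:
# cur.execute("INSERT INTO prices (amount) VALUES (?)", (Decimal("19.99"),))
```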

## Smart-LOB (BLOB / CLOB) read & write

```python
# Read: returns the actual bytes
data = cur.read_blob_column(
    "SELECT data FROM photos WHERE id = ?", (42,)
)

# Write: BLOB_PLACEHOLDER token marks where the BLOB goes
cur.write_blob_column(
    "INSERT INTO photos VALUES (?, BLOB_PLACEHOLDER)",
    blob_data=jpeg_bytes,
    params=(42,),
)
```

Both work end-to-end in pure Python via the `lotofile` / `filetoblob` server functions, intercepted at the `SQ_FILE` (98) wire-protocol level — no native code anywhere in the execution path. See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10–11 for the architecture pivot that made this possible.

## Direct stored-procedure invocation (fast-path)

```python
# Cleanly close a smart-LOB descriptor opened via SQL
result = conn.fast_path_call(
    "function informix.ifx_lo_close(integer)", lofd
)
# result == [0] on success
```

The fast-path RPC (`SQ_FPROUTINE` / `SQ_EXFPROUTINE`) bypasses PREPARE → EXECUTE → FETCH for direct UDF/SPL calls. Routine handles are cached per-connection, so repeated calls to the same function take a single round-trip.
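The per-connection handle cache described above amounts to memoization keyed on the routine signature. A hypothetical sketch of the pattern (not the driver's actual internals; `open_routine` stands in for the expensive handle-open round-trip):

```python
class RoutineHandleCache:
    """Memoize server-side routine handles, one cache per connection."""

    def __init__(self, open_routine):
        self._open = open_routine   # performs the expensive open round-trip
        self._handles = {}

    def get(self, signature):
        # First call for a signature pays the open cost; later calls are
        # dict hits, so the RPC itself is the only remaining round-trip.
        if signature not in self._handles:
            self._handles[signature] = self._open(signature)
        return self._handles[signature]


calls = []
cache = RoutineHandleCache(lambda sig: calls.append(sig) or len(calls))
cache.get("function informix.ifx_lo_close(integer)")
cache.get("function informix.ifx_lo_close(integer)")
# open_routine ran once; both calls returned the same cached handle
```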

## Server compatibility

Tested against IBM Informix Dynamic Server **15.0.1.0.3DE** (the official `icr.io/informix/informix-developer-database` Docker image). The wire protocol is stable across modern Informix versions; the driver should work against 12.10+ unmodified.

For features that need server-side configuration (smart-LOBs, logged transactions), see [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md):

- Phase 7 — logged-DB transactions
- Phase 8 — BYTE/TEXT (needs blobspace)
- Phase 10/11 — BLOB/CLOB (needs sbspace + `SBSPACENAME` config + level-0 archive)

## Performance

Single-connection benchmarks against the dev container on loopback:

| Operation | Mean | Throughput |
|---|---:|---:|
| `decode(int)` per cell | 139 ns | 7.2M ops/sec |
| `parse_tuple_payload` per row (5 cols) | 1.4 µs | 715K rows/sec |
| `SELECT 1` round-trip | ~140 µs | ~7K queries/sec |
| 1000-row SELECT | ~1.0 ms | ~990K rows/sec sustained |
| `executemany(1000)` in transaction | 32 ms | **~31,000 rows/sec** |
| Pool acquire + query + release | 295 µs | ~3.4K queries/sec |
| Cold connect (login handshake) | 11 ms | ~90 connections/sec |

**Performance gotcha**: `executemany(...)` under `autocommit=True` is **53× slower** than the same call inside a single transaction (the server flushes the transaction log per row). For bulk loads, use `autocommit=False` (the default) and call `conn.commit()` at the end. See [`docs/USAGE.md`](docs/USAGE.md) for the full performance-tips section.
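In code, the fast pattern is one transaction around the whole load, committed once at the end. A sketch (the table name and the `chunked` helper are illustrative, not part of the driver API):

```python
def chunked(rows, size):
    """Split rows into executemany-sized batches."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def bulk_load(conn, rows, batch_size=1000):
    # autocommit=False (the default): rows avoid the per-row transaction-log
    # flush; a single commit at the end means a single log flush.
    cur = conn.cursor()
    for batch in chunked(rows, batch_size):
        cur.executemany("INSERT INTO bench_table VALUES (?, ?)", batch)
    conn.commit()
```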

### Compared to IfxPy (the C-bound PyPI driver)

Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on identical workloads, same Informix server, matched conditions. Using **median + IQR over 10+ rounds** to resist outlier-round noise:

| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 170 µs | **119 µs** | **`informix-db` 30% faster** |
| ~10-row server-side query | 186 µs | **142 µs** | **`informix-db` 24% faster** |
| 1000-row SELECT (full fetch) | 980 µs | **832 µs** | **`informix-db` 15% faster** |
| `executemany(1000)` in transaction | 28.3 ms | 31.3 ms | 10% slower |
| Cold connect (login handshake) | 12.0 ms | **10.7 ms** | **`informix-db` 11% faster** |

**`informix-db` is faster on 4 of 5 benchmarks against the C-bound driver.** The one loss is bulk-write workloads, where IfxPy's C-level per-row marshaling beats our Python BIND-PDU build by about 10% — within IfxPy's own measurement noise (its IQR on that benchmark is 29% of its own median).

**Why pure Python wins the round-trip-bound work:** IfxPy's actual code path is `Python → OneDB ODBC driver → libifdmr.so → wire`. Ours is `Python → wire`. The abstraction-layer overhead IfxPy carries on every call costs more than the C-vs-Python codec gap saves. We hit the wire directly, with one hop fewer.

Full methodology, IQR caveats, install gauntlet, and reproduction steps are in [`tests/benchmarks/compare/README.md`](tests/benchmarks/compare/README.md).

A note on IfxPy's install gauntlet: getting it to run on a modern system requires Python ≤ 3.11, setuptools < 58, permissive CFLAGS, a manual download of a 92 MB ODBC tarball, four `LD_LIBRARY_PATH` directories, and `libcrypt.so.1` (deprecated since 2018, missing on Arch / Fedora 35+ / RHEL 9). `informix-db`'s install: `pip install informix-db`.

## Standards & guarantees

* **PEP 249** (DB-API 2.0): `connect()`, `Connection`, `Cursor`, `description`, `rowcount`, exception hierarchy
* **`paramstyle = "numeric"`** (Informix's native ESQL/C convention; `?` and `:1` both work)
* **Threadsafety = 1**: threads may share the module but not connections; the pool gives per-thread connection access. Phase 27 added a per-connection wire lock that makes accidental sharing safe (interleaved PDUs serialize correctly), but the PEP 249 advice still holds — give each thread its own connection.
* **CalVer versioning**: `YYYY.MM.DD` releases, with PEP 440 post-releases (`.1`, `.2`) for same-day fixes.
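Both placeholder spellings bind positionally. A hypothetical rewrite helper shows the correspondence (the driver accepts both forms directly; this helper exists only to illustrate, and it naively ignores `?` inside string literals):

```python
import re
from itertools import count

def qmark_to_numeric(sql):
    """Rewrite qmark placeholders ('?') to Informix-style numeric ones (':1')."""
    counter = count(1)
    return re.sub(r"\?", lambda m: f":{next(counter)}", sql)

qmark_to_numeric("SELECT id, name FROM users WHERE id = ? AND active = ?")
# -> "SELECT id, name FROM users WHERE id = :1 AND active = :2"
```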

## Development

The full test + lint workflow is in the [Makefile](Makefile). Quick summary:

```bash
make test                              # 77 unit tests (no Docker)
make ifx-up && make test-integration   # 231 integration tests
make bench                             # benchmark suite
make lint                              # ruff
```

For the smart-LOB tests specifically, the dev container needs additional one-time setup (blobspace + sbspace + level-0 archive). See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10 for the `onspaces` / `onmode` / `ontape` commands.

## Documentation

- [**`docs/USAGE.md`**](docs/USAGE.md) — practical recipes: connections, parameter binding, type mapping, transactions, performance tips, scrollable cursors, BLOBs, async, TLS, locale/Unicode, error handling, known limitations
- [`tests/benchmarks/README.md`](tests/benchmarks/README.md) — performance baselines, headline numbers, how to run regressions
- `CHANGELOG.md` — phase-by-phase release notes

## Project history & design rationale

This driver was built incrementally across 30 phases, each with a focused scope and decision log. The reasoning trail lives in:

- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) — byte-level SQLI wire-format reference
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) — index into the decompiled IBM JDBC driver, used as a clean-room reference
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) — phase-by-phase architectural decisions, with the *why* preserved
- [`docs/CAPTURES/`](docs/CAPTURES/) — annotated socat hex-dump captures

Notable architectural pivots documented in the decision log:

- **Phase 10/11** (smart-LOB read/write): used `lotofile`/`filetoblob` SQL functions + an `SQ_FILE` protocol intercept instead of the heavier `SQ_FPROUTINE` + `SQ_LODATA` stack — ~3× smaller than originally projected
- **Phase 7** (logged-DB transactions): discovered that Informix requires an explicit `SQ_BEGIN` before each transaction in non-ANSI mode, and that `SQ_RBWORK` needs a savepoint short payload
- **Phase 16** (async): shipped thread-pool wrapping (~250 lines) instead of a full I/O-abstraction refactor (~2000 lines); functionally equivalent for typical FastAPI workloads

## License

MIT.
|