informix-db/README.md
Ryan Malloy 1582a5295d Prepare 2026.05.08 for first PyPI publish
- Bump version 2026.05.05.12 → 2026.05.08 (CalVer, publish date)
- Expand sdist excludes to **/glob patterns: docs/**, docs-site/**,
  tests/reference/**, tests/benchmarks/.results/** — the trailing-slash
  form was silently passing through subtree contents
- Sanitize tests/benchmarks/baseline.json hostname → PLACEHOLDER
- Rewrite README relative docs/, tests/, Makefile links to absolute
  Gitea URLs (git.supported.systems/warehack.ing/informix-db)
- pyproject urls: Homepage + Documentation → informix-db.warehack.ing,
  Source/Issues/Changelog → Gitea (warehack.ing org is now public)
2026-05-08 04:13:09 -06:00

254 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# informix-db
Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.** PEP 249 compliant; sync + async APIs; built-in connection pool; TLS support.
To our knowledge this is the **first pure-socket Informix driver in any language** — every other Informix driver (`IfxPy`, the legacy `informixdb`, ODBC bridges, JPype/JDBC, Perl `DBD::Informix`) wraps either IBM's CSDK or the JDBC JAR.
```bash
pip install informix-db
```
Requires Python ≥ 3.10.
## Status
**Production ready.** Every finding from a system-wide failure-mode audit (data correctness, wire safety, resource leaks, concurrency, async cancellation) has been addressed:
| Severity | Finding | Status |
|---|---|---|
| Critical | Pool returns connections with open transactions | Fixed (Phase 26) |
| Critical | Unsynchronized wire path → PDU interleaving | Fixed (Phase 27) — per-connection wire lock |
| High | Async cancellation leaks running workers onto recycled connections | Fixed (Phase 27) |
| High | `_raise_sq_err` bare-except masks wire desync | Fixed (Phase 28) |
| High | Cursor finalizers — server-side resources leak on mid-fetch raise | Fixed (Phase 28+29) |
| Medium | 5 hardening items | Fixed (Phase 28+30) |
**0 critical, 0 high, 0 medium audit findings remain.** Every architectural change went through a Margaret Hamilton-style review focused on silent-failure modes, recovery paths, and documented invariants. Each documented invariant is paired with either a runtime guard or a CI tripwire test.
**Test coverage:** 300+ tests across unit / integration / benchmark suites. Integration tests run against the official IBM Informix Developer Edition Docker image (15.0.1.0.3DE).
## Quick start
```python
import informix_db
with informix_db.connect(
host="db.example.com", port=9088,
user="informix", password="...",
database="mydb", server="informix",
) as conn:
cur = conn.cursor()
cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
user_id, name = cur.fetchone()
```
## Async (FastAPI / aiohttp / asyncio)
```python
import asyncio
from informix_db import aio
async def main():
pool = await aio.create_pool(
host="db.example.com", user="informix", password="...",
database="mydb",
min_size=1, max_size=10,
)
async with pool.connection() as conn:
cur = await conn.cursor()
await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
row = await cur.fetchone()
await pool.close()
asyncio.run(main())
```
## Connection pool (sync)
```python
import informix_db
pool = informix_db.create_pool(
host="db.example.com", user="informix", password="...",
database="mydb",
min_size=1, max_size=10, acquire_timeout=5.0,
)
with pool.connection() as conn:
cur = conn.cursor()
cur.execute("...")
pool.close()
```
## TLS
```python
import ssl
# Production: bring your own context
ctx = ssl.create_default_context(cafile="/path/to/ca.pem")
informix_db.connect(host="...", port=9089, ..., tls=ctx)
# Dev / self-signed: tls=True disables verification
informix_db.connect(host="127.0.0.1", port=9089, ..., tls=True)
```
Informix uses dedicated TLS-enabled listener ports (configured server-side in `sqlhosts`) rather than STARTTLS upgrade — point `port` at the TLS listener (often `9089`) when `tls` is enabled.
## Type support
| SQL type | Python type |
|---|---|
| `SMALLINT` / `INT` / `BIGINT` / `SERIAL` | `int` |
| `FLOAT` / `SMALLFLOAT` | `float` |
| `DECIMAL(p,s)` / `MONEY` | `decimal.Decimal` |
| `CHAR` / `VARCHAR` / `NCHAR` / `NVCHAR` / `LVARCHAR` | `str` |
| `BOOLEAN` | `bool` |
| `DATE` | `datetime.date` |
| `DATETIME YEAR TO ...` | `datetime.datetime` / `datetime.time` / `datetime.date` |
| `INTERVAL DAY TO FRACTION` | `datetime.timedelta` |
| `INTERVAL YEAR TO MONTH` | `informix_db.IntervalYM` |
| `BYTE` / `TEXT` (legacy in-row blobs) | `bytes` / `str` |
| `BLOB` / `CLOB` (smart-LOBs) | `informix_db.BlobLocator` / `informix_db.ClobLocator` (read via `cursor.read_blob_column`, write via `cursor.write_blob_column`) |
| `ROW(...)` | `informix_db.RowValue` |
| `SET(...)` / `MULTISET(...)` / `LIST(...)` | `informix_db.CollectionValue` |
| `NULL` | `None` |
## Smart-LOB (BLOB / CLOB) read & write
```python
# Read: returns the actual bytes
data = cur.read_blob_column(
"SELECT data FROM photos WHERE id = ?", (42,)
)
# Write: BLOB_PLACEHOLDER token marks where the BLOB goes
cur.write_blob_column(
"INSERT INTO photos VALUES (?, BLOB_PLACEHOLDER)",
blob_data=jpeg_bytes,
params=(42,),
)
```
Both work end-to-end in pure Python via the `lotofile` / `filetoblob` server functions intercepted at the `SQ_FILE` (98) wire-protocol level — no native machinery anywhere in the thread of execution. See [`docs/DECISION_LOG.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/DECISION_LOG.md) §1011 for the architecture pivot that made this possible.
## Direct stored-procedure invocation (fast-path)
```python
# Cleanly close a smart-LOB descriptor opened via SQL
result = conn.fast_path_call(
"function informix.ifx_lo_close(integer)", lofd
)
# result == [0] on success
```
The fast-path RPC (`SQ_FPROUTINE` / `SQ_EXFPROUTINE`) bypasses PREPARE → EXECUTE → FETCH for direct UDF/SPL calls. Routine handles are cached per-connection, so repeated calls to the same function take a single round-trip.
## Server compatibility
Tested against IBM Informix Dynamic Server **15.0.1.0.3DE** (the official `icr.io/informix/informix-developer-database` Docker image). The wire protocol is stable across modern Informix versions; should work against 12.10+ unmodified.
For features that need server-side configuration (smart-LOBs, logged transactions), see [`docs/DECISION_LOG.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/DECISION_LOG.md):
- Phase 7 — logged-DB transactions
- Phase 8 — BYTE/TEXT (needs blobspace)
- Phase 10/11 — BLOB/CLOB (needs sbspace + `SBSPACENAME` config + level-0 archive)
## Performance
Single-connection benchmarks against the dev container on loopback:
| Operation | Mean | Throughput |
|---|---:|---:|
| `decode(int)` per cell | 139 ns | 7.2M ops/sec |
| `parse_tuple_payload` per row (5 cols) | 1.4 µs | 715K rows/sec |
| `SELECT 1` round-trip | ~140 µs | ~7K queries/sec |
| 1000-row SELECT | ~1.0 ms | ~990K rows/sec sustained |
| `executemany(1000)` in transaction | 32 ms | **~31,000 rows/sec** |
| Pool acquire + query + release | 295 µs | ~3.4K queries/sec |
| Cold connect (login handshake) | 11 ms | ~90 connections/sec |
**Performance gotcha**: `executemany(...)` under `autocommit=True` is **53× slower** than the same call inside a single transaction (server flushes the transaction log per row). For bulk loads, `autocommit=False` (default) + `conn.commit()` at the end. See [`docs/USAGE.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/USAGE.md) for the full performance tips section.
### Compared to IfxPy (the C-bound PyPI driver)
Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on identical workloads, same Informix server, matched conditions. Using **median + IQR over 10+ rounds** to resist outlier-round noise:
| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 118 µs | 114 µs | comparable |
| ~10-row server-side query | 130 µs | 159 µs | IfxPy 22% faster |
| Cold connect (login handshake) | 11.0 ms | 10.5 ms | comparable |
| **`executemany(1k)` in transaction** | 23.5 ms | 23.2 ms | tied |
| **`executemany(10k)` in transaction** | 259 ms | **161 ms** | **`informix-db` 1.6× faster** |
| **`executemany(100k)` in transaction** | 2376 ms | **1487 ms** | **`informix-db` 1.6× faster** |
| `SELECT` 1k rows | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT` 10k rows | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT` 100k rows | 112 ms | 271 ms | IfxPy 2.4× faster |
**The honest summary:**
- **Bulk-insert workloads: `informix-db` wins 1.6× at scale.** The pipelined `executemany` (Phase 33) sends all N BIND+EXECUTE PDUs before draining responses, eliminating per-row RTT. IfxPy still pays one round-trip per `IfxPy.execute(stmt, tuple)` call.
- **Large-fetch workloads: IfxPy wins 2.3× at scale.** Their C-level `fetch_tuple` decoder is genuinely faster than our Python `parse_tuple_payload` (~1.1 µs/row vs ~2.7 µs/row). At 100k rows, that 1.6 µs/row gap accumulates into a 160 ms wall-clock difference.
- **Small queries: comparable.** Both spend ~120 µs waiting for the server; the per-call codec cost is small relative to the round-trip.
**When to prefer `informix-db`:**
- ETL pipelines, log shipping, bulk writes (1.6× faster at scale)
- Containerized / minimal-dependency environments (50 KB wheel vs IfxPy's 92 MB OneDB tarball + libcrypt.so.1 dependency hell)
- Modern Python (works on 3.103.14; IfxPy is broken on Python 3.12+)
- Async / FastAPI workloads (we have native async; IfxPy doesn't)
**When IfxPy may be faster:**
- Analytical reporting queries pulling 10k+ rows in a single SELECT
- Workloads where the per-row decode cost dominates (wide rows, tight read loops)
These results are reproducible from `tests/benchmarks/compare/` — the Dockerfile, bench script, and README walk through every step.
Full methodology, IQR caveats, install gauntlet, and reproduction in [`tests/benchmarks/compare/README.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/tests/benchmarks/compare/README.md).
A note on IfxPy's install gauntlet: getting it to run on a modern system requires Python ≤ 3.11, setuptools <58, permissive CFLAGS, manual download of a 92 MB ODBC tarball, four `LD_LIBRARY_PATH` directories, and `libcrypt.so.1` (deprecated 2018, missing on Arch / Fedora 35+ / RHEL 9). `informix-db`'s install: `pip install informix-db`.
## Standards & guarantees
* **PEP 249** (DB-API 2.0): `connect()`, `Connection`, `Cursor`, `description`, `rowcount`, exception hierarchy
* **`paramstyle = "numeric"`** (Informix's native ESQL/C convention; `?` and `:1` both work)
* **Threadsafety = 1**: threads may share the module but not connections; the pool gives per-thread connection access. Phase 27 added a per-connection wire lock that makes accidental sharing safe (interleaved PDUs serialize correctly), but PEP 249 advice still holds give each thread its own connection.
* **CalVer versioning**: `YYYY.MM.DD` releases. PEP 440 post-releases (`.1`, `.2`) for same-day fixes.
## Development
The full test + lint workflow is in the [Makefile](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/Makefile). Quick summary:
```bash
make test # 77 unit tests (no Docker)
make ifx-up && make test-integration # 231 integration tests
make bench # benchmark suite
make lint # ruff
```
For the smart-LOB tests specifically, the dev container needs additional one-time setup (blobspace + sbspace + level-0 archive). See [`docs/DECISION_LOG.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/DECISION_LOG.md) §10 for the `onspaces` / `onmode` / `ontape` commands.
## Documentation
- [**`docs/USAGE.md`**](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/USAGE.md) practical recipes: connections, parameter binding, type mapping, transactions, performance tips, scrollable cursors, BLOBs, async, TLS, locale/Unicode, error handling, known limitations
- [`tests/benchmarks/README.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/tests/benchmarks/README.md) performance baselines, headline numbers, how to run regressions
- `CHANGELOG.md` phase-by-phase release notes
## Project history & design rationale
This driver was built incrementally across 30 phases, each with a focused scope and decision log. The reasoning trail lives in:
- [`docs/PROTOCOL_NOTES.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/PROTOCOL_NOTES.md) byte-level SQLI wire-format reference
- [`docs/JDBC_NOTES.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/JDBC_NOTES.md) index into the decompiled IBM JDBC driver, used as a clean-room reference
- [`docs/DECISION_LOG.md`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/DECISION_LOG.md) phase-by-phase architectural decisions, with the *why* preserved
- [`docs/CAPTURES/`](https://git.supported.systems/warehack.ing/informix-db/src/branch/main/docs/CAPTURES/) annotated socat hex-dump captures
Notable architectural pivots documented in the decision log:
- **Phase 10/11** (smart-LOB read/write): used `lotofile`/`filetoblob` SQL functions + `SQ_FILE` protocol intercept instead of the heavier `SQ_FPROUTINE` + `SQ_LODATA` stack ~3x smaller than originally projected
- **Phase 7** (logged-DB transactions): discovered Informix requires explicit `SQ_BEGIN` before each transaction in non-ANSI mode, plus `SQ_RBWORK` needs a savepoint short payload
- **Phase 16** (async): shipped thread-pool wrapping (~250 lines) instead of full I/O abstraction refactor (~2000 lines); functionally equivalent for typical FastAPI workloads
## License
MIT.