informix-db/README.md
Ryan Malloy eb8d15d204 README + classifier polish for PyPI launch
PyPI users landing on the README need to know quickly:
- What this is (already strong)
- Whether it's safe to use in production (was missing)
- Performance expectations (was missing)
- Python version requirement (was only in pyproject.toml metadata)

Updates:
* Added "Status" section with the Hamilton audit findings table -
  every critical/high/medium addressed, 0 remaining. Names the
  Hamilton-style review process explicitly as the credibility signal.
* Added Python ≥ 3.10 requirement under the install command.
* Added "Performance" section with single-connection benchmarks and
  the 53x autocommit-cliff gotcha (most important perf pitfall).
* Updated "Standards & guarantees" to mention Phase 27's wire lock
  alongside the PEP 249 Threadsafety=1 declaration - accurate context
  for sophisticated readers.
* Tightened "Development" to PyPI-appropriate brevity (short Makefile
  target list instead of full uv invocations).
* Updated stale phase count (22+ → 30) and test counts (69 → 77 unit,
  163 → 231 integration). Added "300+ tests" rough number in the
  Status section to reduce future staleness churn.
* Fixed typo: "no thread of native machinery" → "no native machinery
  anywhere in the thread of execution".
* Bumped pyproject.toml classifier from "Development Status :: 4 -
  Beta" to "5 - Production/Stable" - earned by the audit work.

No code changes.
2026-05-05 11:06:49 -06:00


# informix-db
Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.** PEP 249 compliant; sync + async APIs; built-in connection pool; TLS support.
To our knowledge this is the **first pure-socket Informix driver in any language** — every other Informix driver (`IfxPy`, the legacy `informixdb`, ODBC bridges, JPype/JDBC, Perl `DBD::Informix`) wraps either IBM's CSDK or the JDBC JAR.
```bash
pip install informix-db
```
Requires Python ≥ 3.10.
## Status
**Production ready.** Every finding from a system-wide failure-mode audit (data correctness, wire safety, resource leaks, concurrency, async cancellation) has been addressed:
| Severity | Finding | Status |
|---|---|---|
| Critical | Pool returns connections with open transactions | Fixed (Phase 26) |
| Critical | Unsynchronized wire path → PDU interleaving | Fixed (Phase 27) — per-connection wire lock |
| High | Async cancellation leaks running workers onto recycled connections | Fixed (Phase 27) |
| High | `_raise_sq_err` bare-except masks wire desync | Fixed (Phase 28) |
| High | Cursor finalizers — server-side resources leak on mid-fetch raise | Fixed (Phase 28+29) |
| Medium | 5 hardening items | Fixed (Phase 28+30) |

**0 critical, 0 high, 0 medium audit findings remain.** Every architectural change went through a Margaret Hamilton-style review focused on silent-failure modes, recovery paths, and documented invariants. Each documented invariant is paired with either a runtime guard or a CI tripwire test.

**Test coverage:** 300+ tests across unit / integration / benchmark suites. Integration tests run against the official IBM Informix Developer Edition Docker image (15.0.1.0.3DE).
## Quick start
```python
import informix_db

with informix_db.connect(
    host="db.example.com", port=9088,
    user="informix", password="...",
    database="mydb", server="informix",
) as conn:
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
    user_id, name = cur.fetchone()
```
## Async (FastAPI / aiohttp / asyncio)
```python
import asyncio

from informix_db import aio


async def main():
    pool = await aio.create_pool(
        host="db.example.com", user="informix", password="...",
        database="mydb",
        min_size=1, max_size=10,
    )
    async with pool.connection() as conn:
        cur = await conn.cursor()
        await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
        row = await cur.fetchone()
    await pool.close()


asyncio.run(main())
```
## Connection pool (sync)
```python
import informix_db

pool = informix_db.create_pool(
    host="db.example.com", user="informix", password="...",
    database="mydb",
    min_size=1, max_size=10, acquire_timeout=5.0,
)
with pool.connection() as conn:
    cur = conn.cursor()
    cur.execute("...")
pool.close()
```
## TLS
```python
import ssl

import informix_db

# Production: bring your own context
ctx = ssl.create_default_context(cafile="/path/to/ca.pem")
informix_db.connect(host="...", port=9089, ..., tls=ctx)

# Dev / self-signed: tls=True disables verification
informix_db.connect(host="127.0.0.1", port=9089, ..., tls=True)
```
Informix uses dedicated TLS-enabled listener ports (configured server-side in `sqlhosts`) rather than STARTTLS upgrade — point `port` at the TLS listener (often `9089`) when `tls` is enabled.
## Type support
| SQL type | Python type |
|---|---|
| `SMALLINT` / `INT` / `BIGINT` / `SERIAL` | `int` |
| `FLOAT` / `SMALLFLOAT` | `float` |
| `DECIMAL(p,s)` / `MONEY` | `decimal.Decimal` |
| `CHAR` / `VARCHAR` / `NCHAR` / `NVARCHAR` / `LVARCHAR` | `str` |
| `BOOLEAN` | `bool` |
| `DATE` | `datetime.date` |
| `DATETIME YEAR TO ...` | `datetime.datetime` / `datetime.time` / `datetime.date` |
| `INTERVAL DAY TO FRACTION` | `datetime.timedelta` |
| `INTERVAL YEAR TO MONTH` | `informix_db.IntervalYM` |
| `BYTE` / `TEXT` (legacy in-row blobs) | `bytes` / `str` |
| `BLOB` / `CLOB` (smart-LOBs) | `informix_db.BlobLocator` / `informix_db.ClobLocator` (read via `cursor.read_blob_column`, write via `cursor.write_blob_column`) |
| `ROW(...)` | `informix_db.RowValue` |
| `SET(...)` / `MULTISET(...)` / `LIST(...)` | `informix_db.CollectionValue` |
| `NULL` | `None` |
## Smart-LOB (BLOB / CLOB) read & write
```python
# Read: returns the actual bytes
data = cur.read_blob_column(
    "SELECT data FROM photos WHERE id = ?", (42,)
)

# Write: BLOB_PLACEHOLDER token marks where the BLOB goes
cur.write_blob_column(
    "INSERT INTO photos VALUES (?, BLOB_PLACEHOLDER)",
    blob_data=jpeg_bytes,
    params=(42,),
)
```
Both work end-to-end in pure Python via the `lotofile` / `filetoblob` server functions, intercepted at the `SQ_FILE` (98) wire-protocol level; no native machinery is involved anywhere in the thread of execution. See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10-11 for the architecture pivot that made this possible.
## Direct stored-procedure invocation (fast-path)
```python
# Cleanly close a smart-LOB descriptor opened via SQL
result = conn.fast_path_call(
    "function informix.ifx_lo_close(integer)", lofd
)
# result == [0] on success
```
The fast-path RPC (`SQ_FPROUTINE` / `SQ_EXFPROUTINE`) bypasses PREPARE → EXECUTE → FETCH for direct UDF/SPL calls. Routine handles are cached per-connection, so repeated calls to the same function take a single round-trip.
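
The effect of per-connection handle caching can be illustrated with a toy model. This is only a sketch, not the driver's implementation: `FakeWire`, `RoutineCache`, and their methods are hypothetical names, and the assumption that an uncached call costs one extra round-trip to resolve the routine is made purely for illustration.

```python
class FakeWire:
    """Hypothetical stand-in for a connection's wire layer; counts round-trips."""

    def __init__(self):
        self.round_trips = 0
        self._next_handle = 1

    def resolve(self, signature):
        # Assumed: looking up a routine handle costs one round-trip.
        self.round_trips += 1
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def execute(self, handle, args):
        # Assumed: executing a resolved routine costs one round-trip.
        self.round_trips += 1
        return [0]


class RoutineCache:
    """Caches routine handles by signature for a single connection."""

    def __init__(self, wire):
        self._wire = wire
        self._handles = {}

    def call(self, signature, *args):
        handle = self._handles.get(signature)
        if handle is None:
            # First call for this signature: resolve and remember the handle.
            handle = self._wire.resolve(signature)
            self._handles[signature] = handle
        return self._wire.execute(handle, args)


wire = FakeWire()
cache = RoutineCache(wire)
sig = "function informix.ifx_lo_close(integer)"

first = cache.call(sig, 7)
trips_after_first = wire.round_trips   # resolve + execute
second = cache.call(sig, 7)
trips_after_second = wire.round_trips  # execute only
```

In this model the first call costs two round-trips and every later call to the same signature costs one.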
## Server compatibility
Tested against IBM Informix Dynamic Server **15.0.1.0.3DE** (the official `icr.io/informix/informix-developer-database` Docker image). The wire protocol is stable across modern Informix versions, so the driver should work unmodified against 12.10 and later.

For features that need server-side configuration (smart-LOBs, logged transactions), see [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md):
- Phase 7 — logged-DB transactions
- Phase 8 — BYTE/TEXT (needs blobspace)
- Phase 10/11 — BLOB/CLOB (needs sbspace + `SBSPACENAME` config + level-0 archive)
## Performance
Single-connection benchmarks against the dev container on loopback:
| Operation | Mean | Throughput |
|---|---:|---:|
| `decode(int)` per cell | 139 ns | 7.2M ops/sec |
| `parse_tuple_payload` per row (5 cols) | 1.4 µs | 715K rows/sec |
| `SELECT 1` round-trip | ~140 µs | ~7K queries/sec |
| 1000-row SELECT | ~1.0 ms | ~990K rows/sec sustained |
| `executemany(1000)` in transaction | 32 ms | **~31,000 rows/sec** |
| Pool acquire + query + release | 295 µs | ~3.4K queries/sec |
| Cold connect (login handshake) | 11 ms | ~90 connections/sec |
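
The throughput column is approximately the reciprocal of the mean latency, scaled by the number of items per call. The short sketch below reproduces a few rows from the figures in the table; `throughput_per_sec` is a hypothetical helper, not part of the driver.

```python
def throughput_per_sec(mean_seconds, items_per_call=1):
    """Items processed per second, given the mean time of one call."""
    return items_per_call / mean_seconds


# Mean latencies taken from the table above.
select_1 = throughput_per_sec(140e-6)           # `SELECT 1` round-trip
pool_query = throughput_per_sec(295e-6)         # pool acquire + query + release
cold_connect = throughput_per_sec(11e-3)        # login handshake
bulk_rows = throughput_per_sec(32e-3, 1000)     # executemany(1000) in a transaction

print(f"SELECT 1:     {select_1:,.0f} queries/sec")
print(f"Pool query:   {pool_query:,.0f} queries/sec")
print(f"Cold connect: {cold_connect:,.0f} connections/sec")
print(f"executemany:  {bulk_rows:,.0f} rows/sec")
```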

**Performance gotcha**: `executemany(...)` under `autocommit=True` is **53× slower** than the same call inside a single transaction (the server flushes the transaction log per row). For bulk loads, keep `autocommit=False` (the default) and call `conn.commit()` once at the end. See [`docs/USAGE.md`](docs/USAGE.md) for the full performance tips section.
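
A minimal sketch of the bulk-load pattern: one `executemany` call followed by a single commit. The standard library's `sqlite3` is used here only as a stand-in PEP 249 driver so the example runs without an Informix server; `bulk_insert` is a hypothetical helper, and the sketch assumes an `informix_db` connection can be passed in its place because it exposes the same `cursor()`, `commit()`, and `rollback()` methods.

```python
import sqlite3


def bulk_insert(conn, sql, rows):
    """Insert all rows in one transaction; roll back if anything fails."""
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)
        conn.commit()      # a single commit for the whole batch
    except Exception:
        conn.rollback()    # leave no half-loaded batch behind
        raise
    finally:
        cur.close()
    return len(rows)


# Stand-in database; with Informix this would be informix_db.connect(...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

rows = [(i, f"user{i}") for i in range(1000)]
inserted = bulk_insert(conn, "INSERT INTO users VALUES (?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```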
## Standards & guarantees
* **PEP 249** (DB-API 2.0): `connect()`, `Connection`, `Cursor`, `description`, `rowcount`, exception hierarchy
* **`paramstyle = "numeric"`** (Informix's native ESQL/C convention; `?` and `:1` both work)
* **Threadsafety = 1**: threads may share the module but not connections; the pool gives per-thread connection access. Phase 27 added a per-connection wire lock that makes accidental sharing safe (interleaved PDUs serialize correctly), but PEP 249 advice still holds — give each thread its own connection.
* **CalVer versioning**: `YYYY.MM.DD` releases. PEP 440 post-releases (`.1`, `.2`) for same-day fixes.
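
To illustrate what a per-connection wire lock buys, the sketch below holds one lock across each full request/response exchange so that exchanges from different threads cannot interleave. It is a toy model under stated assumptions, not the driver's code: `LockedWire` and `round_trip` are hypothetical names, and the "wire" is just an in-memory log.

```python
import threading


class LockedWire:
    """Toy connection whose send/receive pair is guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def round_trip(self, request_id):
        # Hold the lock for the whole exchange so no other thread can
        # write a request between this request and its response.
        with self._lock:
            self.log.append(("send", request_id))
            self.log.append(("recv", request_id))
        return request_id


def hammer(wire, threads=8, per_thread=50):
    """Issue many round-trips on one shared wire from several threads."""
    def worker(tid):
        for i in range(per_thread):
            wire.round_trip((tid, i))

    pool = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


wire = LockedWire()
hammer(wire)

# Every "send" is immediately followed by the matching "recv".
pairs_intact = all(
    wire.log[i] == ("send", wire.log[i + 1][1]) and wire.log[i + 1][0] == "recv"
    for i in range(0, len(wire.log), 2)
)
```

The lock serializes exchanges but does not make a shared connection faster, which is one reason to still give each thread its own connection.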
## Development
The full test + lint workflow is in the [Makefile](Makefile). Quick summary:
```bash
make test # 77 unit tests (no Docker)
make ifx-up && make test-integration # 231 integration tests
make bench # benchmark suite
make lint # ruff
```
For the smart-LOB tests specifically, the dev container needs additional one-time setup (blobspace + sbspace + level-0 archive). See [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) §10 for the `onspaces` / `onmode` / `ontape` commands.
## Documentation
- [**`docs/USAGE.md`**](docs/USAGE.md) — practical recipes: connections, parameter binding, type mapping, transactions, performance tips, scrollable cursors, BLOBs, async, TLS, locale/Unicode, error handling, known limitations
- [`tests/benchmarks/README.md`](tests/benchmarks/README.md) — performance baselines, headline numbers, how to run regressions
- `CHANGELOG.md` — phase-by-phase release notes
## Project history & design rationale
This driver was built incrementally across 30 phases, each with a focused scope and decision log. The reasoning trail lives in:
- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) — byte-level SQLI wire-format reference
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) — index into the decompiled IBM JDBC driver, used as a clean-room reference
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) — phase-by-phase architectural decisions, with the *why* preserved
- [`docs/CAPTURES/`](docs/CAPTURES/) — annotated socat hex-dump captures

Notable architectural pivots documented in the decision log:
- **Phase 10/11** (smart-LOB read/write): used `lotofile`/`filetoblob` SQL functions + `SQ_FILE` protocol intercept instead of the heavier `SQ_FPROUTINE` + `SQ_LODATA` stack — ~3x smaller than originally projected
- **Phase 7** (logged-DB transactions): discovered Informix requires explicit `SQ_BEGIN` before each transaction in non-ANSI mode, plus `SQ_RBWORK` needs a savepoint short payload
- **Phase 16** (async): shipped thread-pool wrapping (~250 lines) instead of full I/O abstraction refactor (~2000 lines); functionally equivalent for typical FastAPI workloads
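
The thread-pool wrapping approach from Phase 16 can be sketched as follows: each blocking DB-API call is handed to a worker thread so the event loop stays free. This is an illustration of the general technique using `asyncio.to_thread`, not the driver's actual code; `BlockingCursor` and `AsyncCursor` are hypothetical classes, and the fake cursor simply sleeps to simulate network latency.

```python
import asyncio
import time


class BlockingCursor:
    """Hypothetical synchronous cursor that blocks like a socket call would."""

    def __init__(self):
        self._rows = []

    def execute(self, sql, params=()):
        time.sleep(0.01)                      # simulated network round-trip
        self._rows = [(params[0], "alice")]   # canned result for the demo

    def fetchone(self):
        return self._rows.pop(0) if self._rows else None


class AsyncCursor:
    """Async facade: runs each blocking call in a worker thread."""

    def __init__(self, sync_cursor):
        self._cur = sync_cursor

    async def execute(self, sql, params=()):
        await asyncio.to_thread(self._cur.execute, sql, params)

    async def fetchone(self):
        return await asyncio.to_thread(self._cur.fetchone)


async def demo():
    cur = AsyncCursor(BlockingCursor())
    await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
    return await cur.fetchone()


row = asyncio.run(demo())
```

The trade-off is one worker thread per in-flight call instead of native non-blocking I/O, in exchange for a much smaller wrapper layer.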
## License
MIT.