
informix-db

Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. No IBM Client SDK. No JVM. No native libraries. PEP 249 compliant; sync + async APIs; built-in connection pool; TLS support.

To our knowledge this is the first pure-socket Informix driver in any language — every other Informix driver (IfxPy, the legacy informixdb, ODBC bridges, JPype/JDBC, Perl DBD::Informix) wraps either IBM's CSDK or the JDBC JAR.

pip install informix-db

Requires Python ≥ 3.10.

Status

Production ready. Every finding from a system-wide failure-mode audit (data correctness, wire safety, resource leaks, concurrency, async cancellation) has been addressed:

| Severity | Finding | Status |
| --- | --- | --- |
| Critical | Pool returns connections with open transactions | Fixed (Phase 26) |
| Critical | Unsynchronized wire path → PDU interleaving | Fixed (Phase 27): per-connection wire lock |
| High | Async cancellation leaks running workers onto recycled connections | Fixed (Phase 27) |
| High | _raise_sq_err bare-except masks wire desync | Fixed (Phase 28) |
| High | Cursor finalizers: server-side resources leak on mid-fetch raise | Fixed (Phase 28+29) |
| Medium | 5 hardening items | Fixed (Phase 28+30) |

0 critical, 0 high, 0 medium audit findings remain. Every architectural change went through a Margaret Hamilton-style review focused on silent-failure modes, recovery paths, and documented invariants. Each documented invariant is paired with either a runtime guard or a CI tripwire test.

Test coverage: 300+ tests across unit / integration / benchmark suites. Integration tests run against the official IBM Informix Developer Edition Docker image (15.0.1.0.3DE).

Quick start

import informix_db

with informix_db.connect(
    host="db.example.com", port=9088,
    user="informix", password="...",
    database="mydb", server="informix",
) as conn:
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
    user_id, name = cur.fetchone()

Async (FastAPI / aiohttp / asyncio)

import asyncio
from informix_db import aio

async def main():
    pool = await aio.create_pool(
        host="db.example.com", user="informix", password="...",
        database="mydb",
        min_size=1, max_size=10,
    )
    async with pool.connection() as conn:
        cur = await conn.cursor()
        await cur.execute("SELECT id, name FROM users WHERE id = ?", (42,))
        row = await cur.fetchone()
    await pool.close()

asyncio.run(main())

Connection pool (sync)

import informix_db

pool = informix_db.create_pool(
    host="db.example.com", user="informix", password="...",
    database="mydb",
    min_size=1, max_size=10, acquire_timeout=5.0,
)

with pool.connection() as conn:
    cur = conn.cursor()
    cur.execute("...")

pool.close()

TLS

import ssl

# Production: bring your own context
ctx = ssl.create_default_context(cafile="/path/to/ca.pem")
informix_db.connect(host="...", port=9089, ..., tls=ctx)

# Dev / self-signed: tls=True disables verification
informix_db.connect(host="127.0.0.1", port=9089, ..., tls=True)

Informix uses dedicated TLS-enabled listener ports (configured server-side in sqlhosts) rather than STARTTLS upgrade — point port at the TLS listener (often 9089) when tls is enabled.
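
For orientation, a TLS listener entry in the server-side sqlhosts file typically uses the onsocssl protocol; this is a sketch with illustrative server name, host, and port:

# sqlhosts (server side, illustrative values)
ifx_ssl    onsocssl    db.example.com    9089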

Type support

| SQL type | Python type |
| --- | --- |
| SMALLINT / INT / BIGINT / SERIAL | int |
| FLOAT / SMALLFLOAT | float |
| DECIMAL(p,s) / MONEY | decimal.Decimal |
| CHAR / VARCHAR / NCHAR / NVARCHAR / LVARCHAR | str |
| BOOLEAN | bool |
| DATE | datetime.date |
| DATETIME YEAR TO ... | datetime.datetime / datetime.time / datetime.date |
| INTERVAL DAY TO FRACTION | datetime.timedelta |
| INTERVAL YEAR TO MONTH | informix_db.IntervalYM |
| BYTE / TEXT (legacy in-row blobs) | bytes / str |
| BLOB / CLOB (smart-LOBs) | informix_db.BlobLocator / informix_db.ClobLocator (read via cursor.read_blob_column, write via cursor.write_blob_column) |
| ROW(...) | informix_db.RowValue |
| SET(...) / MULTISET(...) / LIST(...) | informix_db.CollectionValue |
| NULL | None |
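
A quick illustration of the mapping in practice (a sketch; the orders table and its columns are hypothetical, and cur is a cursor as in the Quick start):

from decimal import Decimal
import datetime

cur.execute("SELECT price, created, note FROM orders WHERE id = ?", (7,))
price, created, note = cur.fetchone()
assert isinstance(price, Decimal)              # DECIMAL(p,s) -> decimal.Decimal
assert isinstance(created, datetime.datetime)  # DATETIME YEAR TO SECOND
assert isinstance(note, str)                   # VARCHAR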

Smart-LOB (BLOB / CLOB) read & write

# Read: returns the actual bytes
data = cur.read_blob_column(
    "SELECT data FROM photos WHERE id = ?", (42,)
)

# Write: BLOB_PLACEHOLDER token marks where the BLOB goes
cur.write_blob_column(
    "INSERT INTO photos VALUES (?, BLOB_PLACEHOLDER)",
    blob_data=jpeg_bytes,
    params=(42,),
)

Both work end-to-end in pure Python via the lotofile / filetoblob server functions, intercepted at the SQ_FILE (98) wire-protocol level; no native code anywhere in the execution path. See docs/DECISION_LOG.md §10/11 for the architecture pivot that made this possible.

Direct stored-procedure invocation (fast-path)

# Cleanly close a smart-LOB descriptor opened via SQL
result = conn.fast_path_call(
    "function informix.ifx_lo_close(integer)", lofd
)
# result == [0] on success

The fast-path RPC (SQ_FPROUTINE / SQ_EXFPROUTINE) bypasses PREPARE → EXECUTE → FETCH for direct UDF/SPL calls. Routine handles are cached per-connection, so repeated calls to the same function take a single round-trip.
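
Since handles are cached, repeated calls to the same routine are cheap; a sketch (lofds, a list of open smart-LOB descriptors, is hypothetical):

# The first call resolves and caches the routine handle;
# each later call reuses it and costs a single round-trip.
for lofd in lofds:
    conn.fast_path_call("function informix.ifx_lo_close(integer)", lofd)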

Server compatibility

Tested against IBM Informix Dynamic Server 15.0.1.0.3DE (the official icr.io/informix/informix-developer-database Docker image). The wire protocol is stable across modern Informix versions; should work against 12.10+ unmodified.

For features that need server-side configuration (smart-LOBs, logged transactions), see docs/DECISION_LOG.md:

  • Phase 7 — logged-DB transactions
  • Phase 8 — BYTE/TEXT (needs blobspace)
  • Phase 10/11 — BLOB/CLOB (needs sbspace + SBSPACENAME config + level-0 archive)

Performance

Single-connection benchmarks against the dev container on loopback:

| Operation | Mean | Throughput |
| --- | --- | --- |
| decode(int) per cell | 139 ns | 7.2M ops/sec |
| parse_tuple_payload per row (5 cols) | 1.4 µs | 715K rows/sec |
| SELECT 1 round-trip | ~140 µs | ~7K queries/sec |
| 1000-row SELECT | ~1.0 ms | ~990K rows/sec sustained |
| executemany(1000) in transaction | 32 ms | ~31,000 rows/sec |
| Pool acquire + query + release | 295 µs | ~3.4K queries/sec |
| Cold connect (login handshake) | 11 ms | ~90 connections/sec |

Performance gotcha: executemany(...) under autocommit=True is 53× slower than the same call inside a single transaction (the server flushes the transaction log per row). For bulk loads, keep autocommit=False (the default) and call conn.commit() once at the end. See docs/USAGE.md for the full performance tips section.
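
The bulk-load pattern, as a sketch (the events table and rows list are illustrative):

rows = [(i, f"payload-{i}") for i in range(10_000)]

with informix_db.connect(...) as conn:  # autocommit is off by default
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO events (id, payload) VALUES (?, ?)",
        rows,
    )
    conn.commit()  # one transaction, one log flush for all 10,000 rows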

Compared to IfxPy (the C-bound PyPI driver)

Head-to-head benchmarks against IfxPy on identical workloads, same Informix server, matched conditions. Using median + IQR over 10+ rounds to resist outlier-round noise:

| Benchmark | IfxPy 3.0.5 (C-bound) | informix-db (pure Python) | Result |
| --- | --- | --- | --- |
| Single-row SELECT round-trip | 118 µs | 114 µs | informix-db 3% faster |
| ~10-row server-side query | 164 µs | 159 µs | informix-db 3% faster |
| 1000-row SELECT (full fetch) | 984 µs | 891 µs | informix-db 9% faster |
| executemany(1000) in transaction | 21.4 ms | 10.4 ms | informix-db 2.05× faster |
| Cold connect (login handshake) | 11.0 ms | 10.4 ms | informix-db 5% faster |

informix-db wins on all 5 benchmarks against the C-bound driver, including a 2× win on bulk inserts.

Why pure-Python wins the round-trip-bound work: IfxPy's code path is Python → OneDB ODBC driver → libifdmr.so → wire. Ours is Python → wire. The abstraction-layer overhead IfxPy carries on every call costs more than the C-vs-Python codec gap saves.

Why we win bulk inserts dramatically: executemany pipelines all N BIND+EXECUTE PDUs to the wire before draining responses (Phase 33), eliminating the per-row round-trip that the older serial loop incurred. IfxPy still does one synchronous round-trip per row.
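
In outline, the change looks like this (a sketch of the idea, not the driver's internals; _build_bind_execute_pdu and _read_response are hypothetical names):

# Before (serial loop): one round-trip per row.
for row in rows:
    sock.sendall(_build_bind_execute_pdu(stmt, row))
    _read_response(sock)

# After (pipelined): one bulk write, then drain all N responses.
sock.sendall(b"".join(_build_bind_execute_pdu(stmt, row) for row in rows))
for _ in rows:
    _read_response(sock)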

Full methodology, IQR caveats, install gauntlet, and reproduction in tests/benchmarks/compare/README.md.

A note on IfxPy's install gauntlet: getting it to run on a modern system requires Python ≤ 3.11, setuptools <58, permissive CFLAGS, manual download of a 92 MB ODBC tarball, four LD_LIBRARY_PATH directories, and libcrypt.so.1 (deprecated 2018, missing on Arch / Fedora 35+ / RHEL 9). informix-db's install: pip install informix-db.

Standards & guarantees

  • PEP 249 (DB-API 2.0): connect(), Connection, Cursor, description, rowcount, exception hierarchy
  • paramstyle = "numeric" (Informix's native ESQL/C convention; ? and :1 both work; see the example after this list)
  • Threadsafety = 1: threads may share the module but not connections; the pool gives per-thread connection access. Phase 27 added a per-connection wire lock that makes accidental sharing safe (interleaved PDUs serialize correctly), but PEP 249 advice still holds — give each thread its own connection.
  • CalVer versioning: YYYY.MM.DD releases. PEP 440 post-releases (.1, .2) for same-day fixes.
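
Both accepted parameter styles side by side (the users table is from the Quick start example):

# qmark placeholders
cur.execute("SELECT name FROM users WHERE id = ?", (42,))
# numeric placeholders (PEP 249 paramstyle "numeric")
cur.execute("SELECT name FROM users WHERE id = :1", (42,))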

Development

The full test + lint workflow is in the Makefile. Quick summary:

make test                     # 77 unit tests (no Docker)
make ifx-up && make test-integration   # 231 integration tests
make bench                    # benchmark suite
make lint                     # ruff

For the smart-LOB tests specifically, the dev container needs additional one-time setup (blobspace + sbspace + level-0 archive). See docs/DECISION_LOG.md §10 for the onspaces / onmode / ontape commands.

Documentation

  • docs/USAGE.md — practical recipes: connections, parameter binding, type mapping, transactions, performance tips, scrollable cursors, BLOBs, async, TLS, locale/Unicode, error handling, known limitations
  • tests/benchmarks/README.md — performance baselines, headline numbers, how to run regressions
  • CHANGELOG.md — phase-by-phase release notes

Project history & design rationale

This driver was built incrementally across more than 30 phases, each with a focused scope and decision log. The reasoning trail lives in docs/DECISION_LOG.md.

Notable architectural pivots documented in the decision log:

  • Phase 10/11 (smart-LOB read/write): used lotofile/filetoblob SQL functions + SQ_FILE protocol intercept instead of the heavier SQ_FPROUTINE + SQ_LODATA stack — ~3x smaller than originally projected
  • Phase 7 (logged-DB transactions): discovered Informix requires explicit SQ_BEGIN before each transaction in non-ANSI mode, plus SQ_RBWORK needs a savepoint short payload
  • Phase 16 (async): shipped thread-pool wrapping (~250 lines) instead of full I/O abstraction refactor (~2000 lines); functionally equivalent for typical FastAPI workloads

License

MIT.