History

Ryan Malloy 01757415a5 Phase 32: Benchmark improvements (Tier 1 + Tier 2)

Tier 1 — make existing benchmarks reliable:
* Bumped slow-bench rounds: cold_connect_disconnect 5->15, executemany
  series 3->10. Single-round outliers no longer dominate.
* Switched bench reporting to median + IQR. Mean was being moved by
  individual GC pauses / scheduler hiccups (IfxPy executemany IQR
  was 8.2 ms on a 28 ms median - 29% spread - mean was unreliable).
* Updated ifxpy_bench.py to also report median + IQR alongside mean
  for cross-comparable numbers.
* Makefile bench targets now show median, iqr, mean, stddev, ops, rounds.

The robust statistics flipped the comparison story:

  Old (mean, 3 rounds):   us 9% faster  / IfxPy 30% faster on 2 of 5
  New (median, 10+ rds):  us faster on 4 of 5 benchmarks

| Benchmark | IfxPy | informix-db | Δ |
|---|---|---|---|
| select_one_row             | 170us | 119us | us 30% faster |
| select_systables_first_10  | 186us | 142us | us 24% faster |
| select_bench_table_all 1k  | 980us | 832us | us 15% faster |
| executemany 1k in txn      | 28.3ms | 31.3ms | us 10% slower |
| cold_connect_disconnect    | 12.0ms | 10.7ms | us 11% faster |

Tier 2 — add benchmarks for claims we make but don't verify:

tests/benchmarks/test_observability_perf.py:
* test_streaming_fetch_memory_profile — RSS sampling during a
  cursor iteration. Documents memory growth shape; regression
  wall at 100 MB / 1k rows. Currently flat (in-memory cursor
  doesn't grow detectably for 278 rows).
* test_select_1_latency_percentiles — 1000-query distribution
  with p50/p90/p95/p99/max. Result: p99/p50 = 1.42x (tight tail).
  p50=108us, p99=153us.
* test_concurrent_pool_throughput[2,4,8] — N worker threads
  through pool, measures aggregate QPS + per-thread fairness.
  Plateaus at ~6K QPS (server-bound); per-thread latency scales
  ~linearly with N (server serialization expected).

README.md (project root): updated Compared-to-IfxPy table with
the median-based numbers + IQR awareness note.
tests/benchmarks/compare/README.md: added "Statistical robustness"
section explaining why median over mean for fair comparison.

236 integration tests pass; ruff clean.

2026-05-05 12:01:11 -06:00

Dockerfile.ifxpy

Phase 31: Head-to-head benchmark vs IfxPy (the C-bound PyPI driver)

2026-05-05 11:41:47 -06:00

ifxpy_bench.py

Phase 32: Benchmark improvements (Tier 1 + Tier 2)

2026-05-05 12:01:11 -06:00

README.md

Phase 32: Benchmark improvements (Tier 1 + Tier 2)

2026-05-05 12:01:11 -06:00

README.md

`informix-db` vs IfxPy comparison benchmark

Head-to-head benchmarks against IfxPy, the IBM-published C-bound Informix driver, on identical workloads against the same Informix Developer Edition Docker container.

TL;DR

Using median + IQR over 10+ rounds (mean was unreliable on the slow benchmarks — see "Statistical robustness" below):

| Benchmark | IfxPy 3.0.5 (C-bound) | informix-db 2026.05.05.4 (pure Python) | Result |
|---|---|---|---|
| `select_one_row` (single-row latency) | 170 µs | 119 µs | `informix-db` 30% faster |
| `select_systables_first_10` (~10 rows) | 186 µs | 142 µs | `informix-db` 24% faster |
| `select_bench_table_all` (1000-row fetch) | 980 µs | 832 µs | `informix-db` 15% faster |
| `executemany(1000)` in transaction (bulk write) | 28.3 ms (IQR 29%) | 31.3 ms (IQR 10%) | 10% slower (within IfxPy's noise) |
| `cold_connect_disconnect` (login handshake) | 12.0 ms | 10.7 ms | `informix-db` 11% faster |

informix-db is faster on 4 of 5 benchmarks against the C-bound driver. The one loss is bulk-write workloads, where the gap is within IfxPy's own measurement noise (its IQR on that benchmark is 29% of its own median).
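
The "within noise" reading of the bulk-write row can be checked with a few lines of arithmetic. This is a minimal sketch using the medians from the table and the 8.2 ms IfxPy IQR reported for that benchmark. Treating "gap smaller than the other driver's IQR" as the noise criterion is a rule of thumb assumed here for illustration, not a formal significance test.

```python
# Medians from the TL;DR table, in milliseconds.
ifxpy_median_ms = 28.3
informix_db_median_ms = 31.3
# IfxPy's IQR on executemany(1000): about 29% of its 28.3 ms median.
ifxpy_iqr_ms = 8.2


def relative_gap(ours, theirs):
    """Fractional slowdown of `ours` relative to `theirs` (positive means slower)."""
    return (ours - theirs) / theirs


gap_ms = informix_db_median_ms - ifxpy_median_ms
slowdown = relative_gap(informix_db_median_ms, ifxpy_median_ms)

# Assumed rule of thumb: a gap smaller than the competitor's own IQR
# cannot be distinguished from its run-to-run spread.
within_noise = gap_ms < ifxpy_iqr_ms

print(f"gap: {gap_ms:.1f} ms ({slowdown:.1%} slower), "
      f"IfxPy IQR: {ifxpy_iqr_ms} ms, within noise: {within_noise}")
```

The 3.0 ms gap is well under half of IfxPy's 8.2 ms interquartile range, which is why the table hedges that row.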

Statistical robustness — why median, not mean

Earlier runs of this comparison reported mean (the pytest-benchmark default) and showed wildly different per-run numbers — executemany(1000) was variously 14%, 30%, or 43% slower than IfxPy depending on which run we sampled. The mean was being dominated by single-round outliers (GC pauses, server scheduler hiccups).

Switching to median + IQR with 10+ rounds gives stable run-to-run results:

Median resists single outliers: one round that runs 50 ms slow in a sample of 10 doesn't move the median, but it shifts the mean by 5 ms.
IQR (Q3 – Q1) is the noise estimator: directly comparable across drivers. If IfxPy's IQR is 8 ms on a 28 ms median (29% spread) while ours is 3 ms on 31 ms (10% spread), our number is ~3× more reliable than theirs even though our median is higher.
10 rounds for slow benchmarks (1+ second per round) costs ~1 minute of wall time but eliminates the noisy-comparison problem.
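
The outlier argument above is easy to verify on synthetic data. The sketch below uses made-up round times (ten rounds at 28 ms, then the same sample with one round running 50 ms slow); the values are illustrative, not measurements from either driver.

```python
import statistics


def summarize(samples_ms):
    """Return mean, median, and IQR (Q3 - Q1) for a list of round times."""
    q1, _, q3 = statistics.quantiles(samples_ms, n=4)
    return {
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "iqr": q3 - q1,
    }


# Synthetic samples: every round takes 28 ms, except one that runs 50 ms slow.
clean = summarize([28.0] * 10)
one_outlier = summarize([28.0] * 9 + [78.0])

mean_shift = one_outlier["mean"] - clean["mean"]        # 5 ms
median_shift = one_outlier["median"] - clean["median"]  # 0 ms

print(f"mean moved by {mean_shift:.1f} ms, median moved by {median_shift:.1f} ms")
```

A single slow round drags the mean by a tenth of its excess, while the median and the quartiles never see it.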

Both tests/benchmarks/test_*_perf.py (host-side, pytest-benchmark) and ifxpy_bench.py (container-side, hand-rolled time.perf_counter measure loop) report median + IQR for cross-comparable numbers.
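
For readers who want to see what a hand-rolled measure loop of this kind looks like, here is a minimal sketch. It is not the code in ifxpy_bench.py; the `measure` function, its parameters, and the stand-in workload are hypothetical names chosen for illustration.

```python
import statistics
import time


def measure(fn, rounds=10, warmup=1):
    """Time `fn` over several rounds and report robust statistics (seconds)."""
    for _ in range(warmup):
        fn()  # warm caches before timing; warmup rounds are not recorded

    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "median_s": statistics.median(samples),
        "iqr_s": q3 - q1,
        "mean_s": statistics.fmean(samples),
        "rounds": rounds,
    }


# Stand-in workload; a real benchmark would run a query through the driver here.
result = measure(lambda: sum(range(10_000)), rounds=10)
print(result)
```

Reporting the same three fields from both harnesses is what makes the host-side and container-side numbers comparable.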

What this means

Conventional wisdom says C beats Python at I/O drivers. Here, the picture is more nuanced:

When the wire dominates (single round-trips, bulk fetch), informix-db wins because IfxPy adds an ODBC abstraction layer (Python → OneDB ODBC driver → libifdmr.so → wire) where we go direct (Python → wire).
When per-row marshaling dominates (executemany, wider tuple construction), IfxPy wins because its C-level execute(stmt, tuple) is faster than our Python BIND-PDU build.
When the wire handshake dominates (cold connect), the two drivers land close together (10.7 ms vs 12.0 ms) because both spend most of that time waiting for the server's login response.

The takeaway is that pure-Python doesn't mean "performance compromise" — it means different overhead distribution. For most application workloads (web requests doing a handful of small queries), the wire round-trip is what matters, and the abstraction-layer overhead IfxPy carries means informix-db is typically the same speed or faster.

Why this comparison was hard to set up

IfxPy is genuinely difficult to install on a modern system. Capturing the install gauntlet for the record:

| Step | Detail |
|---|---|
| 1. Pin Python 3.11 | Python 3.13 fails: IfxPy's `setup.py` uses `use_2to3`, which setuptools removed in version 58 (September 2021). |
| 2. Pin setuptools <58 | Same root cause. |
| 3. CFLAGS hack | GCC 14+ (released 2024) turns the C extension's incompatible-pointer-type warnings into errors by default. Need `CFLAGS="-Wno-incompatible-pointer-types -Wno-error"` to demote them. |
| 4. Download OneDB ODBC drivers | A 92 MB tarball from `hcl-onedb.github.io/odbc/`. The `pip install` only fetches headers; the runtime libs are a separate, undocumented download. |
| 5. Set INFORMIXDIR + LD_LIBRARY_PATH | Across four directories (`lib/`, `lib/cli/`, `lib/esql/`, `gls/dll/`). |
| 6. Install `libcrypt.so.1` | The OneDB drivers link against the libcrypt-1 ABI (deprecated in 2018, replaced by libcrypt.so.2). Modern Arch / Fedora 35+ / RHEL 9 ship only libcrypt.so.2; you need a compatibility shim (Ubuntu 20.04 still has it; modern distros need `libxcrypt-compat` or similar). |
| 7. Build runtime container | We use `Dockerfile.ifxpy` here because Ubuntu 20.04 is the most recent base distro that still ships `libcrypt.so.1` natively. |
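
Step 5 is the one most likely to be gotten subtly wrong, so here is a small sketch of composing the environment for a process that imports IfxPy. The install prefix `/opt/onedb-odbc` and the helper name `ifxpy_env` are hypothetical; only the four subdirectories come from the table above.

```python
import os

# The four library directories from step 5, relative to INFORMIXDIR.
LIB_SUBDIRS = ("lib", "lib/cli", "lib/esql", "gls/dll")


def ifxpy_env(informixdir, base_env=None):
    """Return a copy of the environment with INFORMIXDIR and LD_LIBRARY_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dirs = [os.path.join(informixdir, sub) for sub in LIB_SUBDIRS]

    # Keep any existing search path, after the driver directories.
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        lib_dirs.append(existing)

    env["INFORMIXDIR"] = informixdir
    env["LD_LIBRARY_PATH"] = os.pathsep.join(lib_dirs)
    return env


# Hypothetical install prefix; substitute wherever the tarball was unpacked.
env = ifxpy_env("/opt/onedb-odbc", base_env={})
print(env["LD_LIBRARY_PATH"])
```

The resulting mapping would be passed as `env=` to `subprocess.run` when launching the benchmark, since the dynamic loader reads `LD_LIBRARY_PATH` at process start and changing it from inside an already-running interpreter is too late.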

By contrast, informix-db's install is `pip install informix-db`. No external downloads, no system packages, no LD_LIBRARY_PATH, no Docker required.

Methodology

Both drivers ran against the same Informix Developer Edition 15.0.1.0.3DE Docker container (informix-db-test from tests/docker-compose.yml).
The host runs Arch Linux on x86_64; the IfxPy container runs Ubuntu 20.04 on x86_64. Both reach the server through the loopback path (host's 127.0.0.1:9088 for informix-db; --network=host for the IfxPy container).
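Each benchmark runs a round count scaled to its per-iteration cost, with at least 10 rounds for the slow benchmarks (10 for the executemany series, 15 for cold_connect_disconnect). We report the median with IQR; the mean proved unreliable on the slow benchmarks, where single-round outliers dominated it.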
Workloads are matched semantically: same SQL, same row counts, same fetch patterns. Where they differ (IfxPy's IfxPy.fetch_tuple vs. our cursor.fetchall), we use whichever idiom exhausts the cursor in each driver.
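
To make "whichever idiom exhausts the cursor" concrete, the sketch below contrasts the two fetch patterns using stand-in objects rather than live drivers. It assumes that a row-at-a-time `fetch_tuple`-style call returns a falsy value once the result set is exhausted; the helper names and the fake statement and cursor are hypothetical.

```python
def drain_with_fetch_tuple(fetch_tuple, stmt):
    """Row-at-a-time idiom: call fetch_tuple(stmt) until it returns a falsy value."""
    rows = []
    row = fetch_tuple(stmt)
    while row:
        rows.append(row)
        row = fetch_tuple(stmt)
    return rows


def drain_with_fetchall(cursor):
    """DB-API idiom: one call returns every remaining row."""
    return cursor.fetchall()


# Stand-ins for illustration only; real runs would use each driver's objects.
DATA = [(1, "row_0001"), (2, "row_0002"), (3, "row_0003")]


class FakeCursor:
    def fetchall(self):
        return list(DATA)


fake_stmt = iter(DATA)
# Assumed exhaustion signal: False when no rows remain.
fake_fetch_tuple = lambda stmt: next(stmt, False)

rows_a = drain_with_fetch_tuple(fake_fetch_tuple, fake_stmt)
rows_b = drain_with_fetchall(FakeCursor())
print(rows_a == rows_b)
```

Both paths materialize the full result set as a list of tuples, so the timed work is equivalent even though the call counts differ.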

Reproduce

From the project root:

# 1. Start the dev Informix container
make ifx-up

# 2. Seed the 1k-row test table on the host (using informix-db)
uv run python -c "
import informix_db, contextlib
conn = informix_db.connect(host='127.0.0.1', port=9088,
    user='informix', password='in4mix',
    database='sysmaster', server='informix', autocommit=True)
cur = conn.cursor()
with contextlib.suppress(Exception): cur.execute('DROP TABLE p21_bench')
cur.execute('CREATE TABLE p21_bench (id INT, name VARCHAR(64), counter INT, value FLOAT, created DATE)')
cur.executemany('INSERT INTO p21_bench VALUES (?, ?, ?, ?, ?)',
    [(i, f'row_{i:04d}', i*7, float(i)*1.5, None) for i in range(1000)])
conn.close()
"

# 3. Build + run the IfxPy benchmark container
docker build -f tests/benchmarks/compare/Dockerfile.ifxpy \
    -t ifxpy-bench tests/benchmarks/compare/
docker run --rm --network=host ifxpy-bench

# 4. Run informix-db benchmarks for the matched comparison
uv run pytest tests/benchmarks/test_select_perf.py \
    tests/benchmarks/test_pool_perf.py \
    tests/benchmarks/test_insert_perf.py \
    -m benchmark --benchmark-only --benchmark-warmup=on

Files

Dockerfile.ifxpy — Ubuntu 20.04 container with Python 3.9, IfxPy, and OneDB drivers installed
ifxpy_bench.py — IfxPy benchmark workloads (mirrors tests/benchmarks/test_*_perf.py)
This README

Caveats

IfxPy 3.0.5 is the latest PyPI version (from October 2020). It's the most actively-maintained C-bound option but hasn't shipped a release in ~5 years.
Numbers will vary by host, distro, kernel, network stack — re-run on your own hardware before drawing strong conclusions.
The 1k-row INSERT benchmark uses different APIs (IfxPy's prepare+execute loop vs our executemany); the comparison is by total wall-clock time for the equivalent workload, not by per-call overhead.
