Tier 1: make existing benchmarks reliable

* Bumped slow-bench rounds: cold_connect_disconnect 5->15, executemany series 3->10. Single-round outliers no longer dominate.
* Switched bench reporting to median + IQR. Mean was being moved by individual GC pauses / scheduler hiccups (IfxPy executemany IQR was 8.2 ms on a 28 ms median - 29% spread - mean was unreliable).
* Updated ifxpy_bench.py to also report median + IQR alongside mean for cross-comparable numbers.
* Makefile bench targets now show median, iqr, mean, stddev, ops, rounds.

The robust statistics flipped the comparison story:

* Old (mean, 3 rounds): us 9% faster / IfxPy 30% faster on 2 of 5
* New (median, 10+ rds): us faster on 4 of 5 benchmarks

| Benchmark | IfxPy | informix-db | Δ |
|---|---|---|---|
| select_one_row | 170us | 119us | us 30% faster |
| select_systables_first_10 | 186us | 142us | us 24% faster |
| select_bench_table_all 1k | 980us | 832us | us 15% faster |
| executemany 1k in txn | 28.3ms | 31.3ms | us 10% slower |
| cold_connect_disconnect | 12.0ms | 10.7ms | us 11% faster |

Tier 2: add benchmarks for claims we make but don't verify

tests/benchmarks/test_observability_perf.py:

* test_streaming_fetch_memory_profile: RSS sampling during a cursor iteration. Documents memory growth shape; regression wall at 100 MB / 1k rows. Currently flat (in-memory cursor doesn't grow detectably for 278 rows).
* test_select_1_latency_percentiles: 1000-query distribution with p50/p90/p95/p99/max. Result: p99/p50 = 1.42x (tight tail). p50=108us, p99=153us.
* test_concurrent_pool_throughput[2,4,8]: N worker threads through pool, measures aggregate QPS + per-thread fairness. Plateaus at ~6K QPS (server-bound); per-thread latency scales ~linearly with N (server serialization expected).

README.md (project root): updated Compared-to-IfxPy table with the median-based numbers + IQR awareness note.

tests/benchmarks/compare/README.md: added "Statistical robustness" section explaining why median over mean for fair comparison.

236 integration tests pass; ruff clean.
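To illustrate why a single slow round moves the mean but not the median, here is a minimal sketch using only the standard library. The timings are made up for illustration (they are not measurements from this project), and `summarize` is a hypothetical helper, not part of the bench tooling.

```python
import statistics


def summarize(samples_ms):
    """Return mean, median, and IQR for a list of timings in milliseconds."""
    q1, _, q3 = statistics.quantiles(samples_ms, n=4)  # quartile cut points
    return {
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "iqr": q3 - q1,
    }


# Hypothetical timings: nine steady rounds, then one round hit by a GC pause.
steady = [28.0, 28.2, 27.9, 28.1, 28.3, 28.0, 27.8, 28.2, 28.1]
with_outlier = steady + [95.0]

clean = summarize(steady)
noisy = summarize(with_outlier)

print(f"mean shift:   {noisy['mean'] - clean['mean']:.2f} ms")
print(f"median shift: {noisy['median'] - clean['median']:.2f} ms")
```

With few rounds, one outlier is a large fraction of the sample, which is why raising the round count and reporting the median work together.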
Benchmarks (Phase 21)
Performance baselines for informix-db. Two layers:
- Codec micro-benchmarks (`test_codec_perf.py`): pure CPU, no server. These set the ceiling for what end-to-end can achieve. Run with `make bench-codec`. Suitable for CI's pre-merge job.
- End-to-end benchmarks: exercise the full PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip. Need an Informix container (`make ifx-up`). Run with `make bench`.
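As a rough picture of what a "pure CPU, no server" micro-benchmark measures, the sketch below times a decode function with the standard library's `timeit`. It is not the project's benchmark code: `decode_int` is a hypothetical stand-in (a big-endian signed 4-byte integer is assumed), and `bench_ns_per_call` is an illustrative helper.

```python
import timeit


def decode_int(raw: bytes) -> int:
    """Hypothetical stand-in for a codec decode: big-endian signed 4-byte int."""
    return int.from_bytes(raw, byteorder="big", signed=True)


def bench_ns_per_call(func, arg, number=100_000, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number * 1e9


payload = (42).to_bytes(4, byteorder="big", signed=True)
ns = bench_ns_per_call(decode_int, payload)
print(f"decode_int: {ns:.0f} ns/call")
```

Because nothing here touches a socket, the number depends only on the interpreter and the CPU, which is what makes this layer suitable for a pre-merge job.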
Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)
| Operation | Mean | Ops/sec |
|---|---|---|
| `decode(int)` (per cell) | 181 ns | 5.5M |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs | 350K |
| `encode_param(int)` (per param) | 103 ns | 9.7M |
| `SELECT 1` round-trip | 177 µs | 5,650 |
| Pool acquire + tiny query + release | 295 µs | 3,400 |
| Cold connect + close (login handshake) | 11.2 ms | 89 |
| 1000-row `SELECT *` | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
| `executemany(100)` autocommit=True | 181 ms | ~550 rows/sec |
| `executemany(1000)` autocommit=True | 1.72 s | ~580 rows/sec |
| `executemany(1000)` in single transaction | 32 ms | ~31,000 rows/sec |
What these tell you
- Pool gives roughly a 38× speedup over cold connect (295 µs vs 11.2 ms in the table above). If your app opens a connection per request, fix that first.
- Wrap bulk INSERTs in a transaction. That's a 53× speedup over `autocommit=True` (1.72 s vs 32 ms for 1000 rows). With autocommit on, each row forces the server to flush its transaction log; in transaction mode the flush happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound) to 32 µs (pure protocol). PEP 249's default of `autocommit=False` was designed for this, and informix-db defaults to `False` as well.
- Codec is not the bottleneck. Per-cell decode (181 ns) is about 1000× faster than a wire round-trip (177 µs for `SELECT 1`), and even a full 5-column row parse (2.9 µs) is about 60× faster. Network and server-side cost dominate.
- UTF-8 carries no cost beyond the decode itself. `decode_varchar_utf8` runs at 216 ns vs `decode_varchar_short` at 170 ns; the 27% delta is the multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.
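The ratios quoted in this list can be rechecked from the headline table. The short script below only redoes that arithmetic; the variable names are illustrative and the values are copied from the table, converted to seconds.

```python
# Headline timings from the table above, converted to seconds.
executemany_1000_autocommit = 1.72   # 1.72 s
executemany_1000_in_txn = 32e-3      # 32 ms
select_1_round_trip = 177e-6         # 177 µs
decode_int_per_cell = 181e-9         # 181 ns
parse_row_5_cols = 2.87e-6           # 2.87 µs
decode_varchar_utf8 = 216e-9         # 216 ns
decode_varchar_short = 170e-9        # 170 ns

# Transaction vs autocommit for the same 1000-row executemany.
txn_speedup = executemany_1000_autocommit / executemany_1000_in_txn
per_row_autocommit = executemany_1000_autocommit / 1000
per_row_txn = executemany_1000_in_txn / 1000

# How many decodes fit inside one SELECT 1 round-trip.
round_trip_vs_cell = select_1_round_trip / decode_int_per_cell
round_trip_vs_row = select_1_round_trip / parse_row_5_cols

# Relative cost of the UTF-8 varchar path over the short-varchar path.
utf8_delta = decode_varchar_utf8 / decode_varchar_short - 1

print(f"transaction speedup:      {txn_speedup:.1f}x")
print(f"per-row, autocommit:      {per_row_autocommit * 1e3:.2f} ms")
print(f"per-row, in transaction:  {per_row_txn * 1e6:.0f} µs")
print(f"round-trip / cell decode: {round_trip_vs_cell:.0f}x")
print(f"round-trip / row parse:   {round_trip_vs_row:.0f}x")
print(f"UTF-8 delta:              {utf8_delta:.0%}")
```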
Performance gotchas
- `autocommit=True` + `executemany` is the slowest reasonable pattern. Use it only when each row genuinely needs to land independently. For bulk loads, keep the default `autocommit=False` and call `conn.commit()` at the end of the batch.
- Single `INSERT` in a tight loop is 1.88 ms each, strictly worse than `executemany` (which saves PREPARE/RELEASE overhead). If you find yourself looping over `cur.execute("INSERT...")` hundreds of times, switch to `executemany`.
- Cold connect is 11 ms. The login handshake is expensive compared to anything you'll do with the connection. Pool everything in long-lived processes.
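The bulk-load pattern recommended above looks roughly like this in PEP 249 terms. The sketch uses `sqlite3` purely as a runnable stand-in, so it is not the informix-db API: the connect call, the `?` parameter style, and the `bench_rows` table are all assumptions made for illustration.

```python
import sqlite3

# sqlite3 is only a runnable PEP 249 stand-in here; it is not informix-db.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bench_rows (id INTEGER, label TEXT)")

rows = [(i, f"row-{i}") for i in range(1000)]

# One executemany call inside one transaction: the rows share a single
# statement, and the commit cost is paid once at the end of the batch
# instead of once per row.
cur.executemany("INSERT INTO bench_rows (id, label) VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT COUNT(*) FROM bench_rows")
inserted = cur.fetchone()[0]
print(inserted)
conn.close()
```

The shape to carry over is "one `executemany`, one `commit()`", rather than a loop of `execute` calls with a commit after each row.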
Regression policy
baseline.json is committed and represents the dev-container baseline.
Compare a current run against it with:
uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
--benchmark-compare=tests/benchmarks/baseline.json \
--benchmark-compare-fail=mean:25%
A 25% mean-regression fails the run. Adjust the threshold per CI noise profile. CI's loopback-network-on-shared-runner is noisier than dev container on a quiet box — start permissive and tighten as you collect runs.
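For intuition about what the 25% threshold checks, the sketch below compares two result sets by hand. It assumes the saved JSON has a top-level `benchmarks` list whose entries carry `name` and `stats` keys, which matches pytest-benchmark's saved output as far as I know; `find_regressions` and the benchmark names and timings are hypothetical, and this is not a replacement for `--benchmark-compare-fail`.

```python
def find_regressions(baseline, current, stat="mean", threshold=0.25):
    """Return {name: relative_change} for benchmarks slower than `threshold`.

    Both arguments are dicts shaped like (assumed) pytest-benchmark JSON:
    {"benchmarks": [{"name": ..., "stats": {"mean": ..., ...}}, ...]}.
    """
    base = {b["name"]: b["stats"][stat] for b in baseline["benchmarks"]}
    regressions = {}
    for bench in current["benchmarks"]:
        name = bench["name"]
        if name not in base:
            continue  # new benchmark, nothing to compare against
        change = bench["stats"][stat] / base[name] - 1.0
        if change > threshold:
            regressions[name] = change
    return regressions


# Hypothetical results, in seconds.
baseline = {"benchmarks": [
    {"name": "test_select_1", "stats": {"mean": 177e-6}},
    {"name": "test_cold_connect", "stats": {"mean": 11.2e-3}},
]}
current = {"benchmarks": [
    {"name": "test_select_1", "stats": {"mean": 240e-6}},
    {"name": "test_cold_connect", "stats": {"mean": 11.9e-3}},
]}

slow = find_regressions(baseline, current)
for name, change in slow.items():
    print(f"{name}: {change:.0%} slower than baseline")
```

Passing `stat="median"` instead would apply the same check to the median, which may be less sensitive to noisy shared runners.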
Updating the baseline
When you intentionally change performance (an optimization, or a regression accepted for correctness), refresh the baseline:
make bench-save # writes .results/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json
Document the change in CHANGELOG so reviewers know why the floor moved.
Files
- `test_codec_perf.py`: codec dispatch (decode, encode_param, parse_tuple_payload)
- `test_select_perf.py`: SELECT round-trips, single + multi-row
- `test_insert_perf.py`: INSERT single + executemany throughput
- `test_pool_perf.py`: cold connect vs pool acquire/release
- `test_async_perf.py`: async-path latency + concurrent throughput
- `conftest.py`: long-lived `bench_conn` and 1k-row `bench_table` fixtures
- `baseline.json`: committed baseline for regression comparison
- `.results/`: gitignored; per-run output from `make bench-save`