Ryan Malloy 270155d2de Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9)
Extends the IfxPy comparison bench script with scaling workloads
(1k/10k/100k rows for both executemany and SELECT). Re-runs the
full comparison with consistent measurement methodology and updates
the README with the actually-correct numbers.

Earlier comparison runs reported informix-db winning all 5
benchmarks. Re-running select_bench_table_all with consistent
measurement gives 3.04 ms, not the 891 us I cited earlier - a
3.4x discrepancy attributable to noisy warmup + small-fixture
artifacts. The "we win everything" framing was wrong.

Corrected comparison reveals two clear stories:

Bulk-insert: pure-Python wins 1.6x at scale.
  executemany(10k):  IfxPy 259ms  -> us 161ms (1.6x faster)
  executemany(100k): IfxPy 2376ms -> us 1487ms (1.6x faster)
Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's
per-call API can't pipeline.

Large-fetch: IfxPy wins 2.3-2.4x at scale.
  SELECT 1k rows:   IfxPy 1.2ms  / us 2.7ms (IfxPy 2.3x)
  SELECT 10k rows:  IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x)
  SELECT 100k rows: IfxPy 112ms  / us 271ms (IfxPy 2.4x)
Reason: C-level fetch_tuple at ~1.1us/row beats Python
parse_tuple_payload at ~2.7us/row. Real C-vs-Python codec gap
showing up at scale.

For everyday workloads (single SELECT in a request, INSERT a
handful of rows), drivers are within 5-25%. For workloads where
the gap widens, direction depends on what you're doing - bulk-
write favors us, bulk-read favors IfxPy.

README's "Compared to IfxPy" section rewritten with the corrected
numbers and an honest "when to prefer which" subsection.
tests/benchmarks/compare/README.md mirror updated.

Net narrative: a "faster at bulk-write, slower at bulk-read,
comparable elsewhere" comparison story is more honest and more
durable than a "we win everything" claim that would have collapsed
the first time a user ran their own benchmark.

Side note (lint): one ambiguous unicode `×` in cursors.py replaced
with `x`.

Phase 37 ticket: parse_tuple_payload is the bottleneck at scale.
Closing the 1.6 us/row gap to IfxPy would make us competitive on
bulk-fetch too. Possible approaches: Cython codec, deeper inlining,
per-column dispatch pre-bake.
2026-05-05 12:44:52 -06:00
..

Benchmarks (Phase 21)

Performance baselines for informix-db. Two layers:

  1. Codec micro-benchmarks (test_codec_perf.py) — pure CPU, no server. These set the ceiling for what end-to-end can achieve. Run with make bench-codec. Suitable for CI's pre-merge job.
  2. End-to-end benchmarks — exercise the full PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip. Need an Informix container (make ifx-up). Run with make bench.

Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

Operation Mean Ops/sec
decode(int) (per cell) 181 ns 5.5M
parse_tuple_payload(5 cols) (per row) 2.87 µs 350K
encode_param(int) (per param) 103 ns 9.7M
SELECT 1 round-trip 177 µs 5,650
Pool acquire + tiny query + release 295 µs 3,400
Cold connect + close (login handshake) 11.2 ms 89
1000-row SELECT * 1.56 ms 640
INSERT (single, prepared) 1.88 ms 530
executemany(100) autocommit=True 181 ms ~550 rows/sec
executemany(1000) autocommit=True 1.72 s ~580 rows/sec
executemany(1000) in single transaction 32 ms ~31,000 rows/sec

What these tell you

  • Pool gives 72× speedup over cold connect. If your app opens a connection per request, fix that first.
  • Wrap bulk INSERTs in a transaction. That's a 53× speedup over the autocommit-True default. With autocommit on, each row forces the server to flush its transaction log; in transaction mode the flush happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound) to 32 µs (pure protocol). PEP 249's default autocommit=False was designed for this — we just default to False.
  • Codec is not the bottleneck. Per-row decode (2.9 µs) is 1000× faster than wire round-trip (177 µs for SELECT 1). Network and server-side cost dominate.
  • UTF-8 carries no measurable cost. decode_varchar_utf8 runs at 216 ns vs decode_varchar_short at 170 ns — the 27% delta is the multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.

Performance gotchas

  • autocommit=True + executemany is the slowest reasonable pattern. Use it only when each row genuinely needs to land independently. For bulk loads, default autocommit=False and call conn.commit() at the end of the batch.
  • Single INSERT in a tight loop is 1.88 ms each — strictly worse than executemany (which saves PREPARE/RELEASE overhead). If you find yourself looping over cur.execute("INSERT...") hundreds of times, switch to executemany.
  • Cold connect is 11 ms. The login handshake is expensive compared to anything you'll do with the connection. Pool everything in long-lived processes.

Regression policy

baseline.json is committed and represents the dev-container baseline. Compare a current run against it with:

uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%

A 25% mean-regression fails the run. Adjust the threshold per CI noise profile. CI's loopback-network-on-shared-runner is noisier than dev container on a quiet box — start permissive and tighten as you collect runs.

Updating the baseline

When you intentionally change performance (an optimization, or accept a regression for correctness), refresh:

make bench-save                                 # writes .results/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json

Document the change in CHANGELOG so reviewers know why the floor moved.

Files

  • test_codec_perf.py — codec dispatch (decode, encode_param, parse_tuple_payload)
  • test_select_perf.py — SELECT round-trips, single + multi-row
  • test_insert_perf.py — INSERT single + executemany throughput
  • test_pool_perf.py — cold connect vs pool acquire/release
  • test_async_perf.py — async-path latency + concurrent throughput
  • conftest.py — long-lived bench_conn and 1k-row bench_table fixtures
  • baseline.json — committed baseline for regression comparison
  • .results/ — gitignored; per-run output from make bench-save