
Ryan Malloy 270155d2de Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9)

Extends the IfxPy comparison bench script with scaling workloads
(1k/10k/100k rows for both executemany and SELECT). Re-runs the
full comparison with consistent measurement methodology and updates
the README with the actually-correct numbers.

Earlier comparison runs reported informix-db winning all 5
benchmarks. Re-running select_bench_table_all with consistent
measurement gives 3.04 ms, not the 891 us I cited earlier - a
3.4x discrepancy attributable to noisy warmup + small-fixture
artifacts. The "we win everything" framing was wrong.

Corrected comparison reveals two clear stories:

Bulk-insert: pure-Python wins 1.6x at scale.
  executemany(10k):  IfxPy 259ms  / us 161ms (us 1.6x faster)
  executemany(100k): IfxPy 2376ms / us 1487ms (us 1.6x faster)
Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's
per-call API can't pipeline.
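Here is a back-of-envelope model of why pipelining helps, as a sketch. The per-row round-trip (`rtt`) and server cost (`server`) are placeholder parameters, not measured values. The model also ignores client-side encoding and any batching limits.

```python
# Simplified latency model for n parameterized INSERTs.
# Assumption: each row costs `rtt` of network round-trip plus `server`
# seconds of server-side work. Values are placeholders, not measurements.

def per_call_cost(n: int, rtt: float, server: float) -> float:
    """Per-call API: send one row, wait for its reply, repeat."""
    return n * (rtt + server)


def pipelined_cost(n: int, rtt: float, server: float) -> float:
    """Pipelined: write all rows back-to-back, then drain the replies.

    The round-trip is paid roughly once; server work still scales with n.
    """
    return rtt + n * server
```

For large n, the ratio per_call_cost / pipelined_cost approaches (rtt + server) / server, which is a constant. That is consistent with the speedup holding at 1.6x from 10k to 100k rows.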

Large-fetch: IfxPy wins 2.3-2.4x at scale.
  SELECT 1k rows:   IfxPy 1.2ms  / us 2.7ms (IfxPy 2.3x)
  SELECT 10k rows:  IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x)
  SELECT 100k rows: IfxPy 112ms  / us 271ms (IfxPy 2.4x)
Reason: IfxPy's C-level fetch_tuple runs at ~1.1us/row versus
~2.7us/row for our pure-Python parse_tuple_payload. This is the real
C-vs-Python codec gap showing up at scale.
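The per-row figures come from dividing the 100k-row totals by the row count. This folds network and server time into the per-row cost, so it is an end-to-end figure rather than a pure decode timing.

```python
# Per-row cost derived from the 100k-row SELECT timings above:
# total wall time divided by row count (end-to-end, not codec-only).
ROWS = 100_000
ifxpy_ms, ours_ms = 112.0, 271.0

ifxpy_us_per_row = ifxpy_ms * 1000 / ROWS            # ~1.1 us/row
ours_us_per_row = ours_ms * 1000 / ROWS              # ~2.7 us/row
gap_us_per_row = ours_us_per_row - ifxpy_us_per_row  # ~1.6 us/row
```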

For everyday workloads (single SELECT in a request, INSERT a
handful of rows), drivers are within 5-25%. For workloads where
the gap widens, direction depends on what you're doing - bulk-
write favors us, bulk-read favors IfxPy.

README's "Compared to IfxPy" section rewritten with the corrected
numbers and an honest "when to prefer which" subsection.
tests/benchmarks/compare/README.md mirror updated.

Net narrative: a "faster at bulk-write, slower at bulk-read,
comparable elsewhere" comparison story is more honest and more
durable than a "we win everything" claim that would have collapsed
the first time a user ran their own benchmark.

Side note (lint): one ambiguous Unicode `×` in cursors.py replaced
with `x`.

Phase 37 ticket: parse_tuple_payload is the bottleneck at scale.
Closing the 1.6 us/row gap to IfxPy would make us competitive on
bulk-fetch too. Possible approaches: Cython codec, deeper inlining,
per-column dispatch pre-bake.
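This is a minimal sketch of what "per-column dispatch pre-bake" could look like, assuming decoders are chosen by a per-column type code. The `DECODERS` table, the type names, and the big-endian INT encoding are all illustrative assumptions, not informix-db's actual codec.

```python
from typing import Callable, Sequence

# Hypothetical decoder table keyed by an illustrative type name.
Decoder = Callable[[bytes], object]

DECODERS: dict[str, Decoder] = {
    "INT": lambda raw: int.from_bytes(raw, "big", signed=True),
    "VARCHAR": lambda raw: raw.decode("utf-8"),
}


def decode_row_naive(col_types: Sequence[str], cells: Sequence[bytes]) -> tuple:
    # Looks up the decoder for every cell of every row.
    return tuple(DECODERS[t](c) for t, c in zip(col_types, cells))


def prebake(col_types: Sequence[str]) -> Callable[[Sequence[bytes]], tuple]:
    # Resolve the decoder list once per result set (e.g. at DESCRIBE time)...
    decoders = tuple(DECODERS[t] for t in col_types)

    def decode_row(cells: Sequence[bytes]) -> tuple:
        # ...so the per-row loop only calls already-resolved functions.
        return tuple(d(c) for d, c in zip(decoders, cells))

    return decode_row
```

The saving is one dictionary lookup per cell. Whether that matters at ~2.7 us/row is something a profile would have to confirm.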

2026-05-05 12:44:52 -06:00

compare

Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9)

2026-05-05 12:44:52 -06:00

__init__.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

baseline.json

Phase 25: Branch reorder + invariant tripwires (2026.05.04.10)

2026-05-04 23:34:05 -06:00

conftest.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

README.md

Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6)

2026-05-04 17:26:16 -06:00

test_async_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

test_codec_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

test_insert_perf.py

Phase 32: Benchmark improvements (Tier 1 + Tier 2)

2026-05-05 12:01:11 -06:00

test_observability_perf.py

Phase 32: Benchmark improvements (Tier 1 + Tier 2)

2026-05-05 12:01:11 -06:00

test_pool_perf.py

Phase 32: Benchmark improvements (Tier 1 + Tier 2)

2026-05-05 12:01:11 -06:00

test_scaling_perf.py

Phase 34: Scaling benchmarks (1k/10k/100k rows; 5/20/50 cols) (2026.05.05.8)

2026-05-05 12:38:07 -06:00

test_select_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

README.md

Benchmarks (Phase 21)

Performance baselines for informix-db. Two layers:

Codec micro-benchmarks (test_codec_perf.py) — pure CPU, no server. These set the ceiling for what end-to-end can achieve. Run with make bench-codec. Suitable for CI's pre-merge job.
End-to-end benchmarks — exercise the full PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip. Need an Informix container (make ifx-up). Run with make bench.

Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

Operation	Mean	Ops/sec
`decode(int)` (per cell)	181 ns	5.5M
`parse_tuple_payload(5 cols)` (per row)	2.87 µs	350K
`encode_param(int)` (per param)	103 ns	9.7M
`SELECT 1` round-trip	177 µs	5,650
Pool acquire + tiny query + release	295 µs	3,400
Cold connect + close (login handshake)	11.2 ms	89
1000-row SELECT *	1.56 ms	640
INSERT (single, prepared)	1.88 ms	530
`executemany(100)` autocommit=True	181 ms	~550 rows/sec
`executemany(1000)` autocommit=True	1.72 s	~580 rows/sec
`executemany(1000)` in single transaction	32 ms	~31,000 rows/sec

What these tell you

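Pooling gives a ~38× speedup over a cold connect: 295 µs for acquire + tiny query + release vs 11.2 ms for connect + close, and the pooled figure even includes a query. If your app opens a connection per request, fix that first.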
Wrap bulk INSERTs in a transaction. That's a ~54× speedup over autocommit=True (1.72 s vs 32 ms for 1000 rows). With autocommit on, each row forces the server to flush its transaction log. In a transaction, the flush happens once, at COMMIT. Per-row cost drops from 1.72 ms (storage-bound) to 32 µs (pure protocol). PEP 249 requires autocommit to start off, which suits this pattern, and informix-db follows it: the default is autocommit=False.
For small result sets, the codec is not the bottleneck. Per-row decode (2.9 µs) is about 60× cheaper than a wire round-trip (177 µs for SELECT 1), so network and server-side cost dominate.
UTF-8 support adds no overhead of its own. decode_varchar_utf8 runs at 216 ns vs 170 ns for decode_varchar_short. The 27% delta is the multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.
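The pooling point can be illustrated with a minimal fixed-size pool built on `queue.Queue`. This is a hypothetical sketch, not informix-db's own pool API (the one test_pool_perf.py benchmarks). Here `connect` is any zero-argument factory. A production pool would also need health checks and a session reset (e.g. rollback) when a connection is returned.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class TinyPool:
    """Hypothetical fixed-size pool: pays the cold-connect cost once per slot."""

    def __init__(self, connect: Callable[[], Any], size: int) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # all cold connects happen up front

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[Any]:
        # Blocks up to `timeout` seconds if every connection is checked out;
        # raises queue.Empty if none frees up in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next request
```

Ten "requests" against a two-slot pool trigger only two connects. Every later request pays just the queue get/put.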

Performance gotchas

autocommit=True + executemany is the slowest reasonable pattern. Use it only when each row genuinely needs to land independently. For bulk loads, keep the default autocommit=False and call conn.commit() at the end of the batch.
Single INSERT in a tight loop is 1.88 ms each — strictly worse than executemany (which saves PREPARE/RELEASE overhead). If you find yourself looping over cur.execute("INSERT...") hundreds of times, switch to executemany.
Cold connect is 11 ms. The login handshake is expensive compared to anything you'll do with the connection. Pool everything in long-lived processes.
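The bulk-load pattern from the gotchas above can be written against the generic PEP 249 interface. The standard library's sqlite3 module stands in as the driver so the sketch runs anywhere. With informix-db you would pass its connection instead. The placeholder style in `sql` must match that driver's paramstyle, which is an assumption here.

```python
import sqlite3
from typing import Any, Iterable, Sequence


def bulk_insert(conn: Any, sql: str, rows: Iterable[Sequence[Any]]) -> None:
    """Insert all rows in one transaction via any PEP 249 connection.

    Assumes autocommit is off (the PEP 249 default), so nothing is
    durable until commit(): the log flush is paid once, not per row.
    """
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)  # one statement, many parameter sets
        conn.commit()               # single commit for the whole batch
    except Exception:
        conn.rollback()             # all-or-nothing on failure
        raise
    finally:
        cur.close()


def demo() -> int:
    # sqlite3 stands in for informix-db; note its qmark (?) placeholders.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    bulk_insert(conn, "INSERT INTO t VALUES (?, ?)",
                [(i, f"row{i}") for i in range(1000)])
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```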

Regression policy

baseline.json is committed and represents the dev-container baseline. Compare a current run against it with:

uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%

A mean regression of more than 25% fails the run. Tune the threshold to your CI's noise profile. Loopback networking on a shared CI runner is noisier than a dev container on a quiet box, so start permissive and tighten as you collect runs.
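Here is roughly what the mean:25% gate checks, sketched in plain Python over two pytest-benchmark JSON files. It assumes the standard pytest-benchmark layout: a "benchmarks" list whose entries carry "fullname" and "stats" with a "mean". pytest-benchmark's own `--benchmark-compare-fail` remains the authoritative check.

```python
import json
from pathlib import Path


def mean_regressions(baseline: dict, current: dict, threshold: float = 0.25) -> list[str]:
    """Return names of benchmarks whose mean grew by more than `threshold`."""
    base = {b["fullname"]: b["stats"]["mean"] for b in baseline["benchmarks"]}
    failed = []
    for bench in current["benchmarks"]:
        name = bench["fullname"]
        if name not in base:
            continue  # new benchmark: nothing to compare against yet
        if bench["stats"]["mean"] > base[name] * (1 + threshold):
            failed.append(name)
    return failed


def load(path: str) -> dict:
    # Parse one saved pytest-benchmark run (e.g. baseline.json).
    return json.loads(Path(path).read_text())
```

For example, `mean_regressions(load("tests/benchmarks/baseline.json"), load(current_run_path))` lists the offenders. Here `current_run_path` is a hypothetical path to a file under .results/.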

Updating the baseline

When you intentionally change performance (an optimization, or a regression accepted for correctness), refresh the baseline:

make bench-save                                 # writes tests/benchmarks/.results/Linux-CPython-*/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json

Document the change in CHANGELOG so reviewers know why the floor moved.

Files

test_codec_perf.py — codec dispatch (decode, encode_param, parse_tuple_payload)
test_select_perf.py — SELECT round-trips, single + multi-row
test_insert_perf.py — INSERT single + executemany throughput
test_pool_perf.py — cold connect vs pool acquire/release
test_async_perf.py — async-path latency + concurrent throughput
conftest.py — long-lived bench_conn and 1k-row bench_table fixtures
baseline.json — committed baseline for regression comparison
.results/ — gitignored; per-run output from make bench-save
