Ryan Malloy 90ce035a00 Phase 21: Performance benchmarks (2026.05.04.5)

Adds tests/benchmarks/ with pytest-benchmark coverage of the hot codec
paths and end-to-end SELECT/INSERT/pool/async round-trips. Establishes
a committed baseline.json so PRs can be regression-checked at review
via --benchmark-compare.

* test_codec_perf.py (16): decode/encode_param/parse_tuple_payload
  micro-benchmarks - run without container, suitable for pre-merge CI.
* test_select_perf.py (4): SELECT round-trips - 1-row latency floor,
  10-row, 1k-row full fetch, parameterized.
* test_insert_perf.py (3): single-row INSERT, executemany 100 / 1000.
* test_pool_perf.py (3): cold connect, pool acquire/release, pool
  acquire + query + release.
* test_async_perf.py (2): async round-trip overhead, 10x concurrent.
* baseline.json: committed snapshot, 28 measurements.
* benchmark pytest marker, gated off by default.
* Makefile: bench / bench-codec / bench-save targets;
  test-integration excludes benchmarks for speed.

Headline numbers (dev container loopback):
* decode(int): 181 ns
* parse_tuple 5 cols: 2.87 µs/row
* SELECT 1 round-trip: 177 µs
* Pool acquire+query+release: 295 µs
* Cold connect: 11.2 ms (72x slower than pool)

UTF-8 decode carries no measurable cost vs iso-8859-1 - confirms
Phase 20 didn't regress anything.

Total: 69 unit + 211 integration + 28 benchmark = 308 tests.

2026-05-04 17:21:12 -06:00


Benchmarks (Phase 21)

Performance baselines for informix-db. Two layers:

Codec micro-benchmarks (test_codec_perf.py) — pure CPU, no server. These set the ceiling for what end-to-end can achieve. Run with make bench-codec. Suitable for CI's pre-merge job.
End-to-end benchmarks — exercise the full PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip. Need an Informix container (make ifx-up). Run with make bench.

Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

Operation	Mean	Ops/sec
`decode(int)` (per cell)	181 ns	5.5M
`parse_tuple_payload(5 cols)` (per row)	2.87 µs	350K
`encode_param(int)` (per param)	103 ns	9.7M
`SELECT 1` round-trip	177 µs	5,650
Pool acquire + tiny query + release	295 µs	3,400
Cold connect + close (login handshake)	11.2 ms	89
1000-row SELECT *	1.56 ms	640
INSERT (single, prepared)	1.88 ms	530
`executemany(100 rows)`	181 ms	5.5 (i.e. ~550 rows/sec)
`executemany(1000 rows)`	1.74 s	0.57 (i.e. ~575 rows/sec)
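The Ops/sec column is the reciprocal of the mean, and the rows/sec figures for executemany multiply that by the batch size. A small sketch of the arithmetic, using values copied from the table (the helper names are illustrative):

```python
def ops_per_sec(mean_seconds: float) -> float:
    """Throughput implied by a mean latency: one operation per `mean_seconds`."""
    return 1.0 / mean_seconds


def rows_per_sec(mean_seconds: float, rows_per_call: int) -> float:
    """Row throughput for a batched call that handles `rows_per_call` rows."""
    return rows_per_call * ops_per_sec(mean_seconds)


# Means copied from the headline table, converted to seconds.
select_1 = ops_per_sec(177e-6)         # about 5,650 ops/sec
cold_connect = ops_per_sec(11.2e-3)    # about 89 ops/sec
many_100 = rows_per_sec(181e-3, 100)   # about 550 rows/sec
many_1000 = rows_per_sec(1.74, 1000)   # about 575 rows/sec

print(f"{select_1:.0f} {cold_connect:.0f} {many_100:.0f} {many_1000:.0f}")
```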

What these tell you

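Pool gives 72× speedup over cold connect. From the table rows alone, cold connect + close (11.2 ms) vs pool acquire + tiny query + release (295 µs) is about 38×, so the 72× figure presumably refers to bare acquire/release. Either way, if your app opens a connection per request, fix that first.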
Codec is not the bottleneck. Per-cell decode (181 ns) is about 1000× faster than a wire round-trip (177 µs for SELECT 1), and even a full 5-column row parse (2.87 µs) is about 60× faster. Network and server-side cost dominate.
UTF-8 adds no Phase 20 overhead. decode_varchar_utf8 runs at 216 ns vs decode_varchar_short at 170 ns; the 27% (46 ns) delta is the multibyte string walk inherent in UTF-8 decoding, and it is negligible next to a 177 µs round-trip.
executemany scales linearly, with no batching gain. 100 rows in 181 ms = 1.81 ms/row; 1000 rows in 1.74 s = 1.74 ms/row, both close to the 1.88 ms single prepared INSERT. This suggests per-row cost dominates and PREPARE amortization buys little. Worth investigating in Phase 21.x.
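The per-row and percentage figures in the last two points can be re-derived from the quoted numbers. A short worked check (variable names are illustrative):

```python
# executemany: per-row cost at each batch size, in milliseconds.
per_row_100 = 181.0 / 100      # 181 ms for 100 rows
per_row_1000 = 1740.0 / 1000   # 1.74 s for 1000 rows

# Relative per-row saving from the 10x larger batch.
batch_gain = (per_row_100 - per_row_1000) / per_row_100

# UTF-8 vs short varchar decode, in nanoseconds.
utf8_delta = (216 - 170) / 170

print(f"{per_row_100:.2f} ms/row, {per_row_1000:.2f} ms/row")
print(f"batch gain: {batch_gain:.1%}, utf8 delta: {utf8_delta:.1%}")
```

Going from 100 to 1000 rows per call saves under 4% per row, so the per-row cost is essentially flat across batch sizes.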

Regression policy

baseline.json is committed and represents the dev-container baseline. Compare a current run against it with:

uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%

A mean regression of more than 25% fails the run. Adjust the threshold to your CI's noise profile: loopback networking on a shared runner is noisier than the dev container on a quiet box, so start permissive and tighten as you collect runs.
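To make the threshold concrete, here is a minimal sketch of the kind of check `--benchmark-compare-fail=mean:25%` performs: compare each benchmark's mean against the baseline and flag anything more than 25% slower. This is an illustration, not pytest-benchmark's implementation. It assumes the saved JSON has a top-level `benchmarks` list whose entries carry `name` and `stats.mean`; `find_regressions`, the benchmark names, and the sample data are hypothetical.

```python
def find_regressions(baseline: dict, current: dict, max_slowdown: float = 0.25):
    """Return (name, baseline_mean, current_mean) for benchmarks whose mean
    grew by more than `max_slowdown` relative to the baseline."""
    base_means = {b["name"]: b["stats"]["mean"] for b in baseline["benchmarks"]}
    regressions = []
    for bench in current["benchmarks"]:
        base = base_means.get(bench["name"])
        if base is None:
            continue  # new benchmark, nothing to compare against
        mean = bench["stats"]["mean"]
        if mean > base * (1.0 + max_slowdown):
            regressions.append((bench["name"], base, mean))
    return regressions


# Made-up sample data in the assumed shape (means in seconds).
baseline = {"benchmarks": [
    {"name": "test_select_1", "stats": {"mean": 177e-6}},
    {"name": "test_cold_connect", "stats": {"mean": 11.2e-3}},
]}
current = {"benchmarks": [
    {"name": "test_select_1", "stats": {"mean": 190e-6}},       # +7%, passes
    {"name": "test_cold_connect", "stats": {"mean": 15.0e-3}},  # +34%, flagged
]}

for name, base, mean in find_regressions(baseline, current):
    print(f"{name}: {base * 1e3:.3f} ms -> {mean * 1e3:.3f} ms")
```

A standalone check like this can be handy for experimenting with thresholds on collected runs before committing one to CI.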

Updating the baseline

When you intentionally change performance (shipping an optimization, or accepting a regression for correctness), refresh the baseline:

make bench-save                                 # writes .results/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json

Document the change in CHANGELOG so reviewers know why the floor moved.

Files

test_codec_perf.py — codec dispatch (decode, encode_param, parse_tuple_payload)
test_select_perf.py — SELECT round-trips, single + multi-row
test_insert_perf.py — INSERT single + executemany throughput
test_pool_perf.py — cold connect vs pool acquire/release
test_async_perf.py — async-path latency + concurrent throughput
conftest.py — long-lived bench_conn and 1k-row bench_table fixtures
baseline.json — committed baseline for regression comparison
.results/ — gitignored; per-run output from make bench-save
