Ryan Malloy 90ce035a00 Phase 21: Performance benchmarks (2026.05.04.5)
Adds tests/benchmarks/ with pytest-benchmark coverage of the hot codec
paths and end-to-end SELECT/INSERT/pool/async round-trips. Establishes
a committed baseline.json so PRs can be regression-checked at review
via --benchmark-compare.

* test_codec_perf.py (16): decode/encode_param/parse_tuple_payload
  micro-benchmarks - run without container, suitable for pre-merge CI.
* test_select_perf.py (4): SELECT round-trips - 1-row latency floor,
  10-row, 1k-row full fetch, parameterized.
* test_insert_perf.py (3): single-row INSERT, executemany 100 / 1000.
* test_pool_perf.py (3): cold connect, pool acquire/release, pool
  acquire + query + release.
* test_async_perf.py (2): async round-trip overhead, 10x concurrent.
* baseline.json: committed snapshot, 28 measurements.
* benchmark pytest marker, gated off by default.
* Makefile: bench / bench-codec / bench-save targets;
  test-integration excludes benchmarks for speed.

Headline numbers (dev container loopback):
* decode(int): 181 ns
* parse_tuple 5 cols: 2.87 µs/row
* SELECT 1 round-trip: 177 µs
* Pool acquire+query+release: 295 µs
* Cold connect: 11.2 ms (72x slower than pool)

UTF-8 decode carries no measurable cost vs iso-8859-1 - confirms
Phase 20 didn't regress anything.

Total: 69 unit + 211 integration + 28 benchmark = 308 tests.
2026-05-04 17:21:12 -06:00

# Benchmarks (Phase 21)
Performance baselines for `informix-db`. Two layers:
1. **Codec micro-benchmarks** (`test_codec_perf.py`) — pure CPU, no
server. These set the *ceiling* for what end-to-end can achieve.
Run with `make bench-codec`. Suitable for CI's pre-merge job.
2. **End-to-end benchmarks** — exercise the full
PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip.
Need an Informix container (`make ifx-up`). Run with `make bench`.
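To make the "pure CPU, no server" layer concrete, here is a minimal standard-library sketch of a micro-benchmark. `decode_int_be` is a hypothetical stand-in, not the `informix-db` codec, and the real suite uses pytest-benchmark rather than `timeit`.

```python
import struct
import timeit


def decode_int_be(raw: bytes) -> int:
    """Hypothetical stand-in decoder: 4-byte big-endian signed integer."""
    return struct.unpack(">i", raw)[0]


def mean_ns_per_call(func, arg, number: int = 100_000, repeat: int = 5) -> float:
    """Return the best-of-`repeat` mean time per call, in nanoseconds."""
    timer = timeit.Timer(lambda: func(arg))
    # Each entry is the total time for `number` calls; take the least noisy run.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e9


if __name__ == "__main__":
    payload = struct.pack(">i", 42)
    print(f"decode_int_be: {mean_ns_per_call(decode_int_be, payload):.0f} ns/call")
```

Because nothing here touches a socket, the timing reflects only interpreter and decode cost, which is why this layer can run in pre-merge CI without a container.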
## Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)
| Operation | Mean | Ops/sec |
|-|-:|-:|
| `decode(int)` (per cell) | 181 ns | 5.5M |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs | 350K |
| `encode_param(int)` (per param) | 103 ns | 9.7M |
| `SELECT 1` round-trip | 177 µs | 5,650 |
| Pool acquire + tiny query + release | 295 µs | 3,400 |
| **Cold connect + close** (login handshake) | **11.2 ms** | **89** |
| 1000-row SELECT * | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
| `executemany(100 rows)` | 181 ms | 5.5 (i.e. ~550 rows/sec) |
| `executemany(1000 rows)` | 1.74 s | 0.57 (i.e. ~575 rows/sec) |
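
The Ops/sec column is the reciprocal of the mean, and the rows/sec figures for `executemany` multiply that by the batch size. A small sketch of the arithmetic (the function names are illustrative, not part of the benchmark suite):

```python
def ops_per_sec(mean_seconds: float) -> float:
    """Calls per second implied by a mean per-call latency."""
    return 1.0 / mean_seconds


def rows_per_sec(mean_seconds: float, rows_per_call: int) -> float:
    """Row throughput when each call processes `rows_per_call` rows."""
    return rows_per_call / mean_seconds


if __name__ == "__main__":
    print(f"SELECT 1:          {ops_per_sec(177e-6):,.0f} ops/sec")
    print(f"cold connect:      {ops_per_sec(11.2e-3):,.0f} ops/sec")
    print(f"executemany(100):  {rows_per_sec(181e-3, 100):,.0f} rows/sec")
    print(f"executemany(1000): {rows_per_sec(1.74, 1000):,.0f} rows/sec")
```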
### What these tell you
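- **Pool gives a large speedup over cold connect.** By the table, 11.2 ms
vs 295 µs is about 38×, and the pool side of that comparison includes a
query. The 72× headline figure presumably uses bare acquire/release, which
`test_pool_perf.py` measures but the table does not list. If your app
opens a connection per request, fix that first.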
- **Codec is not the bottleneck.** Per-cell decode (181 ns) is roughly 1000×
faster than a wire round-trip (177 µs for `SELECT 1`), and even a full
5-column row decode (2.87 µs) is about 60× faster. Network and server-side
cost dominate.
- **UTF-8 carries negligible cost.** `decode_varchar_utf8` runs at
216 ns vs `decode_varchar_short` at 170 ns. The 46 ns (27%) delta is the
multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead,
and it amounts to about 0.03% of a 177 µs round-trip.
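- **`executemany` doesn't amortize.** 100 rows in 181 ms = 1.81 ms/row;
1000 rows in 1.74 s = 1.74 ms/row, both close to the 1.88 ms single-row
INSERT. Cost scales almost exactly linearly with row count, which suggests
per-row cost dominates and PREPARE amortization buys little. Worth
investigating in Phase 21.x.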
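
The ratios behind these observations can be recomputed from the reported means, which is a useful check whenever the baseline moves. A minimal sketch; the dictionary keys are illustrative labels, not the benchmark IDs stored in `baseline.json`:

```python
# Mean latencies in seconds, copied from the numbers quoted in this document.
MEANS = {
    "decode_int": 181e-9,
    "parse_tuple_5_cols": 2.87e-6,
    "decode_varchar_short": 170e-9,
    "decode_varchar_utf8": 216e-9,
    "select_1": 177e-6,
    "pool_acquire_query_release": 295e-6,
    "cold_connect": 11.2e-3,
    "insert_single": 1.88e-3,
    "executemany_100": 181e-3,
    "executemany_1000": 1.74,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return MEANS[slow] / MEANS[fast]


def per_row_ms(name: str, rows: int) -> float:
    """Per-row cost in milliseconds for a batched operation."""
    return MEANS[name] / rows * 1e3


if __name__ == "__main__":
    print(f"cold connect / pooled query: {ratio('cold_connect', 'pool_acquire_query_release'):.1f}x")
    print(f"round-trip / per-cell decode: {ratio('select_1', 'decode_int'):.0f}x")
    print(f"round-trip / per-row decode:  {ratio('select_1', 'parse_tuple_5_cols'):.0f}x")
    print(f"utf8 / short varchar decode:  {ratio('decode_varchar_utf8', 'decode_varchar_short'):.2f}x")
    print(f"executemany(100):  {per_row_ms('executemany_100', 100):.2f} ms/row")
    print(f"executemany(1000): {per_row_ms('executemany_1000', 1000):.2f} ms/row")
```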
## Regression policy
`baseline.json` is committed and represents the dev-container baseline.
Compare a current run against it with:
```bash
uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
--benchmark-compare=tests/benchmarks/baseline.json \
--benchmark-compare-fail=mean:25%
```
Any benchmark whose mean is more than 25% slower than its baseline fails
the run. Adjust the threshold to your CI's noise profile: a shared runner
is noisier than the dev container on a quiet box, even over loopback, so
start permissive and tighten as you collect runs.
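
For intuition, the check amounts to a per-benchmark comparison of means. The sketch below is not pytest-benchmark's implementation; it assumes the saved JSON has a top-level `benchmarks` list whose entries carry `fullname` and `stats.mean`, and `find_regressions` is a hypothetical helper name.

```python
def find_regressions(
    baseline: dict, current: dict, max_increase: float = 0.25
) -> list[tuple[str, float]]:
    """Return (fullname, fractional increase) for means that rose past the threshold."""
    base_means = {b["fullname"]: b["stats"]["mean"] for b in baseline["benchmarks"]}
    regressions = []
    for bench in current["benchmarks"]:
        base = base_means.get(bench["fullname"])
        if base is None:
            continue  # new benchmark: nothing to compare against
        increase = bench["stats"]["mean"] / base - 1.0
        if increase > max_increase:
            regressions.append((bench["fullname"], increase))
    return regressions


if __name__ == "__main__":
    baseline = {"benchmarks": [{"fullname": "select_1", "stats": {"mean": 177e-6}}]}
    current = {"benchmarks": [{"fullname": "select_1", "stats": {"mean": 240e-6}}]}
    for name, increase in find_regressions(baseline, current):
        print(f"{name}: mean up {increase:.0%}")
```

Loading both files with `json.load` and passing the resulting dicts in is enough to reuse this outside pytest, for example in a review bot.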
## Updating the baseline
When you intentionally change performance (by landing an optimization, or
by accepting a regression for correctness), refresh the baseline:
```bash
make bench-save # writes .results/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json
```
Document the change in CHANGELOG so reviewers know why the floor moved.
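
The `cp` above names `0001_run.json` explicitly. If `.results/` already holds earlier runs, the newest file will likely carry a higher counter, so a small helper that picks the highest-numbered run may be safer. This is a sketch under assumptions: `refresh_baseline` and `latest_run` are hypothetical names, the `NNNN_*.json` naming is inferred from the command above, and only one platform directory is expected under `.results/`.

```python
import shutil
from pathlib import Path


def latest_run(results_dir: Path) -> Path:
    """Return the highest-numbered NNNN_*.json file one level below results_dir."""
    runs = sorted(
        results_dir.glob("*/[0-9][0-9][0-9][0-9]_*.json"),
        key=lambda p: p.name,  # zero-padded counter sorts lexicographically
    )
    if not runs:
        raise FileNotFoundError(f"no saved runs under {results_dir}")
    return runs[-1]


def refresh_baseline(results_dir: Path, baseline: Path) -> Path:
    """Copy the newest saved run over the committed baseline; return the source."""
    source = latest_run(results_dir)
    shutil.copyfile(source, baseline)
    return source


if __name__ == "__main__":
    src = refresh_baseline(
        Path("tests/benchmarks/.results"),
        Path("tests/benchmarks/baseline.json"),
    )
    print(f"baseline.json refreshed from {src}")
```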
## Files
- `test_codec_perf.py` — codec dispatch (decode, encode_param, parse_tuple_payload)
- `test_select_perf.py` — SELECT round-trips, single + multi-row
- `test_insert_perf.py` — INSERT single + executemany throughput
- `test_pool_perf.py` — cold connect vs pool acquire/release
- `test_async_perf.py` — async-path latency + concurrent throughput
- `conftest.py` — long-lived `bench_conn` and 1k-row `bench_table` fixtures
- `baseline.json` — committed baseline for regression comparison
- `.results/` — gitignored; per-run output from `make bench-save`