
# Benchmarks (Phase 21)

Performance baselines for `informix-db`. Two layers:

1. **Codec micro-benchmarks** (`test_codec_perf.py`): pure CPU, no
   server. These set the *ceiling* for what end-to-end runs can achieve.
   Run with `make bench-codec`. Suitable for CI's pre-merge job.
2. **End-to-end benchmarks**: these exercise the full
   PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip.
   They need an Informix container (`make ifx-up`). Run with `make bench`.
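As a rough illustration of what the codec layer measures, the sketch below times a pure-CPU decode with the standard library's `timeit`, with no server involved. `decode_int_stub` is a hypothetical stand-in (a 4-byte big-endian `struct` unpack), not informix-db's codec, and `mean_ns_per_call` is an illustrative helper; the real suite in `test_codec_perf.py` runs under pytest-benchmark instead.

```python
import struct
import timeit

# Hypothetical stand-in for a codec decode: unpack a 4-byte big-endian
# signed int. This is NOT informix-db's decoder; it only illustrates
# measuring a pure-CPU, per-call cost with no server involved.
_INT4 = struct.Struct(">i")


def decode_int_stub(buf: bytes) -> int:
    return _INT4.unpack(buf)[0]


def mean_ns_per_call(func, arg, number: int = 100_000, repeat: int = 5) -> float:
    """Per-call cost in nanoseconds: the fastest of `repeat` batches of `number` calls."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    # Taking the fastest batch filters out scheduler noise; divide by the
    # number of calls in a batch to get the cost of one call.
    return min(timings) / number * 1e9


if __name__ == "__main__":
    payload = struct.pack(">i", 42)
    ns = mean_ns_per_call(decode_int_stub, payload)
    print(f"decode_int_stub: {ns:.0f} ns/call, {1e9 / ns:,.0f} ops/sec")
```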

## Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

| Operation | Mean | Ops/sec |
|-|-:|-:|
| `decode(int)` (per cell) | 181 ns | 5.5M |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs | 350K |
| `encode_param(int)` (per param) | 103 ns | 9.7M |
| `SELECT 1` round-trip | 177 µs | 5,650 |
| Pool acquire + tiny query + release | 295 µs | 3,400 |
| **Cold connect + close** (login handshake) | **11.2 ms** | **89** |
| 1000-row SELECT * | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
| `executemany(100 rows)` | 181 ms | 5.5 (i.e. ~550 rows/sec) |
| `executemany(1000 rows)` | 1.74 s | 0.57 (i.e. ~575 rows/sec) |
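The Ops/sec column is the reciprocal of the mean. For `executemany`, one op is a whole batch, so rows/sec multiplies by the batch size. A small sketch that reproduces those columns from the table's means (the helper names are illustrative):

```python
# Ops/sec is the reciprocal of the mean time per operation. For
# executemany, one "op" is a whole batch, so rows/sec scales by batch size.


def ops_per_sec(mean_seconds: float) -> float:
    return 1.0 / mean_seconds


def rows_per_sec(mean_seconds: float, rows_per_op: int) -> float:
    return rows_per_op / mean_seconds


# Means copied from the headline table.
print(round(ops_per_sec(177e-6)))        # SELECT 1 round-trip     -> 5650
print(round(ops_per_sec(11.2e-3)))       # cold connect + close    -> 89
print(round(rows_per_sec(181e-3, 100)))  # executemany(100 rows)   -> 552
print(round(rows_per_sec(1.74, 1000)))   # executemany(1000 rows)  -> 575
```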

### What these tell you

- **Pooling is ~38× faster than a cold connect.** Pool acquire + tiny
  query + release (295 µs) beats connect + close alone (11.2 ms). If your
  app opens a connection per request, fix that first.
- **Codec is not the bottleneck.** Per-cell decode (181 ns) is ~1000×
  cheaper than a wire round-trip (177 µs for `SELECT 1`), and even a full
  5-column row (2.87 µs) is ~60× cheaper. Network and server-side cost
  dominate.
- **UTF-8 support adds no overhead of its own.** `decode_varchar_utf8`
  runs at 216 ns vs 170 ns for `decode_varchar_short`; the 27% delta is
  the multibyte string walk inherent in UTF-8 decoding, not Phase 20
  overhead.
- **`executemany` gets no batching win.** 100 rows in 181 ms = 1.81 ms/row;
  1000 rows in 1.74 s = 1.74 ms/row, barely below a single prepared INSERT
  (1.88 ms). Cost scales linearly with row count: per-row cost dominates
  and PREPARE amortization buys almost nothing. Worth investigating in
  Phase 21.x.
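The ratios in these bullets can be recomputed from the headline means. The sketch below also fits a two-point linear model, total = fixed + rows × per_row, to the two `executemany` runs; with only two points it is a rough estimate, not a measurement, and the variable names are illustrative.

```python
# Headline means from the table, in seconds.
cold_connect = 11.2e-3
pool_roundtrip = 295e-6
select_1 = 177e-6
decode_cell = 181e-9
parse_row = 2.87e-6
utf8, short = 216e-9, 170e-9

print(f"pool vs cold connect: {cold_connect / pool_roundtrip:.0f}x")
print(f"round-trip vs per-cell decode: {select_1 / decode_cell:.0f}x")
print(f"round-trip vs per-row parse: {select_1 / parse_row:.0f}x")
print(f"utf8 vs short varchar: +{(utf8 / short - 1) * 100:.0f}%")


def fit_fixed_and_per_row(n1, t1, n2, t2):
    """Fit t = fixed + n * per_row through two (rows, seconds) points."""
    per_row = (t2 - t1) / (n2 - n1)
    fixed = t1 - n1 * per_row
    return fixed, per_row


fixed, per_row = fit_fixed_and_per_row(100, 181e-3, 1000, 1.74)
print(f"executemany: fixed ~{fixed * 1e3:.1f} ms, per-row ~{per_row * 1e3:.2f} ms")
```

With the table's numbers this prints about 38×, 978×, 62×, and +27%, and the fit comes out near 7.8 ms of fixed cost plus 1.73 ms per row, so nearly all of the batch time is per-row.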

## Regression policy

`baseline.json` is committed and represents the dev-container baseline.
Compare a current run against it with:

```bash
uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%
```

A mean regression of more than 25% fails the run. Tune the threshold to
your CI's noise profile: loopback networking on a shared runner is
noisier than the dev container on a quiet box, so start permissive and
tighten as you collect runs.
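To see what the `mean:25%` gate checks, here is a hedged approximation in plain Python. It assumes pytest-benchmark's saved-JSON layout (a `benchmarks` list whose entries carry `fullname` and `stats.mean`); pytest-benchmark's own comparison logic may differ in details. `find_regressions` is a hypothetical helper, and the benchmark names and current means in the demo are made up for illustration.

```python
# Hedged approximation of `--benchmark-compare-fail=mean:25%`: flag any
# benchmark whose current mean exceeds the baseline mean by more than the
# threshold. Inputs are dicts shaped like pytest-benchmark's saved JSON:
# {"benchmarks": [{"fullname": ..., "stats": {"mean": ...}}, ...]}


def mean_by_name(run: dict) -> dict[str, float]:
    return {b["fullname"]: b["stats"]["mean"] for b in run["benchmarks"]}


def find_regressions(baseline: dict, current: dict, threshold: float = 0.25):
    """Return (name, fraction_slower) for benchmarks past the threshold."""
    base = mean_by_name(baseline)
    regressions = []
    for name, mean in mean_by_name(current).items():
        if name not in base:
            continue  # new benchmark: nothing to compare against
        frac = (mean - base[name]) / base[name]
        if frac > threshold:
            regressions.append((name, frac))
    return regressions


if __name__ == "__main__":
    # Hypothetical names; "current" means are illustrative, not measured.
    baseline = {"benchmarks": [
        {"fullname": "test_select_one", "stats": {"mean": 177e-6}},
        {"fullname": "test_cold_connect", "stats": {"mean": 11.2e-3}},
    ]}
    current = {"benchmarks": [
        {"fullname": "test_select_one", "stats": {"mean": 240e-6}},
        {"fullname": "test_cold_connect", "stats": {"mean": 12.0e-3}},
    ]}
    for name, frac in find_regressions(baseline, current):
        print(f"{name}: {frac:.0%} slower than baseline")
```

In practice you would `json.load` `tests/benchmarks/baseline.json` and a fresh run's file and pass the two dicts in.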

## Updating the baseline

When you intentionally change performance (an optimization, or accept
a regression for correctness), refresh:

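```bash
make bench-save   # writes tests/benchmarks/.results/<machine-id>/NNNN_run.json
# The NNNN counter increments on every save, so copy the newest run.
latest=$(ls -t tests/benchmarks/.results/Linux-CPython-*/*_run.json | head -n 1)
cp "$latest" tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json
```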

Document the change in CHANGELOG so reviewers know why the floor moved.
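pytest-benchmark prefixes each saved run with an incrementing counter, so a fixed `0001_run.json` only ever picks up the first save. Below is a hedged Python sketch of promoting the newest run by counter instead; `latest_run` and `promote_to_baseline` are hypothetical helpers, and the `<machine-id>/NNNN_name.json` layout is assumed from the `Linux-CPython-*` path above.

```python
import re
import shutil
from pathlib import Path

# Saved runs look like <machine-id>/0001_run.json, 0002_run.json, ...
# The newest run is the one with the highest numeric prefix.
_RUN_RE = re.compile(r"^(\d+)_.*\.json$")


def latest_run(results_dir: Path) -> Path:
    runs = []
    for path in results_dir.glob("*/*.json"):  # one level of machine-id dirs
        match = _RUN_RE.match(path.name)
        if match:
            runs.append((int(match.group(1)), path))
    if not runs:
        raise FileNotFoundError(f"no saved runs under {results_dir}")
    return max(runs)[1]  # highest counter; ties broken by path


def promote_to_baseline(results_dir: Path, baseline: Path) -> Path:
    """Copy the newest saved run over the baseline; return the source path."""
    src = latest_run(results_dir)
    shutil.copyfile(src, baseline)
    return src
```

For example, call `promote_to_baseline(Path("tests/benchmarks/.results"), Path("tests/benchmarks/baseline.json"))`, then `git add` as usual. If runs from several machines share `.results/`, narrow the glob to one machine-id directory first so you don't promote another machine's run.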

## Files

- `test_codec_perf.py` — codec dispatch (decode, encode_param, parse_tuple_payload)
- `test_select_perf.py` — SELECT round-trips, single + multi-row
- `test_insert_perf.py` — INSERT single + executemany throughput
- `test_pool_perf.py` — cold connect vs pool acquire/release
- `test_async_perf.py` — async-path latency + concurrent throughput
- `conftest.py` — long-lived `bench_conn` and 1k-row `bench_table` fixtures
- `baseline.json` — committed baseline for regression comparison
- `.results/` — gitignored; per-run output from `make bench-save`