Investigation of the Phase 21 baseline finding that `executemany(N)` cost scaled linearly per row (~1.74 ms × N) regardless of batch size. Root cause: every `autocommit=True` INSERT forces a server-side transaction-log flush. Not a wire-protocol bug. Numbers:

- `executemany(1000)`, `autocommit=True`: 1.72 s (1.72 ms/row)
- `executemany(1000)` in a single transaction: 32 ms (32 µs/row)

That is a 53× speedup from changing the transaction boundary, not the driver. Pure protocol overhead is ~32 µs/row, i.e. ~31K rows/sec sustained throughput on a single connection, comparable to pg8000. Added a `test_executemany_1000_rows_in_txn` benchmark to make this visible. Updated the README headline numbers and added a "Performance gotchas" section explaining when `autocommit=False` matters. Decision: don't pipeline. The remaining 32 µs/row is already excellent; the autocommit gotcha is the real user-facing footgun. Docs > code. If someone reports needing >31K rows/sec on a single connection, that becomes Phase 22.
# Benchmarks (Phase 21)

Performance baselines for `informix-db`. Two layers:

1. **Codec micro-benchmarks** (`test_codec_perf.py`) — pure CPU, no
   server. These set the *ceiling* for what end-to-end can achieve.
   Run with `make bench-codec`. Suitable for CI's pre-merge job.
2. **End-to-end benchmarks** — exercise the full
   PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip.
   Need an Informix container (`make ifx-up`). Run with `make bench`.
## Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

| Operation | Mean | Ops/sec |
|---|---:|---:|
| `decode(int)` (per cell) | 181 ns | 5.5M |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs | 350K |
| `encode_param(int)` (per param) | 103 ns | 9.7M |
| `SELECT 1` round-trip | 177 µs | 5,650 |
| Pool acquire + tiny query + release | 295 µs | 3,400 |
| **Cold connect + close** (login handshake) | **11.2 ms** | **89** |
| 1000-row `SELECT *` | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
| `executemany(100)`, autocommit=True | 181 ms | ~550 rows/sec |
| `executemany(1000)`, autocommit=True | 1.72 s | ~580 rows/sec |
| **`executemany(1000)` in a single transaction** | **32 ms** | **~31,000 rows/sec** |
### What these tell you

- **Pool gives a 72× speedup** over cold connect. If your app opens a
  connection per request, fix that first.
- **Wrap bulk INSERTs in a transaction.** That's a **53× speedup** over
  the autocommit-True default. With autocommit on, each row forces the
  server to flush its transaction log; in transaction mode the flush
  happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound)
  to 32 µs (pure protocol). PEP 249 specifies that auto-commit be off by
  default, and we follow it: connections default to `autocommit=False`.
  See the sketch after this list.
- **Codec is not the bottleneck.** Per-cell decode (181 ns) is ~1000×
  faster than the wire round-trip (177 µs for `SELECT 1`), and per-row
  decode of a 5-column tuple (2.9 µs) is still ~60× faster. Network and
  server-side cost dominate.
- **UTF-8 adds no Phase 20 overhead.** `decode_varchar_utf8` runs at
  216 ns vs `decode_varchar_short` at 170 ns; the 27% delta is the
  multibyte string walk inherent in UTF-8 decoding, not new cost from
  the Phase 20 work.
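
The fix in code: a minimal sketch, assuming a PEP 249-style API. The
module name `informix_db`, the `connect()` arguments, and the table are
illustrative placeholders, not confirmed driver signatures.

```python
import informix_db  # placeholder module name

rows = [(i, f"name-{i}") for i in range(1000)]

# Slow: with autocommit on, every row is its own transaction, so the
# server flushes its transaction log once per row (~1.72 ms/row above).
conn = informix_db.connect(dsn="...", autocommit=True)  # hypothetical args
cur = conn.cursor()
cur.executemany("INSERT INTO bench (id, name) VALUES (?, ?)", rows)
conn.close()

# Fast: one transaction around the whole batch, one log flush at COMMIT
# (~32 µs/row above). autocommit=False is the default, per PEP 249.
conn = informix_db.connect(dsn="...")
cur = conn.cursor()
cur.executemany("INSERT INTO bench (id, name) VALUES (?, ?)", rows)
conn.commit()
conn.close()
```
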
### Performance gotchas

- **`autocommit=True` + `executemany` is the slowest reasonable pattern.**
  Use it only when each row genuinely needs to land independently. For
  bulk loads, leave autocommit at its default of `False` and call
  `conn.commit()` at the end of the batch.
- **A single `INSERT` in a tight loop costs 1.88 ms each,** strictly
  worse than `executemany`, which amortizes the PREPARE/RELEASE overhead
  across the batch. If you find yourself looping over
  `cur.execute("INSERT...")` hundreds of times, switch to `executemany`.
- **Cold connect is 11 ms.** The login handshake is *expensive* compared
  to anything you'll do with the connection. Pool everything in
  long-lived processes; see the sketch after this list.
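
The pooling pattern, sketched below. The pool constructor and
`acquire()` are assumed names for illustration; check the driver's
actual pool API before copying.

```python
import informix_db  # placeholder module name

# Pay the ~11 ms login handshake once per pooled connection, up front.
pool = informix_db.create_pool(dsn="...", min_size=2, max_size=10)  # hypothetical API

def handle_request(user_id):
    # ~0.3 ms for acquire + tiny query + release from a warm pool,
    # vs ~11 ms if this opened a fresh connection on every call.
    with pool.acquire() as conn:
        cur = conn.cursor()
        cur.execute("SELECT name FROM users WHERE id = ?", (user_id,))
        return cur.fetchone()
```
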
## Regression policy

`baseline.json` is committed and represents the dev-container baseline.
Compare a current run against it with:

```bash
uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%
```

A 25% mean regression fails the run. Adjust the threshold to your CI
noise profile: CI (loopback network on a shared runner) is noisier than
a dev container on a quiet box, so start permissive and tighten as you
collect runs.
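
Benchmarks are ordinary pytest-benchmark tests, so anything new is
picked up by the comparison above automatically. A minimal sketch of
the shape, using the `bench_conn` fixture from `conftest.py`; the query
is illustrative:

```python
import pytest

@pytest.mark.benchmark
def test_select_1_roundtrip(benchmark, bench_conn):
    # pytest-benchmark runs the closure repeatedly and records the mean,
    # which is what --benchmark-compare-fail=mean:25% checks against.
    def roundtrip():
        cur = bench_conn.cursor()
        cur.execute("SELECT 1 FROM systables WHERE tabid = 1")
        cur.fetchall()

    benchmark(roundtrip)
```
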
## Updating the baseline

When you intentionally change performance (an optimization, or an
accepted regression for correctness), refresh the baseline:

```bash
make bench-save   # writes 0001_run.json under tests/benchmarks/.results/
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json
```

Document the change in CHANGELOG so reviewers know why the floor moved.
## Files

- `test_codec_perf.py` — codec dispatch (decode, encode_param, parse_tuple_payload)
- `test_select_perf.py` — SELECT round-trips, single + multi-row
- `test_insert_perf.py` — INSERT single + executemany throughput
- `test_pool_perf.py` — cold connect vs pool acquire/release
- `test_async_perf.py` — async-path latency + concurrent throughput
- `conftest.py` — long-lived `bench_conn` and 1k-row `bench_table` fixtures
- `baseline.json` — committed baseline for regression comparison
- `.results/` — gitignored; per-run output from `make bench-save`