
Ryan Malloy 362ecb3d63 Phase 33: Pipelined executemany - 2.85x faster bulk insert (2026.05.05.6)

The serial-loop executemany paid one wire round-trip per row (~30us/row
on loopback). It was the one benchmark where IfxPy beat us in the
comparison work: we were 10% slower at executemany(1000) in txn.

Phase 33 pipelines the BIND+EXECUTE PDUs: build all N PDUs, send
them back-to-back, then drain all N responses. Eliminates per-row
RTT entirely.
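The difference between the serial loop and the pipelined send/drain can be sketched with a toy protocol. This is not the driver's code: the 4-byte request/reply framing, the doubling "server", and every function name below are assumptions made purely for illustration, with a local socket pair standing in for the Informix connection.

```python
import socket
import struct
import threading


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return bytes(buf)


def toy_server(sock: socket.socket, n: int) -> None:
    """Hypothetical server: answers each 4-byte request with the value doubled."""
    for _ in range(n):
        (value,) = struct.unpack("!I", recv_exact(sock, 4))
        sock.sendall(struct.pack("!I", value * 2))


def serial(sock: socket.socket, values: list[int]) -> list[int]:
    """One request, one response, repeated: pays a round-trip per item."""
    out = []
    for v in values:
        sock.sendall(struct.pack("!I", v))
        out.append(struct.unpack("!I", recv_exact(sock, 4))[0])
    return out


def pipelined(sock: socket.socket, values: list[int]) -> list[int]:
    """Build all N requests, send them back-to-back, then drain N responses."""
    sock.sendall(b"".join(struct.pack("!I", v) for v in values))
    return [struct.unpack("!I", recv_exact(sock, 4))[0] for _ in values]


def run(strategy, values: list[int]) -> list[int]:
    """Run one strategy against a fresh toy server on a local socket pair."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=toy_server, args=(server, len(values)))
    worker.start()
    try:
        return strategy(client, values)
    finally:
        client.close()          # unblocks the server thread if we failed early
        worker.join(timeout=5)
        server.close()


values = list(range(100))
assert run(serial, values) == run(pipelined, values)
```

Both strategies return the same results; only the number of round-trips differs. The sketch keeps the batch small because sending everything before reading anything relies on the socket buffers being able to hold the in-flight data.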

Performance impact:
* executemany(1000) in txn:   31.3 ms -> 11.0 ms (2.85x faster)
* executemany(100) autocommit: 173 ms -> 154 ms (11% faster)
* executemany(1000) autocommit: 1740 ms -> 1590 ms (9% faster)

(Autocommit gets smaller wins because server-side log flushes
dominate - Phase 21.1's "autocommit cliff".)

IfxPy comparison flipped: us 10% slower -> us 2.05x faster on bulk
inserts. We now win all 5 head-to-head benchmarks against the C-bound
driver.

Margaret Hamilton review surfaced one CRITICAL concern (C1) - the
pipeline assumes Informix sends N responses for N pipelined PDUs
even when one fails. If the server cut the stream short, the drain
loop would block forever on the next read.

Verified by 3 new integration tests in tests/test_executemany_pipeline.py:
* test_pipelined_executemany_mid_batch_constraint_violation (row 500/1000)
* test_pipelined_executemany_first_row_fails (row 0/100)
* test_pipelined_executemany_last_row_fails (row 99/100)

All confirm Informix sends N responses; wire stays aligned; connection
is usable after.

Plus 4 lower-priority fixes Hamilton recommended:
* H1: documented _raise_sq_err self-drains-SQ_EOT invariant + tripwire
* H2: docstring warning about O(N) lock duration; chunk for huge batches
* M1: prepend row-index to exception message rather than reformat
* M2: documented sendall-no-timeout caveat on hostile networks
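As a rough illustration of H2 and M1, a caller could split a huge batch into chunks, and a failure could be annotated by prefixing the row index to the existing message. Both helpers below are hypothetical sketches with invented names, not the functions that landed in this commit.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` rows (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def prepend_row_index(exc: BaseException, row_index: int) -> BaseException:
    """Prefix the first exception argument with the failing row index.

    The exception type and any remaining args are left untouched, so
    callers that match on the original message tail still work.
    """
    first = exc.args[0] if exc.args else ""
    exc.args = (f"row {row_index}: {first}",) + tuple(exc.args[1:])
    return exc


# Usage: three batches from five rows, and an annotated error message.
batches = list(chunked(range(5), 2))
err = prepend_row_index(ValueError("unique constraint violated"), 500)
print(batches)   # [[0, 1], [2, 3], [4]]
print(err)       # row 500: unique constraint violated
```

Feeding each chunk to its own executemany call bounds how long any single call holds its lock, at the price of one extra send/drain cycle per chunk.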

77 unit + 239 integration + 33 benchmark = 349 tests; ruff clean.

Note: Phase 32 (Tier 1+2 benchmarks) was tagged without bumping
pyproject.toml's version string. .5 was git-tag-only; .6 is the next
published version increment.

2026-05-05 12:26:15 -06:00


| File | Last commit | Date |
|---|---|---|
| `__init__.py` | Phase 21: Performance benchmarks (2026.05.04.5) | 2026-05-04 17:21:12 -06:00 |
| `baseline.json` | Phase 25: Branch reorder + invariant tripwires (2026.05.04.10) | 2026-05-04 23:34:05 -06:00 |
| `conftest.py` | Phase 21: Performance benchmarks (2026.05.04.5) | 2026-05-04 17:21:12 -06:00 |
| `README.md` | Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6) | 2026-05-04 17:26:16 -06:00 |
| `test_async_perf.py` | Phase 21: Performance benchmarks (2026.05.04.5) | 2026-05-04 17:21:12 -06:00 |
| `test_codec_perf.py` | Phase 21: Performance benchmarks (2026.05.04.5) | 2026-05-04 17:21:12 -06:00 |
| `test_insert_perf.py` | Phase 32: Benchmark improvements (Tier 1 + Tier 2) | 2026-05-05 12:01:11 -06:00 |
| `test_observability_perf.py` | Phase 32: Benchmark improvements (Tier 1 + Tier 2) | 2026-05-05 12:01:11 -06:00 |
| `test_pool_perf.py` | Phase 32: Benchmark improvements (Tier 1 + Tier 2) | 2026-05-05 12:01:11 -06:00 |
| `test_select_perf.py` | Phase 21: Performance benchmarks (2026.05.04.5) | 2026-05-04 17:21:12 -06:00 |

README.md

Benchmarks (Phase 21)

Performance baselines for informix-db. Two layers:

Codec micro-benchmarks (test_codec_perf.py) — pure CPU, no server. These set the ceiling for what end-to-end can achieve. Run with make bench-codec. Suitable for CI's pre-merge job.
End-to-end benchmarks — exercise the full PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip. Need an Informix container (make ifx-up). Run with make bench.

Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

| Operation | Mean | Ops/sec |
|---|---|---|
| `decode(int)` (per cell) | 181 ns | 5.5M |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs | 350K |
| `encode_param(int)` (per param) | 103 ns | 9.7M |
| `SELECT 1` round-trip | 177 µs | 5,650 |
| Pool acquire + tiny query + release | 295 µs | 3,400 |
| Cold connect + close (login handshake) | 11.2 ms | 89 |
| 1000-row SELECT * | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
| `executemany(100)` autocommit=True | 181 ms | ~550 rows/sec |
| `executemany(1000)` autocommit=True | 1.72 s | ~580 rows/sec |
| `executemany(1000)` in single transaction | 32 ms | ~31,000 rows/sec |

What these tell you

Pool gives roughly a 38× speedup over cold connect (295 µs vs 11.2 ms). If your app opens a connection per request, fix that first.
Wrap bulk INSERTs in a transaction. That's a 53× speedup over autocommit=True. With autocommit on, each row forces the server to flush its transaction log; in transaction mode the flush happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound) to 32 µs (pure protocol). PEP 249's default of autocommit=False was designed for this, and we keep that default.
Codec is not the bottleneck. Per-row decode (2.9 µs) is roughly 60× faster than a wire round-trip (177 µs for SELECT 1). Network and server-side cost dominate.
UTF-8 carries no cost beyond the decode itself. decode_varchar_utf8 runs at 216 ns vs decode_varchar_short at 170 ns; the 27% delta is the multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.
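The transaction and UTF-8 figures above follow directly from the headline table. The snippet below just redoes that arithmetic; the variable names are invented here and the inputs are copied from the table and the text.

```python
# Figures copied from the headline table, expressed in seconds.
executemany_1000_autocommit = 1.72
executemany_1000_txn = 32e-3
rows = 1000

# Transaction vs autocommit: total speedup and per-row cost.
txn_speedup = executemany_1000_autocommit / executemany_1000_txn
per_row_autocommit_ms = executemany_1000_autocommit / rows * 1e3
per_row_txn_us = executemany_1000_txn / rows * 1e6

# UTF-8 vs short varchar decode, in nanoseconds.
decode_varchar_utf8_ns = 216
decode_varchar_short_ns = 170
utf8_delta_pct = (
    (decode_varchar_utf8_ns - decode_varchar_short_ns)
    / decode_varchar_short_ns * 100
)

print(f"txn speedup:        {txn_speedup:.2f}x")       # quoted as 53x in the text
print(f"per-row autocommit: {per_row_autocommit_ms:.2f} ms")
print(f"per-row in txn:     {per_row_txn_us:.0f} us")
print(f"UTF-8 delta:        {utf8_delta_pct:.0f}%")
```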

Performance gotchas

autocommit=True + executemany is the slowest reasonable pattern. Use it only when each row genuinely needs to land independently. For bulk loads, default autocommit=False and call conn.commit() at the end of the batch.
Single INSERT in a tight loop is 1.88 ms each — strictly worse than executemany (which saves PREPARE/RELEASE overhead). If you find yourself looping over cur.execute("INSERT...") hundreds of times, switch to executemany.
Cold connect is 11 ms. The login handshake is expensive compared to anything you'll do with the connection. Pool everything in long-lived processes.
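A minimal sketch of the recommended bulk-load pattern, written against the generic PEP 249 interface. The standard-library sqlite3 module is used here only as a stand-in so the example runs without an Informix server; the helper name bulk_insert, the table, and the `?` placeholder style are assumptions, so check the driver's paramstyle before copying the SQL.

```python
import sqlite3


def bulk_insert(conn, sql: str, rows) -> None:
    """Insert all rows in one transaction: commit once, roll back on error."""
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)   # one prepared statement, many rows
        conn.commit()                # single log flush at the end of the batch
    except Exception:
        conn.rollback()              # leave no partial batch behind
        raise
    finally:
        cur.close()


# Stand-in connection; with the real driver this would be an Informix
# connection opened with autocommit left at False.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")

bulk_insert(
    conn,
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1000)],
)
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
```

Committing once per batch is what moves the per-row cost from the storage-bound figure to the protocol-bound one; the rollback branch keeps a failed batch from leaving partial rows behind.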

Regression policy

baseline.json is committed and represents the dev-container baseline. Compare a current run against it with:

uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%

A 25% mean-regression fails the run. Adjust the threshold per CI noise profile. CI's loopback-network-on-shared-runner is noisier than dev container on a quiet box — start permissive and tighten as you collect runs.
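To make the threshold concrete, here is a small sketch of the same check done by hand. It assumes the saved JSON has a top-level "benchmarks" list whose entries carry "name" and "stats" with a "mean" in seconds; treat that layout, the function name, and the benchmark names in the sample data as assumptions rather than a documented contract.

```python
def mean_regressions(baseline: dict, current: dict, threshold: float = 0.25) -> dict:
    """Return {benchmark name: fractional slowdown} for means above threshold."""
    base_means = {b["name"]: b["stats"]["mean"] for b in baseline["benchmarks"]}
    flagged = {}
    for bench in current["benchmarks"]:
        old = base_means.get(bench["name"])
        if old is None:
            continue  # new benchmark, nothing to compare against
        change = (bench["stats"]["mean"] - old) / old
        if change > threshold:
            flagged[bench["name"]] = change
    return flagged


# Hypothetical data: one benchmark got about 28% slower, one got faster.
baseline = {"benchmarks": [
    {"name": "test_executemany_txn", "stats": {"mean": 0.032}},
    {"name": "test_select_one", "stats": {"mean": 0.000177}},
]}
current = {"benchmarks": [
    {"name": "test_executemany_txn", "stats": {"mean": 0.041}},
    {"name": "test_select_one", "stats": {"mean": 0.000150}},
]}
print(mean_regressions(baseline, current))
```

Only slowdowns are flagged; an improvement never fails the comparison, which is why an intentional speedup still needs a baseline refresh to become the new floor.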

Updating the baseline

When you intentionally change performance (an optimization, or accept a regression for correctness), refresh:

make bench-save                                 # writes .results/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json

Document the change in CHANGELOG so reviewers know why the floor moved.

Files

test_codec_perf.py — codec dispatch (decode, encode_param, parse_tuple_payload)
test_select_perf.py — SELECT round-trips, single + multi-row
test_insert_perf.py — INSERT single + executemany throughput
test_pool_perf.py — cold connect vs pool acquire/release
test_async_perf.py — async-path latency + concurrent throughput
conftest.py — long-lived bench_conn and 1k-row bench_table fixtures
baseline.json — committed baseline for regression comparison
.results/ — gitignored; per-run output from make bench-save
