History

Ryan Malloy f3e589c5bf Phase 23: Hot-path optimization for parse_tuple_payload (2026.05.04.8)

Per-row decode is hit on every row of every SELECT. The original code
had three forms of waste in the inner loop:

1. Redundant base_type() call. ColumnInfo.type_code is already
   base-typed by parse_describe at construction; calling base_type()
   again per column per row was pure waste. Single largest savings.
2. IntFlag->int conversions inline (~10x per iteration). Lifted to
   module-level _TC_X constants.
3. Lazy imports inside the loop body (_decode_datetime, _decode_interval,
   BlobLocator, ClobLocator, RowValue, CollectionValue). Moved to top.

Plus three precomputed frozensets (_LENGTH_PREFIXED_SHORT_TYPES,
_COMPOSITE_UDT_TYPES, _NUMERIC_TYPES) replace inline tuple-membership
checks. _COLLECTION_KIND_MAP is now MappingProxyType (actually frozen).

Performance:
* parse_tuple_5cols: 2796 ns -> 2030 ns (-27%)
* select_bench_table_all (1k rows): 1477 us -> 1198 us (-19%)
* Codec micro-bench, cold connect, executemany: unchanged

Real-world fetch ceiling on a single connection: 350K rows/sec ->
490K rows/sec.

Margaret Hamilton review surfaced four cleanup items, all addressed
before tagging:
* H1: cursor._dereference_blob_columns had the same redundant
  base_type() call - stripped for consistency.
* M1: documented the load-bearing invariant at parse_describe (the
  single producer site) so future contributors have a grep target.
* M2: _COLLECTION_KIND_MAP wrapped in MappingProxyType.
* L1: stale line-number comment fixed to point at the INVARIANT
  comment instead.

baseline.json refreshed; all 224 integration tests pass; ruff clean.

2026-05-04 17:52:20 -06:00

__init__.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

baseline.json

Phase 23: Hot-path optimization for parse_tuple_payload (2026.05.04.8)

2026-05-04 17:52:20 -06:00

conftest.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

README.md

Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6)

2026-05-04 17:26:16 -06:00

test_async_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

test_codec_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

test_insert_perf.py

Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6)

2026-05-04 17:26:16 -06:00

test_pool_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

test_select_perf.py

Phase 21: Performance benchmarks (2026.05.04.5)

2026-05-04 17:21:12 -06:00

README.md

Benchmarks (Phase 21)

Performance baselines for informix-db. Two layers:

Codec micro-benchmarks (test_codec_perf.py) — pure CPU, no server. These set the ceiling for what end-to-end can achieve. Run with make bench-codec. Suitable for CI's pre-merge job.
End-to-end benchmarks — exercise the full PREPARE → BIND → EXECUTE → FETCH → CLOSE → RELEASE round-trip. Need an Informix container (make ifx-up). Run with make bench.

Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

Operation	Mean	Ops/sec
`decode(int)` (per cell)	181 ns	5.5M
`parse_tuple_payload(5 cols)` (per row)	2.87 µs	350K
`encode_param(int)` (per param)	103 ns	9.7M
`SELECT 1` round-trip	177 µs	5,650
Pool acquire + tiny query + release	295 µs	3,400
Cold connect + close (login handshake)	11.2 ms	89
1000-row SELECT *	1.56 ms	640
INSERT (single, prepared)	1.88 ms	530
`executemany(100)` autocommit=True	181 ms	~550 rows/sec
`executemany(1000)` autocommit=True	1.72 s	~580 rows/sec
`executemany(1000)` in single transaction	32 ms	~31,000 rows/sec

What these tell you

Pool gives 72× speedup over cold connect. If your app opens a connection per request, fix that first.
Wrap bulk INSERTs in a transaction. That's a 53× speedup over the autocommit-True default. With autocommit on, each row forces the server to flush its transaction log; in transaction mode the flush happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound) to 32 µs (pure protocol). PEP 249's default autocommit=False was designed for this — we just default to False.
Codec is not the bottleneck. Per-row decode (2.9 µs) is 1000× faster than wire round-trip (177 µs for SELECT 1). Network and server-side cost dominate.
UTF-8 carries no measurable cost. decode_varchar_utf8 runs at 216 ns vs decode_varchar_short at 170 ns — the 27% delta is the multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.

Performance gotchas

autocommit=True + executemany is the slowest reasonable pattern. Use it only when each row genuinely needs to land independently. For bulk loads, default autocommit=False and call conn.commit() at the end of the batch.
Single INSERT in a tight loop is 1.88 ms each — strictly worse than executemany (which saves PREPARE/RELEASE overhead). If you find yourself looping over cur.execute("INSERT...") hundreds of times, switch to executemany.
Cold connect is 11 ms. The login handshake is expensive compared to anything you'll do with the connection. Pool everything in long-lived processes.

Regression policy

baseline.json is committed and represents the dev-container baseline. Compare a current run against it with:

uv run pytest tests/benchmarks/ -m benchmark --benchmark-only \
    --benchmark-compare=tests/benchmarks/baseline.json \
    --benchmark-compare-fail=mean:25%

A 25% mean-regression fails the run. Adjust the threshold per CI noise profile. CI's loopback-network-on-shared-runner is noisier than dev container on a quiet box — start permissive and tighten as you collect runs.

Updating the baseline

When you intentionally change performance (an optimization, or accept a regression for correctness), refresh:

make bench-save                                 # writes .results/0001_run.json
cp tests/benchmarks/.results/Linux-CPython-*/0001_run.json tests/benchmarks/baseline.json
git add tests/benchmarks/baseline.json

Document the change in CHANGELOG so reviewers know why the floor moved.

Files

test_codec_perf.py — codec dispatch (decode, encode_param, parse_tuple_payload)
test_select_perf.py — SELECT round-trips, single + multi-row
test_insert_perf.py — INSERT single + executemany throughput
test_pool_perf.py — cold connect vs pool acquire/release
test_async_perf.py — async-path latency + concurrent throughput
conftest.py — long-lived bench_conn and 1k-row bench_table fixtures
baseline.json — committed baseline for regression comparison
.results/ — gitignored; per-run output from make bench-save

README.md Unescape Escape

Benchmarks (Phase 21)

Headline numbers (baseline 2026-05-04, x86_64 Linux, dev container on loopback)

What these tell you

Performance gotchas

Regression policy

Updating the baseline

Files

README.md