informix-db

warehack.ing/informix-db

Fork 0

Commit Graph

Author	SHA1	Message	Date
Ryan Malloy	f3e589c5bf	Phase 23: Hot-path optimization for parse_tuple_payload (2026.05.04.8) Per-row decode is hit on every row of every SELECT. The original code had three forms of waste in the inner loop: 1. Redundant base_type() call. ColumnInfo.type_code is already base-typed by parse_describe at construction; calling base_type() again per column per row was pure waste. Single largest savings. 2. IntFlag->int conversions inline (~10x per iteration). Lifted to module-level _TC_X constants. 3. Lazy imports inside the loop body (_decode_datetime, _decode_interval, BlobLocator, ClobLocator, RowValue, CollectionValue). Moved to top. Plus three precomputed frozensets (_LENGTH_PREFIXED_SHORT_TYPES, _COMPOSITE_UDT_TYPES, _NUMERIC_TYPES) replace inline tuple-membership checks. _COLLECTION_KIND_MAP is now MappingProxyType (actually frozen). Performance: * parse_tuple_5cols: 2796 ns -> 2030 ns (-27%) * select_bench_table_all (1k rows): 1477 us -> 1198 us (-19%) * Codec micro-bench, cold connect, executemany: unchanged Real-world fetch ceiling on a single connection: 350K rows/sec -> 490K rows/sec. Margaret Hamilton review surfaced four cleanup items, all addressed before tagging: * H1: cursor._dereference_blob_columns had the same redundant base_type() call - stripped for consistency. * M1: documented the load-bearing invariant at parse_describe (the single producer site) so future contributors have a grep target. * M2: _COLLECTION_KIND_MAP wrapped in MappingProxyType. * L1: stale line-number comment fixed to point at the INVARIANT comment instead. baseline.json refreshed; all 224 integration tests pass; ruff clean.	2026-05-04 17:52:20 -06:00
Ryan Malloy	495128c679	Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6) Investigation of the Phase 21 baseline finding that executemany(N) cost scaled linearly per-row (1.74 ms x N) regardless of batch size. Root cause: every autocommit=True INSERT forces a server-side transaction-log flush. Not a wire-protocol bug. Numbers: * executemany(1000) autocommit=True: 1.72 s (1.72 ms/row) * executemany(1000) in single txn: 32 ms (32 us/row) 53x speedup from changing the transaction boundary, not the driver. Pure protocol overhead is ~32 us/row -> ~31K rows/sec sustained throughput on a single connection. Comparable to pg8000. Added test_executemany_1000_rows_in_txn benchmark to make this visible. Updated README headline numbers and added a "Performance gotchas" section explaining when autocommit=False matters. Decision: don't pipeline. The remaining 32 us is already excellent; the autocommit gotcha is the real user-facing footgun. Docs > code. If someone reports needing >31K rows/sec single-connection, that becomes Phase 22.	2026-05-04 17:26:16 -06:00
Ryan Malloy	90ce035a00	Phase 21: Performance benchmarks (2026.05.04.5) Adds tests/benchmarks/ with pytest-benchmark coverage of the hot codec paths and end-to-end SELECT/INSERT/pool/async round-trips. Establishes a committed baseline.json so PRs can be regression-checked at review via --benchmark-compare. * test_codec_perf.py (16): decode/encode_param/parse_tuple_payload micro-benchmarks - run without container, suitable for pre-merge CI. * test_select_perf.py (4): SELECT round-trips - 1-row latency floor, 10-row, 1k-row full fetch, parameterized. * test_insert_perf.py (3): single-row INSERT, executemany 100 / 1000. * test_pool_perf.py (3): cold connect, pool acquire/release, pool acquire + query + release. * test_async_perf.py (2): async round-trip overhead, 10x concurrent. * baseline.json: committed snapshot, 28 measurements. * benchmark pytest marker, gated off by default. * Makefile: bench / bench-codec / bench-save targets; test-integration excludes benchmarks for speed. Headline numbers (dev container loopback): * decode(int): 181 ns * parse_tuple 5 cols: 2.87 µs/row * SELECT 1 round-trip: 177 µs * Pool acquire+query+release: 295 µs * Cold connect: 11.2 ms (72x slower than pool) UTF-8 decode carries no measurable cost vs iso-8859-1 - confirms Phase 20 didn't regress anything. Total: 69 unit + 211 integration + 28 benchmark = 308 tests.	2026-05-04 17:21:12 -06:00

Author

SHA1

Message

Date

Ryan Malloy

f3e589c5bf

Phase 23: Hot-path optimization for parse_tuple_payload (2026.05.04.8)

Per-row decode is hit on every row of every SELECT. The original code
had three forms of waste in the inner loop:

1. Redundant base_type() call. ColumnInfo.type_code is already
   base-typed by parse_describe at construction; calling base_type()
   again per column per row was pure waste. Single largest savings.
2. IntFlag->int conversions inline (~10x per iteration). Lifted to
   module-level _TC_X constants.
3. Lazy imports inside the loop body (_decode_datetime, _decode_interval,
   BlobLocator, ClobLocator, RowValue, CollectionValue). Moved to top.

Plus three precomputed frozensets (_LENGTH_PREFIXED_SHORT_TYPES,
_COMPOSITE_UDT_TYPES, _NUMERIC_TYPES) replace inline tuple-membership
checks. _COLLECTION_KIND_MAP is now MappingProxyType (actually frozen).

Performance:
* parse_tuple_5cols: 2796 ns -> 2030 ns (-27%)
* select_bench_table_all (1k rows): 1477 us -> 1198 us (-19%)
* Codec micro-bench, cold connect, executemany: unchanged

Real-world fetch ceiling on a single connection: 350K rows/sec ->
490K rows/sec.

Margaret Hamilton review surfaced four cleanup items, all addressed
before tagging:
* H1: cursor._dereference_blob_columns had the same redundant
  base_type() call - stripped for consistency.
* M1: documented the load-bearing invariant at parse_describe (the
  single producer site) so future contributors have a grep target.
* M2: _COLLECTION_KIND_MAP wrapped in MappingProxyType.
* L1: stale line-number comment fixed to point at the INVARIANT
  comment instead.

baseline.json refreshed; all 224 integration tests pass; ruff clean.

2026-05-04 17:52:20 -06:00

Ryan Malloy

495128c679

Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6)

Investigation of the Phase 21 baseline finding that executemany(N) cost
scaled linearly per-row (1.74 ms x N) regardless of batch size.

Root cause: every autocommit=True INSERT forces a server-side
transaction-log flush. Not a wire-protocol bug.

Numbers:
* executemany(1000) autocommit=True: 1.72 s (1.72 ms/row)
* executemany(1000) in single txn:    32 ms (32 us/row)

53x speedup from changing the transaction boundary, not the driver.
Pure protocol overhead is ~32 us/row -> ~31K rows/sec sustained
throughput on a single connection. Comparable to pg8000.

Added test_executemany_1000_rows_in_txn benchmark to make this
visible. Updated README headline numbers and added a "Performance
gotchas" section explaining when autocommit=False matters.

Decision: don't pipeline. The remaining 32 us is already excellent;
the autocommit gotcha is the real user-facing footgun. Docs > code.
If someone reports needing >31K rows/sec single-connection, that
becomes Phase 22.

2026-05-04 17:26:16 -06:00

Ryan Malloy

90ce035a00

Phase 21: Performance benchmarks (2026.05.04.5)

Adds tests/benchmarks/ with pytest-benchmark coverage of the hot codec
paths and end-to-end SELECT/INSERT/pool/async round-trips. Establishes
a committed baseline.json so PRs can be regression-checked at review
via --benchmark-compare.

* test_codec_perf.py (16): decode/encode_param/parse_tuple_payload
  micro-benchmarks - run without container, suitable for pre-merge CI.
* test_select_perf.py (4): SELECT round-trips - 1-row latency floor,
  10-row, 1k-row full fetch, parameterized.
* test_insert_perf.py (3): single-row INSERT, executemany 100 / 1000.
* test_pool_perf.py (3): cold connect, pool acquire/release, pool
  acquire + query + release.
* test_async_perf.py (2): async round-trip overhead, 10x concurrent.
* baseline.json: committed snapshot, 28 measurements.
* benchmark pytest marker, gated off by default.
* Makefile: bench / bench-codec / bench-save targets;
  test-integration excludes benchmarks for speed.

Headline numbers (dev container loopback):
* decode(int): 181 ns
* parse_tuple 5 cols: 2.87 µs/row
* SELECT 1 round-trip: 177 µs
* Pool acquire+query+release: 295 µs
* Cold connect: 11.2 ms (72x slower than pool)

UTF-8 decode carries no measurable cost vs iso-8859-1 - confirms
Phase 20 didn't regress anything.

Total: 69 unit + 211 integration + 28 benchmark = 308 tests.

2026-05-04 17:21:12 -06:00

3 Commits