Third-pass optimization on parse_tuple_payload's hot loop. Previous
phases removed redundant work; this one removes correct-but-wasteful
work: the if/elif chain checked branches in implementation order, not
frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT - the most
common columns in real queries) sat at the bottom, paying ~7 frozenset
misses per column.
Changes (src/informix_db/_resultset.py):
* Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module
load.
* New fast-path branch at the TOP of parse_tuple_payload's loop body
that handles every _FIXED_WIDTH_TYPES column inline: one frozenset
check, one dict lookup, one decode, continue. Skips every other
branch.
* Cleaned up the bottom fall-through; it now genuinely only catches
unknown types.
Performance vs Phase 24 baseline:
* parse_tuple_5cols_iso8859: 1659 ns -> 1400 ns (-16%)
* parse_tuple_5cols_utf8: 1649 ns -> 1341 ns (-19%)
Cumulative vs Phase 21 baseline (before any optimization):
* parse_tuple_5cols: 2796 ns -> 1400 ns (-50%) - HALF the time
* decode_int: 230 ns -> 139 ns (-40%)
Margaret Hamilton review surfaced one HIGH finding addressed before
tagging:
* H: The fast-path optimization assumes every FIXED_WIDTHS key is
decodable WITHOUT qualifier inspection (encoded_length etc.). True
today, but a future contributor adding a fixed-width type that
needs qualifier bits (like DATETIME does) would silently get wrong
decode behavior - Lauren-Bug class failure.
Fix: added INVARIANT comment to FIXED_WIDTHS in converters.py AND
added tests/test_resultset_invariants.py with three CI tripwire
tests:
- _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch
- Every FIXED_WIDTHS key has a DECODERS entry
- DECODERS keys stay < 0x100 (Phase 24 collision-free guarantee)
The tests carry instructions: if one fires, don't update the test
to match - either restore the property or refactor the optimization.
Comments rot when nobody reads them; tests fail loudly.
baseline.json refreshed; 72 unit + 224 integration + 28 bench = 324
tests; ruff clean.
Second pass of hot-path optimization on parse_tuple_payload. Two changes
to converters.py:
1. Split decode() into public + internal. Added _decode_base(base_tc,
raw, encoding) that takes an already-base-typed code and skips the
redundant base_type() call. Public decode() is now a one-line
wrapper. parse_tuple_payload's 4 call sites swapped to use
_decode_base directly. _fastpath.py's external decode() caller is
unaffected.
2. Pre-compiled struct.Struct unpackers. The fixed-width integer/float
decoders (_decode_smallint, _decode_int, _decode_bigint,
_decode_smfloat, _decode_float, _decode_date) switched from per-call
struct.unpack(fmt, raw) to module-level bound methods like
_UNPACK_INT = struct.Struct("!i").unpack. Format-string parsed once
at module load. Measured 37% faster than per-call struct.unpack on
CPython 3.13 micro.
Performance vs Phase 23 baseline:
* decode_int: 173 ns -> 139 ns (-20%)
* decode_bigint: 188 ns -> 150 ns (-20%)
* parse_tuple_5cols: 2047 ns -> 1592 ns (-22%)
* 1k-row SELECT: 1255 us -> 989 us (-21%)
Cumulative vs original Phase 21 baseline:
* decode_int: 230 ns -> 139 ns (-40%)
* parse_tuple_5cols: 2796 ns -> 1592 ns (-43%)
* 1k-row SELECT: 1477 us -> 989 us (-33%)
Real-world fetch ceiling: 358K rows/sec -> ~620K rows/sec.
Margaret Hamilton review surfaced one HIGH-severity finding addressed
before tagging:
* H: The no-collision guarantee that makes _decode_base safe is
structural but undocumented (all DECODERS keys are ≤ 0xFF, all flag
bits are ≥ 0x100, so flagged inputs cannot coincidentally match).
Added load-bearing INVARIANT comment at DECODERS dict explaining
the constraint and what to do if violated. Cross-referenced from
_decode_base's docstring for bidirectional traceability.
baseline.json refreshed; all 224 integration tests pass; ruff clean.
Per-row decode is hit on every row of every SELECT. The original code
had three forms of waste in the inner loop:
1. Redundant base_type() call. ColumnInfo.type_code is already
base-typed by parse_describe at construction; calling base_type()
again per column per row was pure waste. Single largest savings.
2. IntFlag->int conversions inline (~10x per iteration). Lifted to
module-level _TC_X constants.
3. Lazy imports inside the loop body (_decode_datetime, _decode_interval,
BlobLocator, ClobLocator, RowValue, CollectionValue). Moved to top.
Plus three precomputed frozensets (_LENGTH_PREFIXED_SHORT_TYPES,
_COMPOSITE_UDT_TYPES, _NUMERIC_TYPES) replace inline tuple-membership
checks. _COLLECTION_KIND_MAP is now MappingProxyType (actually frozen).
Performance:
* parse_tuple_5cols: 2796 ns -> 2030 ns (-27%)
* select_bench_table_all (1k rows): 1477 us -> 1198 us (-19%)
* Codec micro-bench, cold connect, executemany: unchanged
Real-world fetch ceiling on a single connection: 350K rows/sec ->
490K rows/sec.
Margaret Hamilton review surfaced four cleanup items, all addressed
before tagging:
* H1: cursor._dereference_blob_columns had the same redundant
base_type() call - stripped for consistency.
* M1: documented the load-bearing invariant at parse_describe (the
single producer site) so future contributors have a grep target.
* M2: _COLLECTION_KIND_MAP wrapped in MappingProxyType.
* L1: stale line-number comment fixed to point at the INVARIANT
comment instead.
baseline.json refreshed; all 224 integration tests pass; ruff clean.
Investigation of the Phase 21 baseline finding that executemany(N) cost
scaled linearly per-row (1.74 ms x N) regardless of batch size.
Root cause: every autocommit=True INSERT forces a server-side
transaction-log flush. Not a wire-protocol bug.
Numbers:
* executemany(1000) autocommit=True: 1.72 s (1.72 ms/row)
* executemany(1000) in single txn: 32 ms (32 us/row)
53x speedup from changing the transaction boundary, not the driver.
Pure protocol overhead is ~32 us/row -> ~31K rows/sec sustained
throughput on a single connection. Comparable to pg8000.
Added test_executemany_1000_rows_in_txn benchmark to make this
visible. Updated README headline numbers and added a "Performance
gotchas" section explaining when autocommit=False matters.
Decision: don't pipeline. The remaining 32 us is already excellent;
the autocommit gotcha is the real user-facing footgun. Docs > code.
If someone reports needing >31K rows/sec single-connection, that
becomes Phase 22.