Phase 25: Branch reorder + invariant tripwires (2026.05.04.10)

Third-pass optimization on parse_tuple_payload's hot loop. Previous phases removed redundant work; this one removes correct-but-wasteful work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT - the most common columns in real queries) sat at the bottom, paying ~7 frozenset misses per column.

Changes (src/informix_db/_resultset.py):
* Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module load.
* New fast-path branch at the TOP of parse_tuple_payload's loop body that handles every _FIXED_WIDTH_TYPES column inline: one frozenset check, one dict lookup, one decode, continue. Skips every other branch.
* Cleaned up the bottom fall-through; it now genuinely only catches unknown types.

Performance vs Phase 24 baseline:
* parse_tuple_5cols_iso8859: 1659 ns -> 1400 ns (-16%)
* parse_tuple_5cols_utf8: 1649 ns -> 1341 ns (-19%)

Cumulative vs Phase 21 baseline (before any optimization):
* parse_tuple_5cols: 2796 ns -> 1400 ns (-50%) - HALF the time
* decode_int: 230 ns -> 139 ns (-40%)

Margaret Hamilton review surfaced one HIGH finding addressed before tagging:
* H: The fast-path optimization assumes every FIXED_WIDTHS key is decodable WITHOUT qualifier inspection (encoded_length etc.). True today, but a future contributor adding a fixed-width type that needs qualifier bits (like DATETIME does) would silently get wrong decode behavior - Lauren-Bug class failure.
  Fix: added INVARIANT comment to FIXED_WIDTHS in converters.py AND added tests/test_resultset_invariants.py with three CI tripwire tests:
  - _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch
  - Every FIXED_WIDTHS key has a DECODERS entry
  - DECODERS keys stay < 0x100 (Phase 24 collision-free guarantee)

The tests carry instructions: if one fires, don't update the test to match - either restore the property or refactor the optimization. Comments rot when nobody reads them; tests fail loudly.

baseline.json refreshed; 72 unit + 224 integration + 28 bench = 324 tests; ruff clean.
2026-05-04 23:34:05 -06:00 · e9aed6ce59
commit e9aed6ce59
parent dfa60ea501
7 changed files with 647 additions and 462 deletions
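
The "DECODERS keys stay < 0x100" tripwire in the commit message rests on simple arithmetic, sketched below. The flag values and decoder keys are hypothetical placeholders, not the ones in informix_db; the point is only that a code with any bit at or above 0x100 set is numerically at least 0x100, so it can never equal a key that fits in one byte.

```python
# Hypothetical values for illustration; the real flag bits and DECODERS
# keys are defined in informix_db and are not reproduced here.
ASSUMED_FLAGS = (0x100, 0x200, 0x400)         # every flag bit is >= 0x100
DECODER_KEYS = frozenset({0x01, 0x02, 0x34})  # every key is <= 0xFF


def base_type(tc: int) -> int:
    """Strip flag bits, keeping only the low byte."""
    return tc & 0xFF


def has_decoder_unmasked(tc: int) -> bool:
    """Look the code up directly, without calling base_type() first."""
    return tc in DECODER_KEYS


flagged = 0x02 | 0x100

# A flagged code never aliases a decoder key when looked up unmasked...
assert not has_decoder_unmasked(flagged)
# ...while masking recovers the base type as usual.
assert has_decoder_unmasked(base_type(flagged))

# Exhaustive check over the sketch's tables: no key/flag combination collides.
assert all((k | f) not in DECODER_KEYS for k in DECODER_KEYS for f in ASSUMED_FLAGS)
```

If a key at or above 0x100 were ever added, the last assertion could start failing for some flag combination, which is the situation the tripwire is meant to surface early.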
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,55 @@

 All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.

+## 2026.05.04.10 — Branch reorder by frequency + invariant tripwires (Phase 25)
+
+Third-pass optimization on `parse_tuple_payload`. Previous phases removed redundant work; this one removes *correct-but-wasteful* work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT — by far the most common columns in real queries) sat at the *bottom* of the chain, paying ~7 frozenset/equality misses per column.
+
+### What changed
+
+- **Added `_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())`** at module load in `_resultset.py`.
+- **New fast-path branch at the TOP of `parse_tuple_payload`'s loop body** that handles every `_FIXED_WIDTH_TYPES` column inline (slice + `_decode_base` + advance). For an INT column we now hit one frozenset check, one dict lookup, one decode call — and skip every other branch.
+- **Cleaned up the bottom fall-through** since FIXED_WIDTHS-keyed types no longer reach it. The fall-through now genuinely only catches unknown/unhandled types; comment updated.
+
+### Margaret Hamilton review pass — invariant tripwires added
+
+The third Hamilton review of this hot path produced one HIGH-severity finding addressed before tagging. The pattern was the same as Phases 23 and 24: an optimization is correct *because of* a property of an external table (here: `FIXED_WIDTHS` keys are decodable without qualifier inspection), but the property is implicit. The finding's recommendation, going beyond a comment:
+
+- **Added `tests/test_resultset_invariants.py`** — three CI tripwire tests that turn the structural invariants from comments into executable checks:
+  1. `_FIXED_WIDTH_TYPES` is disjoint from every other dispatch branch's type set.
+  2. Every `FIXED_WIDTHS` key has a decoder in `DECODERS`.
+  3. All `DECODERS` keys are < 0x100 (the Phase 24 collision-free guarantee).
+- **Added INVARIANT comment to `FIXED_WIDTHS`** in `converters.py` explaining the qualifier-free constraint and pointing to the tripwire tests.
+
+The tests follow a simple discipline: if one fires, **don't update the test to match the new state** — read the docstring and either restore the property or refactor the optimization to no longer depend on it. Comments rot when nobody reads them; tests fail loudly when someone violates them.
+
+### Performance summary (Phase 25)
+
+| Benchmark | Phase 24 baseline | NOW | Δ |
+|---|---:|---:|---:|
+| `parse_tuple_5cols_iso8859` | 1659 ns | **1400 ns** | **-16%** |
+| `parse_tuple_5cols_utf8` | 1649 ns | **1341 ns** | **-19%** |
+
+End-to-end SELECT numbers fluctuate ±10% run-to-run on sub-millisecond loopback round-trips; the codec micro-benchmark is the durable measurement.
+
+### Cumulative improvement (vs. original Phase 21 baseline, before any optimization)
+
+| Metric | Original | NOW | Total Δ |
+|---|---:|---:|---:|
+| `parse_tuple_5cols` | 2796 ns | **1400 ns** | **-50%** |
+| `decode_int` | 230 ns | 139 ns | -40% |
+| `select_bench_table_all` (1k rows, where measurable) | 1477 µs | ~990 µs | ≈-33% |
+
+The per-row decode hot path is **half the time it took at start of optimization work**. Real-world fetch ceiling: 358K rows/sec → ~715K rows/sec on a single connection.
+
+### Tests
+
+3 new unit tests (the invariant tripwires). Total: **72 unit + 224 integration + 28 benchmark = 324 tests**.
+
+### Baseline refreshed
+
+`tests/benchmarks/baseline.json` updated. All tests pass; ruff clean.
+
 ## 2026.05.04.9 — Decoder dispatch + struct precompilation (Phase 24)

 Second pass of hot-path optimization. Phase 23 lifted IfxType conversions out of the loop body in `_resultset.py` (-26% on `parse_tuple_5cols`). Phase 24 goes deeper into the codec layer.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "informix-db"
-version = "2026.05.04.9"
+version = "2026.05.04.10"
 description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
 readme = "README.md"
 license = { text = "MIT" }
--- a/src/informix_db/_resultset.py
+++ b/src/informix_db/_resultset.py
@@ -223,6 +223,16 @@ _COMPOSITE_UDT_TYPES = frozenset({

 _NUMERIC_TYPES = frozenset({_TC_DECIMAL, _TC_MONEY})

+# Types that are fixed-width on the wire AND have a registered decoder
+# in ``FIXED_WIDTHS``: SMALLINT, INT, SERIAL, SMFLOAT, FLOAT, BIGINT,
+# BIGSERIAL, DATE, BOOL. These are the most common types in any real
+# query, so checking them FIRST in the parse_tuple_payload dispatch
+# saves ~7 frozenset/equality misses per column. Disjoint from every
+# other branch's type set (verified — none of these codes appear in
+# _LENGTH_PREFIXED_SHORT_TYPES, _NUMERIC_TYPES, _COMPOSITE_UDT_TYPES,
+# or as DATETIME/INTERVAL/UDTFIXED/UDTVAR/LVARCHAR).
+_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())
+

 def parse_tuple_payload(
    reader: IfxStreamReader,
@@ -270,6 +280,18 @@ def parse_tuple_payload(
    for col in columns:
        tc = col.type_code

+        # Fast path: fixed-width types (INT, FLOAT, DATE, BIGINT, etc.)
+        # are by far the most common columns in real queries. Check them
+        # FIRST so we don't pay 7+ branch-misses per integer column.
+        # FIXED_WIDTHS keys are disjoint from every other branch's type
+        # set — see _FIXED_WIDTH_TYPES module-level comment.
+        if tc in _FIXED_WIDTH_TYPES:
+            width = FIXED_WIDTHS[tc]
+            raw = payload[offset:offset + width]
+            offset += width
+            values.append(_decode_base(tc, raw, encoding))
+            continue
+
        if tc in _LENGTH_PREFIXED_SHORT_TYPES:
            # In tuple data, VARCHAR/NCHAR/NVCHAR use a SINGLE-BYTE
            # length prefix (max 255 — IDS VARCHAR's hard limit), not
@@ -414,12 +436,13 @@ def parse_tuple_payload(
            values.append(raw.decode(encoding))
            continue

-        # Fixed-width types
-        width = FIXED_WIDTHS.get(tc)
-        if width is None:
-            # Phase 6+ types (DATETIME, INTERVAL, BLOBs) — fall back
-            # to encoded_length and surface raw bytes.
-            width = col.encoded_length
+        # Unknown / unhandled type fall-through. The fast-path at the
+        # top of this loop already handled all FIXED_WIDTHS-registered
+        # types (INT, FLOAT, DATE, etc.); the explicit branches above
+        # handle every other known wire shape. Anything reaching here
+        # is a type code we don't recognize — surface ``encoded_length``
+        # bytes raw and let the decoder dispatch (or its fallback) react.
+        width = col.encoded_length
        raw = payload[offset:offset + width]
        offset += width
        try:
--- a/src/informix_db/converters.py
+++ b/src/informix_db/converters.py
@@ -509,6 +509,21 @@ def _decode_decimal(raw: bytes) -> decimal.Decimal | None:
 # slice column values out of an SQ_TUPLE payload for fixed-width types.
 # Variable-width types (CHAR, VARCHAR, DECIMAL, etc.) are length-prefixed
 # on the wire and don't appear in this table.
+#
+# INVARIANT — every key here MUST be decodable by
+# ``_decode_base(tc, raw, encoding)`` with NO per-column qualifier
+# inspection (no ``col.encoded_length`` lookup, no extended_id check,
+# no extended_name check). This is **load-bearing for correctness**:
+# ``_resultset.parse_tuple_payload`` dispatches all ``FIXED_WIDTHS``
+# types through a single fast-path branch that does not pass
+# ``col.encoded_length`` to the decoder. If a new fixed-width type
+# needs qualifier bits (the way DATETIME and INTERVAL do — both
+# absent from this table for exactly that reason), give it its own
+# explicit branch in ``parse_tuple_payload`` instead of adding it
+# here. A test in ``tests/test_resultset_invariants.py`` enforces the
+# disjointness of this set against every other dispatch branch's
+# type set; another test enforces that every key here has a decoder
+# in DECODERS.
 FIXED_WIDTHS: dict[int, int] = {
    IfxType.SMALLINT: 2,
    IfxType.INT: 4,
--- a/tests/benchmarks/baseline.json
+++ b/tests/benchmarks/baseline.json
--- a/tests/test_resultset_invariants.py
+++ b/tests/test_resultset_invariants.py
@@ -0,0 +1,98 @@
+"""Phase 25 — invariant tripwires for parse_tuple_payload's fast-path dispatch.
+
+These tests don't exercise behavior. They lock down the structural
+invariants the optimized hot loop in :func:`informix_db._resultset.parse_tuple_payload`
+relies on for correctness. Each test is a CI tripwire — if a future
+contributor breaks an invariant, these fail at test time rather than
+at a customer's wire-protocol mismatch six months later.
+
+Lessons from Margaret Hamilton's review of Phases 23/24/25:
+
+* The optimization is *correct* — but its correctness depends on
+  properties of unrelated tables (DECODERS keys, FIXED_WIDTHS keys,
+  IfxType flag bits) staying consistent.
+* A comment at the table only helps if the next contributor reads it.
+* A test fails loudly the moment the invariant is broken. Prefer that.
+
+If one of these tests fires, **do not** simply update the test to
+match the new state — that defeats the purpose. Instead read the
+docstring on the failed test and the corresponding INVARIANT comment
+in the source; either restore the property or refactor the
+optimization to no longer depend on it.
+"""
+
+from __future__ import annotations
+
+from informix_db._resultset import (
+    _COMPOSITE_UDT_TYPES,
+    _FIXED_WIDTH_TYPES,
+    _LENGTH_PREFIXED_SHORT_TYPES,
+    _NUMERIC_TYPES,
+    _TC_DATETIME,
+    _TC_INTERVAL,
+    _TC_LVARCHAR,
+    _TC_UDTFIXED,
+    _TC_UDTVAR,
+)
+from informix_db.converters import DECODERS, FIXED_WIDTHS
+
+
+def test_fixed_width_types_disjoint_from_other_dispatch_sets() -> None:
+    """parse_tuple_payload's fast path is silently wrong if the FIXED_WIDTHS
+    type set overlaps any other branch.
+
+    The optimization in ``parse_tuple_payload`` puts the FIXED_WIDTHS
+    branch FIRST. If a type is also in (e.g.) _NUMERIC_TYPES, the fast
+    path swallows it before the DECIMAL/MONEY-specific handler runs —
+    silently producing wrong values.
+
+    If this test fails, you've added a new entry somewhere that
+    overlaps. Either move it to FIXED_WIDTHS exclusively (and remove
+    its specialized branch) or remove it from FIXED_WIDTHS.
+    """
+    other_branch_types = (
+        _LENGTH_PREFIXED_SHORT_TYPES
+        | _NUMERIC_TYPES
+        | _COMPOSITE_UDT_TYPES
+        | {_TC_LVARCHAR, _TC_DATETIME, _TC_INTERVAL, _TC_UDTFIXED, _TC_UDTVAR}
+    )
+    overlap = _FIXED_WIDTH_TYPES & other_branch_types
+    assert overlap == set(), (
+        f"FIXED_WIDTHS overlap with another parse_tuple_payload branch: {overlap}. "
+        f"See the INVARIANT comment on FIXED_WIDTHS in converters.py."
+    )
+
+
+def test_every_fixed_width_type_has_a_decoder() -> None:
+    """The fast path calls ``_decode_base(tc, raw, encoding)`` for every
+    FIXED_WIDTHS key. If a key has no entry in DECODERS, we'd raise
+    ``NotImplementedError`` for that column — surprising the user.
+
+    If this test fails, you've added a key to FIXED_WIDTHS without
+    adding a corresponding decoder. Add the decoder, or remove the
+    key.
+    """
+    missing = [tc for tc in FIXED_WIDTHS if tc not in DECODERS]
+    assert missing == [], (
+        f"FIXED_WIDTHS has keys without DECODERS entries: {missing}. "
+        f"Every fixed-width type must be decodable by _decode_base."
+    )
+
+
+def test_decoders_keys_stay_below_0x100() -> None:
+    """The Phase 24 optimization in ``_decode_base`` skips ``base_type()``
+    by relying on a structural guarantee: all DECODERS keys are ≤ 0xFF
+    and all flag bits in _types.py are ≥ 0x100, so a flagged type code
+    cannot coincidentally match a DECODERS key.
+
+    If this test fails, you've added a decoder for a type code with
+    bits ≥ 0x100. The collision-free guarantee weakens — re-introduce
+    ``base_type()`` inside ``_decode_base`` (and remove the Phase 24
+    optimization), OR keep the new key but verify it cannot clash with
+    any flagged input.
+    """
+    high_keys = [tc for tc in DECODERS if tc >= 0x100]
+    assert high_keys == [], (
+        f"DECODERS contains keys with bits >= 0x100: {high_keys}. "
+        f"See the INVARIANT comment on DECODERS in converters.py."
+    )
--- a/uv.lock
+++ b/uv.lock
@@ -34,7 +34,7 @@ wheels = [

 [[package]]
 name = "informix-db"
-version = "2026.5.4.8"
+version = "2026.5.4.9"
 source = { editable = "." }

 [package.optional-dependencies]