Phase 25: Branch reorder + invariant tripwires (2026.05.04.10)

Third-pass optimization on parse_tuple_payload's hot loop. Previous phases removed redundant work; this one removes correct-but-wasteful work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT - the most common columns in real queries) sat at the bottom, paying ~7 frozenset misses per column.

Changes (src/informix_db/_resultset.py):
* Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module load.
* New fast-path branch at the TOP of parse_tuple_payload's loop body that handles every _FIXED_WIDTH_TYPES column inline: one frozenset check, one dict lookup, one decode, continue. Skips every other branch.
* Cleaned up the bottom fall-through; it now genuinely only catches unknown types.

Performance vs Phase 24 baseline:
* parse_tuple_5cols_iso8859: 1659 ns -> 1400 ns (-16%)
* parse_tuple_5cols_utf8: 1649 ns -> 1341 ns (-19%)

Cumulative vs Phase 21 baseline (before any optimization):
* parse_tuple_5cols: 2796 ns -> 1400 ns (-50%) - HALF the time
* decode_int: 230 ns -> 139 ns (-40%)

Margaret Hamilton review surfaced one HIGH finding addressed before tagging:
* H: The fast-path optimization assumes every FIXED_WIDTHS key is decodable WITHOUT qualifier inspection (encoded_length etc.). True today, but a future contributor adding a fixed-width type that needs qualifier bits (like DATETIME does) would silently get wrong decode behavior - Lauren-Bug class failure.
  Fix: added INVARIANT comment to FIXED_WIDTHS in converters.py AND added tests/test_resultset_invariants.py with three CI tripwire tests:
  - _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch
  - Every FIXED_WIDTHS key has a DECODERS entry
  - DECODERS keys stay < 0x100 (Phase 24 collision-free guarantee)
  The tests carry instructions: if one fires, don't update the test to match - either restore the property or refactor the optimization. Comments rot when nobody reads them; tests fail loudly.
baseline.json refreshed; 72 unit + 224 integration + 28 bench = 324 tests; ruff clean.
2026-05-04 23:34:05 -06:00 · 2026-05-04 23:34:05 -06:00 · e9aed6ce59
commit e9aed6ce59
parent dfa60ea501
7 changed files with 647 additions and 462 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,55 @@
 All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
 ## 2026.05.04.10 — Branch reorder by frequency + invariant tripwires (Phase 25)
 Third-pass optimization on `parse_tuple_payload`. Previous phases removed redundant work; this one removes *correct-but-wasteful* work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT — by far the most common columns in real queries) sat at the *bottom* of the chain, paying ~7 frozenset/equality misses per column.
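To make the "~7 misses per column" cost concrete, the sketch below counts membership probes per row under two branch orderings. The branch sets and type codes are illustrative stand-ins (assumptions, not the driver's real `_resultset.py` sets), and one probe is modelled as one `in` test; real if/elif costs also include equality checks and interpreter overhead.

```python
# Illustrative branch sets, in the order an if/elif chain would test them.
# Codes are hypothetical stand-ins, not the driver's real type sets.
VARCHAR_LIKE = frozenset({13, 15, 16})
NUMERIC = frozenset({5, 8})
COMPOSITE = frozenset({19, 20, 21, 22})
DATETIME, INTERVAL, LVARCHAR, UDTVAR = (frozenset({c}) for c in (10, 14, 43, 40))
FIXED = frozenset({1, 2, 3, 4, 7, 52})

# Old layout: fixed-width check last. New layout: fixed-width check first.
IMPLEMENTATION_ORDER = [
    VARCHAR_LIKE, NUMERIC, COMPOSITE, DATETIME, INTERVAL, LVARCHAR, UDTVAR, FIXED,
]
FREQUENCY_ORDER = [FIXED] + IMPLEMENTATION_ORDER[:-1]


def probes_per_row(branch_order, column_types):
    """Count membership tests until each column's branch matches."""
    total = 0
    for tc in column_types:
        for position, codes in enumerate(branch_order, start=1):
            if tc in codes:
                total += position
                break
        else:
            # Fell through every branch, so every set was probed.
            total += len(branch_order)
    return total


row = [2, 2, 3, 7, 52]  # INT, INT, FLOAT, DATE, BIGINT (illustrative codes)
print(probes_per_row(IMPLEMENTATION_ORDER, row))  # 40
print(probes_per_row(FREQUENCY_ORDER, row))       # 5
```

With five fixed-width columns, the old order costs 40 probes per row (seven misses plus one hit per column) against 5 with the fast path first. A VARCHAR-like column now pays one extra probe, which is the trade the reordering accepts.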
 ### What changed
 - **Added `_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())`** at module load in `_resultset.py`.
 - **New fast-path branch at the TOP of `parse_tuple_payload`'s loop body** that handles every `_FIXED_WIDTH_TYPES` column inline (slice + `_decode_base` + advance). For an INT column we now hit one frozenset check, one dict lookup, one decode call — and skip every other branch.
 - **Cleaned up the bottom fall-through** since FIXED_WIDTHS-keyed types no longer reach it. The fall-through now genuinely only catches unknown/unhandled types; comment updated.
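A minimal, self-contained sketch of the reordered loop follows. It assumes big-endian fixed-width encodings and the single-byte VARCHAR length prefix that the `_resultset.py` comment describes for tuple data. The type codes, `FIXED_STRUCTS`, and `parse_row` are hypothetical; the real loop slices the payload and calls `_decode_base(tc, raw, encoding)` instead of `struct.unpack_from`, and its fall-through surfaces `encoded_length` raw bytes rather than raising (the sketch has no column metadata to do that).

```python
import struct

# Illustrative type codes (assumptions, not imported from informix_db).
TC_INT, TC_FLOAT, TC_DATE, TC_VARCHAR, TC_BIGINT = 2, 3, 7, 13, 52

# Hypothetical table: type code -> precompiled big-endian struct of its wire width.
FIXED_STRUCTS = {
    TC_INT: struct.Struct(">i"),
    TC_FLOAT: struct.Struct(">d"),
    TC_DATE: struct.Struct(">i"),  # day number; left as a raw int here
    TC_BIGINT: struct.Struct(">q"),
}
FIXED_WIDTH_TYPES = frozenset(FIXED_STRUCTS)
LENGTH_PREFIXED_TYPES = frozenset({TC_VARCHAR})


def parse_row(type_codes, payload, encoding="utf-8"):
    values = []
    offset = 0
    for tc in type_codes:
        # Fast path first: one set probe, one dict lookup, one unpack, continue.
        if tc in FIXED_WIDTH_TYPES:
            fmt = FIXED_STRUCTS[tc]
            values.append(fmt.unpack_from(payload, offset)[0])
            offset += fmt.size
            continue
        if tc in LENGTH_PREFIXED_TYPES:
            # Single-byte length prefix, then that many bytes of text.
            length = payload[offset]
            start = offset + 1
            values.append(payload[start:start + length].decode(encoding))
            offset = start + length
            continue
        # Genuine fall-through: only unrecognized codes reach this point.
        raise NotImplementedError(f"unhandled type code {tc}")
    return values


payload = struct.pack(">i", 42) + bytes([2]) + b"hi" + struct.pack(">q", -1)
print(parse_row([TC_INT, TC_VARCHAR, TC_BIGINT], payload))  # [42, 'hi', -1]
```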
 ### Margaret Hamilton review pass — invariant tripwires added
 The third Hamilton review of this hot path produced one HIGH-severity finding, addressed before tagging. The pattern was the same as in Phases 23 and 24: an optimization is correct *because of* a property of an external table (here: `FIXED_WIDTHS` keys are decodable without qualifier inspection), but the property is implicit. The recommended fix went beyond a comment:
 - **Added `tests/test_resultset_invariants.py`** — three CI tripwire tests that turn the structural invariants from comments into executable checks:
  1. `_FIXED_WIDTH_TYPES` is disjoint from every other dispatch branch's type set.
  2. Every `FIXED_WIDTHS` key has a decoder in `DECODERS`.
  3. All `DECODERS` keys are < 0x100 (the Phase 24 collision-free guarantee).
 - **Added INVARIANT comment to `FIXED_WIDTHS`** in `converters.py` explaining the qualifier-free constraint and pointing to the tripwire tests.
 The tests follow a simple discipline: if one fires, **don't update the test to match the new state** — read the docstring and either restore the property or refactor the optimization to no longer depend on it. Comments rot when nobody reads them; tests fail loudly when someone violates them.
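The disjointness tripwire compares the fast-path set against the union of every other branch. A slightly more general form of the same check, sketched here with hypothetical names and codes, walks an ordered branch list and reports any code already claimed by an earlier branch; those are exactly the codes a later branch can never see.

```python
def shadowed_codes(branches):
    """Map branch name -> codes that an earlier branch would claim first.

    ``branches`` is an ordered sequence of (name, frozenset_of_codes),
    listed in the same order the if/elif chain tests them.
    """
    claimed: set[int] = set()
    shadowed: dict[str, frozenset[int]] = {}
    for name, codes in branches:
        hit = codes & claimed
        if hit:
            shadowed[name] = frozenset(hit)
        claimed |= codes
    return shadowed


# Hypothetical dispatch order with a deliberate overlap on code 5.
ORDER = [
    ("fixed_width", frozenset({1, 2, 3, 5, 7})),
    ("numeric", frozenset({5, 8})),
    ("varchar_like", frozenset({13, 15, 16})),
]
print(shadowed_codes(ORDER))  # {'numeric': frozenset({5})}
```

Used as a test, `assert shadowed_codes(ORDER) == {}` would fail here and name `numeric` as the branch that lost code 5, mirroring the DECIMAL/MONEY scenario described in the tripwire's docstring.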
 ### Performance summary (Phase 25)
 | Benchmark | Phase 24 baseline | NOW | Δ |
 |---|---:|---:|---:|
 | `parse_tuple_5cols_iso8859` | 1659 ns | **1400 ns** | **-16%** |
 | `parse_tuple_5cols_utf8` | 1649 ns | **1341 ns** | **-19%** |
 End-to-end SELECT numbers fluctuate ±10% run-to-run on sub-millisecond loopback round-trips; the codec micro-benchmark is the durable measurement.
 ### Cumulative improvement (vs. original Phase 21 baseline, before any optimization)
 | Metric | Original | NOW | Total Δ |
 |---|---:|---:|---:|
 | `parse_tuple_5cols` | 2796 ns | **1400 ns** | **-50%** |
 | `decode_int` | 230 ns | 139 ns | -40% |
 | `select_bench_table_all` (1k rows, where measurable) | 1477 µs | ~990 µs | ≈-33% |
 The per-row decode hot path now takes **half the time it did at the start of optimization work**. Decode-only throughput ceiling (10⁹ ÷ `parse_tuple_5cols` ns per row): 358K rows/sec → ~714K rows/sec on a single connection.
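The throughput figures follow directly from the per-row parse time: rows/sec = 10⁹ / (ns per row). A quick check using only the numbers in the tables above (the helper names are illustrative):

```python
NS_PER_SECOND = 1_000_000_000


def rows_per_second(ns_per_row: float) -> float:
    """Decode-only ceiling: rows one core could parse per second at this cost."""
    return NS_PER_SECOND / ns_per_row


def percent_change(before: float, after: float) -> float:
    return (after - before) / before * 100


before, after = 2796, 1400  # parse_tuple_5cols, ns per row
print(f"{rows_per_second(before):,.0f} -> {rows_per_second(after):,.0f} rows/sec")
# 357,654 -> 714,286 rows/sec
print(f"{percent_change(before, after):.1f}%")
# -49.9%
```

This ceiling excludes socket I/O and protocol framing, which is consistent with the end-to-end `select_bench_table_all` gain (≈-33%) being smaller than the 50% decode gain.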
 ### Tests
 3 new unit tests (the invariant tripwires). Total: **72 unit + 224 integration + 28 benchmark = 324 tests**.
 ### Baseline refreshed
 `tests/benchmarks/baseline.json` updated. All tests pass; ruff clean.
 ## 2026.05.04.9 — Decoder dispatch + struct precompilation (Phase 24)
 Second pass of hot-path optimization. Phase 23 lifted IfxType conversions out of the loop body in `_resultset.py` (-26% on `parse_tuple_5cols`). Phase 24 goes deeper into the codec layer.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "informix-db"
-version = "2026.05.04.9"
+version = "2026.05.04.10"
 description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
 readme = "README.md"
 license = { text = "MIT" }
--- a/src/informix_db/_resultset.py
+++ b/src/informix_db/_resultset.py
@@ -223,6 +223,16 @@ _COMPOSITE_UDT_TYPES = frozenset({
 _NUMERIC_TYPES = frozenset({_TC_DECIMAL, _TC_MONEY})
 # Types that are fixed-width on the wire AND have a registered decoder
 # in ``FIXED_WIDTHS``: SMALLINT, INT, SERIAL, SMFLOAT, FLOAT, BIGINT,
 # BIGSERIAL, DATE, BOOL. These are the most common types in any real
 # query, so checking them FIRST in the parse_tuple_payload dispatch
 # saves ~7 frozenset/equality misses per column. Disjoint from every
 # other branch's type set (verified — none of these codes appear in
 # _LENGTH_PREFIXED_SHORT_TYPES, _NUMERIC_TYPES, _COMPOSITE_UDT_TYPES,
 # or as DATETIME/INTERVAL/UDTFIXED/UDTVAR/LVARCHAR).
 _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())
 def parse_tuple_payload(
    reader: IfxStreamReader,
@@ -270,6 +280,18 @@ def parse_tuple_payload(
    for col in columns:
        tc = col.type_code
        # Fast path: fixed-width types (INT, FLOAT, DATE, BIGINT, etc.)
        # are by far the most common columns in real queries. Check them
        # FIRST so we don't pay 7+ branch-misses per integer column.
        # FIXED_WIDTHS keys are disjoint from every other branch's type
        # set — see _FIXED_WIDTH_TYPES module-level comment.
        if tc in _FIXED_WIDTH_TYPES:
            width = FIXED_WIDTHS[tc]
            raw = payload[offset:offset + width]
            offset += width
            values.append(_decode_base(tc, raw, encoding))
            continue
        if tc in _LENGTH_PREFIXED_SHORT_TYPES:
            # In tuple data, VARCHAR/NCHAR/NVCHAR use a SINGLE-BYTE
            # length prefix (max 255 — IDS VARCHAR's hard limit), not
@@ -414,11 +436,12 @@ def parse_tuple_payload(
            values.append(raw.decode(encoding))
            continue
-        # Fixed-width types
-        width = FIXED_WIDTHS.get(tc)
-        if width is None:
-            # Phase 6+ types (DATETIME, INTERVAL, BLOBs) — fall back
-            # to encoded_length and surface raw bytes.
+        # Unknown / unhandled type fall-through. The fast-path at the
+        # top of this loop already handled all FIXED_WIDTHS-registered
+        # types (INT, FLOAT, DATE, etc.); the explicit branches above
+        # handle every other known wire shape. Anything reaching here
+        # is a type code we don't recognize — surface ``encoded_length``
        # bytes raw and let the decoder dispatch (or its fallback) react.
        width = col.encoded_length
        raw = payload[offset:offset + width]
        offset += width
--- a/src/informix_db/converters.py
+++ b/src/informix_db/converters.py
@@ -509,6 +509,21 @@ def _decode_decimal(raw: bytes) -> decimal.Decimal | None:
 # slice column values out of an SQ_TUPLE payload for fixed-width types.
 # Variable-width types (CHAR, VARCHAR, DECIMAL, etc.) are length-prefixed
 # on the wire and don't appear in this table.
 #
 # INVARIANT — every key here MUST be decodable by
 # ``_decode_base(tc, raw, encoding)`` with NO per-column qualifier
 # inspection (no ``col.encoded_length`` lookup, no extended_id check,
 # no extended_name check). This is **load-bearing for correctness**:
 # ``_resultset.parse_tuple_payload`` dispatches all ``FIXED_WIDTHS``
 # types through a single fast-path branch that does not pass
 # ``col.encoded_length`` to the decoder. If a new fixed-width type
 # needs qualifier bits (the way DATETIME and INTERVAL do — both
 # absent from this table for exactly that reason), give it its own
 # explicit branch in ``parse_tuple_payload`` instead of adding it
 # here. A test in ``tests/test_resultset_invariants.py`` enforces the
 # disjointness of this set against every other dispatch branch's
 # type set; another test enforces that every key here has a decoder
 # in DECODERS.
 FIXED_WIDTHS: dict[int, int] = {
    IfxType.SMALLINT: 2,
    IfxType.INT: 4,
--- a/tests/benchmarks/baseline.json
+++ b/tests/benchmarks/baseline.json
--- a/tests/test_resultset_invariants.py
+++ b/tests/test_resultset_invariants.py
@@ -0,0 +1,98 @@
 """Phase 25 — invariant tripwires for parse_tuple_payload's fast-path dispatch.
 These tests don't exercise behavior. They lock down the structural
 invariants the optimized hot loop in :func:`informix_db._resultset.parse_tuple_payload`
 relies on for correctness. Each test is a CI tripwire — if a future
 contributor breaks an invariant, these fail at test time rather than
 at a customer's wire-protocol mismatch six months later.
 Lessons from Margaret Hamilton's review of Phases 23/24/25:
 * The optimization is *correct* — but its correctness depends on
  properties of unrelated tables (DECODERS keys, FIXED_WIDTHS keys,
  IfxType flag bits) staying consistent.
 * A comment at the table only helps if the next contributor reads it.
 * A test fails loudly the moment the invariant is broken. Prefer that.
 If one of these tests fires, **do not** simply update the test to
 match the new state — that defeats the purpose. Instead read the
 docstring on the failed test and the corresponding INVARIANT comment
 in the source; either restore the property or refactor the
 optimization to no longer depend on it.
 """
 from __future__ import annotations
 from informix_db._resultset import (
    _COMPOSITE_UDT_TYPES,
    _FIXED_WIDTH_TYPES,
    _LENGTH_PREFIXED_SHORT_TYPES,
    _NUMERIC_TYPES,
    _TC_DATETIME,
    _TC_INTERVAL,
    _TC_LVARCHAR,
    _TC_UDTFIXED,
    _TC_UDTVAR,
 )
 from informix_db.converters import DECODERS, FIXED_WIDTHS
 def test_fixed_width_types_disjoint_from_other_dispatch_sets() -> None:
    """parse_tuple_payload's fast path is silently wrong if the FIXED_WIDTHS
    type set overlaps any other branch.
    The optimization in ``parse_tuple_payload`` puts the FIXED_WIDTHS
    branch FIRST. If a type is also in (e.g.) _NUMERIC_TYPES, the fast
    path swallows it before the DECIMAL/MONEY-specific handler runs —
    silently producing wrong values.
    If this test fails, you've added a new entry somewhere that
    overlaps. Either move it to FIXED_WIDTHS exclusively (and remove
    its specialized branch) or remove it from FIXED_WIDTHS.
    """
    other_branch_types = (
        _LENGTH_PREFIXED_SHORT_TYPES
        | _NUMERIC_TYPES
        | _COMPOSITE_UDT_TYPES
        | {_TC_LVARCHAR, _TC_DATETIME, _TC_INTERVAL, _TC_UDTFIXED, _TC_UDTVAR}
    )
    overlap = _FIXED_WIDTH_TYPES & other_branch_types
    assert overlap == set(), (
        f"FIXED_WIDTHS overlap with another parse_tuple_payload branch: {overlap}. "
        f"See the INVARIANT comment on FIXED_WIDTHS in converters.py."
    )
 def test_every_fixed_width_type_has_a_decoder() -> None:
    """The fast path calls ``_decode_base(tc, raw, encoding)`` for every
    FIXED_WIDTHS key. If a key has no entry in DECODERS, we'd raise
    ``NotImplementedError`` for that column — surprising the user.
    If this test fails, you've added a key to FIXED_WIDTHS without
    adding a corresponding decoder. Add the decoder, or remove the
    key.
    """
    missing = [tc for tc in FIXED_WIDTHS if tc not in DECODERS]
    assert missing == [], (
        f"FIXED_WIDTHS has keys without DECODERS entries: {missing}. "
        f"Every fixed-width type must be decodable by _decode_base."
    )
 def test_decoders_keys_stay_below_0x100() -> None:
    """The Phase 24 optimization in ``_decode_base`` skips ``base_type()``
    by relying on a structural guarantee: all DECODERS keys are ≤ 0xFF
    and all flag bits in _types.py are ≥ 0x100, so a flagged type code
    cannot coincidentally match a DECODERS key.
    If this test fails, you've added a decoder for a type code with
    bits ≥ 0x100. The collision-free guarantee weakens — re-introduce
    ``base_type()`` inside ``_decode_base`` (and remove the Phase 24
    optimization), OR keep the new key but verify it cannot clash with
    any flagged input.
    """
    high_keys = [tc for tc in DECODERS if tc >= 0x100]
    assert high_keys == [], (
        f"DECODERS contains keys with bits >= 0x100: {high_keys}. "
        f"See the INVARIANT comment on DECODERS in converters.py."
    )
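The Phase 24 guarantee behind the third tripwire can be checked exhaustively in a few lines. This sketch assumes flag bits are single bits at or above 0x100 (the real values live in `_types.py` and are not shown in this diff); `FLAG_BITS`, `DECODER_KEYS`, `all_flag_masks`, and `flagged_code_collides` are hypothetical names.

```python
from itertools import combinations

# Hypothetical flag bits: the commit only states that real flags are >= 0x100.
FLAG_BITS = (0x100, 0x200, 0x400, 0x800)
# Illustrative DECODERS keys, all < 0x100 as the third tripwire requires.
DECODER_KEYS = frozenset({1, 2, 3, 7, 52})


def all_flag_masks(flag_bits):
    """Every non-empty OR-combination of the given flag bits."""
    masks = set()
    for r in range(1, len(flag_bits) + 1):
        for combo in combinations(flag_bits, r):
            mask = 0
            for bit in combo:
                mask |= bit
            masks.add(mask)
    return masks


def flagged_code_collides(decoder_keys, flag_bits):
    """True if some base code 0x00..0xFF with flags OR'd in equals a decoder key."""
    masks = all_flag_masks(flag_bits)
    return any(
        (base | mask) in decoder_keys for base in range(0x100) for mask in masks
    )


print(flagged_code_collides(DECODER_KEYS, FLAG_BITS))            # False
print(flagged_code_collides(DECODER_KEYS | {0x102}, FLAG_BITS))  # True
```

Adding a key such as 0x102 makes the check fail, because base code 0x02 with flag 0x100 set now matches a decoder directly; that is the case the docstring says `base_type()` would otherwise guard against.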
--- a/uv.lock
+++ b/uv.lock
@@ -34,7 +34,7 @@ wheels = [
 [[package]]
 name = "informix-db"
-version = "2026.5.4.8"
+version = "2026.5.4.9"
 source = { editable = "." }
 [package.optional-dependencies]