Phase 25: Branch reorder + invariant tripwires (2026.05.04.10)

Third-pass optimization on parse_tuple_payload's hot loop. Previous
phases removed redundant work; this one removes correct-but-wasteful
work: the if/elif chain checked branches in implementation order, not
frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT - the most
common columns in real queries) sat at the bottom, paying ~7 frozenset
misses per column.

Changes (src/informix_db/_resultset.py):
* Added _FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys()) at module
  load.
* New fast-path branch at the TOP of parse_tuple_payload's loop body
  that handles every _FIXED_WIDTH_TYPES column inline: one frozenset
  check, one dict lookup, one decode, continue. Skips every other
  branch.
* Cleaned up the bottom fall-through; it now genuinely only catches
  unknown types.

Performance vs Phase 24 baseline:
* parse_tuple_5cols_iso8859: 1659 ns -> 1400 ns (-16%)
* parse_tuple_5cols_utf8:    1649 ns -> 1341 ns (-19%)

Cumulative vs Phase 21 baseline (before any optimization):
* parse_tuple_5cols: 2796 ns -> 1400 ns (-50%) - HALF the time
* decode_int:        230 ns  -> 139 ns  (-40%)

Margaret Hamilton review surfaced one HIGH finding addressed before
tagging:
* H: The fast-path optimization assumes every FIXED_WIDTHS key is
  decodable WITHOUT qualifier inspection (encoded_length etc.). True
  today, but a future contributor adding a fixed-width type that
  needs qualifier bits (like DATETIME does) would silently get wrong
  decode behavior - Lauren-Bug class failure.

  Fix: added INVARIANT comment to FIXED_WIDTHS in converters.py AND
  added tests/test_resultset_invariants.py with three CI tripwire
  tests:
  - _FIXED_WIDTH_TYPES is disjoint from every other dispatch branch
  - Every FIXED_WIDTHS key has a DECODERS entry
  - DECODERS keys stay < 0x100 (Phase 24 collision-free guarantee)

  The tests carry instructions: if one fires, don't update the test
  to match - either restore the property or refactor the optimization.
  Comments rot when nobody reads them; tests fail loudly.

baseline.json refreshed; 72 unit + 224 integration + 28 bench = 324
tests; ruff clean.
Ryan Malloy 2026-05-04 23:34:05 -06:00
parent dfa60ea501
commit e9aed6ce59
7 changed files with 647 additions and 462 deletions


@@ -2,6 +2,55 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.04.10 — Branch reorder by frequency + invariant tripwires (Phase 25)
Third-pass optimization on `parse_tuple_payload`. Previous phases removed redundant work; this one removes *correct-but-wasteful* work: the if/elif chain checked branches in implementation order, not frequency order. Fixed-width types (INT, FLOAT, DATE, BIGINT — by far the most common columns in real queries) sat at the *bottom* of the chain, paying ~7 frozenset/equality misses per column.
### What changed
- **Added `_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())`** at module load in `_resultset.py`.
- **New fast-path branch at the TOP of `parse_tuple_payload`'s loop body** that handles every `_FIXED_WIDTH_TYPES` column inline (slice + `_decode_base` + advance). For an INT column we now hit one frozenset check, one dict lookup, one decode call — and skip every other branch.
- **Cleaned up the bottom fall-through** since FIXED_WIDTHS-keyed types no longer reach it. The fall-through now genuinely only catches unknown/unhandled types; comment updated.
### Margaret Hamilton review pass — invariant tripwires added
The third Hamilton review of this hot path produced one HIGH-severity finding addressed before tagging. The pattern was the same as Phases 23 and 24: an optimization is correct *because of* a property of an external table (here: `FIXED_WIDTHS` keys are decodable without qualifier inspection), but the property is implicit. The finding's recommendation, going beyond a comment:
- **Added `tests/test_resultset_invariants.py`** — three CI tripwire tests that turn the structural invariants from comments into executable checks:
1. `_FIXED_WIDTH_TYPES` is disjoint from every other dispatch branch's type set.
2. Every `FIXED_WIDTHS` key has a decoder in `DECODERS`.
3. All `DECODERS` keys are < 0x100 (the Phase 24 collision-free guarantee).
- **Added INVARIANT comment to `FIXED_WIDTHS`** in `converters.py` explaining the qualifier-free constraint and pointing to the tripwire tests.
The tests follow a simple discipline: if one fires, **don't update the test to match the new state** — read the docstring and either restore the property or refactor the optimization to no longer depend on it. Comments rot when nobody reads them; tests fail loudly when someone violates them.
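The third tripwire guards a property that is easy to state in code. The sketch below assumes decoder keys fit in the low byte and flag bits sit at `0x100` and above, as the changelog describes; `NOT_NULL_FLAG`, `DECODER_KEYS`, and `is_collision_free` are hypothetical names used only for this illustration.

```python
# Assumed layout: decoder keys < 0x100, flag bits >= 0x100.
NOT_NULL_FLAG = 0x100                # hypothetical flag bit
DECODER_KEYS = frozenset({1, 2, 7})  # hypothetical base type codes


def is_collision_free(keys):
    """True when no key can be mistaken for a flagged type code."""
    return all(key < 0x100 for key in keys)


flagged = 2 | NOT_NULL_FLAG

# A flagged code is always >= 0x100, so it can never equal a key...
assert flagged not in DECODER_KEYS
# ...while masking off the flag bits recovers the base type.
assert (flagged & 0xFF) in DECODER_KEYS
```

If a key at or above `0x100` were ever added, a flagged input could match it by accident, which is the situation the tripwire is meant to catch at test time.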
### Performance summary (Phase 25)
| Benchmark | Phase 24 baseline | NOW | Δ |
|---|---:|---:|---:|
| `parse_tuple_5cols_iso8859` | 1659 ns | **1400 ns** | **-16%** |
| `parse_tuple_5cols_utf8` | 1649 ns | **1341 ns** | **-19%** |
End-to-end SELECT numbers fluctuate ±10% run-to-run on sub-millisecond loopback round-trips; the codec micro-benchmark is the durable measurement.
### Cumulative improvement (vs. original Phase 21 baseline, before any optimization)
| Metric | Original | NOW | Total Δ |
|---|---:|---:|---:|
| `parse_tuple_5cols` | 2796 ns | **1400 ns** | **-50%** |
| `decode_int` | 230 ns | 139 ns | -40% |
| `select_bench_table_all` (1k rows, where measurable) | 1477 µs | ~990 µs | ≈-33% |
The per-row decode hot path is **half the time it took at start of optimization work**. Real-world fetch ceiling: 358K rows/sec → ~715K rows/sec on a single connection.
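The fetch-ceiling figures appear to follow directly from the `parse_tuple_5cols` timings, assuming the ceiling is simply the reciprocal of the per-row decode time. A quick check under that assumption (`rows_per_second` is a hypothetical helper):

```python
def rows_per_second(ns_per_row: float) -> float:
    """Upper bound on rows decoded per second at a given per-row cost."""
    return 1e9 / ns_per_row


before = rows_per_second(2796)  # original Phase 21 baseline
after = rows_per_second(1400)   # Phase 25
```

This gives roughly 358K and 714K rows/sec, consistent with the figures quoted above. It ignores network and server time, so it is a ceiling rather than an expected throughput.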
### Tests
3 new unit tests (the invariant tripwires). Total: **72 unit + 224 integration + 28 benchmark = 324 tests**.
### Baseline refreshed
`tests/benchmarks/baseline.json` updated. All tests pass; ruff clean.
## 2026.05.04.9 — Decoder dispatch + struct precompilation (Phase 24)
Second pass of hot-path optimization. Phase 23 lifted IfxType conversions out of the loop body in `_resultset.py` (-26% on `parse_tuple_5cols`). Phase 24 goes deeper into the codec layer.


@@ -1,6 +1,6 @@
[project]
name = "informix-db"
-version = "2026.05.04.9"
+version = "2026.05.04.10"
description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
readme = "README.md"
license = { text = "MIT" }


@@ -223,6 +223,16 @@ _COMPOSITE_UDT_TYPES = frozenset({
_NUMERIC_TYPES = frozenset({_TC_DECIMAL, _TC_MONEY})
# Types that are fixed-width on the wire AND have a registered decoder
# in ``FIXED_WIDTHS``: SMALLINT, INT, SERIAL, SMFLOAT, FLOAT, BIGINT,
# BIGSERIAL, DATE, BOOL. These are the most common types in any real
# query, so checking them FIRST in the parse_tuple_payload dispatch
# saves ~7 frozenset/equality misses per column. Disjoint from every
# other branch's type set (verified — none of these codes appear in
# _LENGTH_PREFIXED_SHORT_TYPES, _NUMERIC_TYPES, _COMPOSITE_UDT_TYPES,
# or as DATETIME/INTERVAL/UDTFIXED/UDTVAR/LVARCHAR).
_FIXED_WIDTH_TYPES = frozenset(FIXED_WIDTHS.keys())
def parse_tuple_payload(
reader: IfxStreamReader,
@@ -270,6 +280,18 @@ def parse_tuple_payload(
for col in columns:
tc = col.type_code
# Fast path: fixed-width types (INT, FLOAT, DATE, BIGINT, etc.)
# are by far the most common columns in real queries. Check them
# FIRST so we don't pay 7+ branch-misses per integer column.
# FIXED_WIDTHS keys are disjoint from every other branch's type
# set — see _FIXED_WIDTH_TYPES module-level comment.
if tc in _FIXED_WIDTH_TYPES:
width = FIXED_WIDTHS[tc]
raw = payload[offset:offset + width]
offset += width
values.append(_decode_base(tc, raw, encoding))
continue
if tc in _LENGTH_PREFIXED_SHORT_TYPES:
# In tuple data, VARCHAR/NCHAR/NVCHAR use a SINGLE-BYTE
# length prefix (max 255 — IDS VARCHAR's hard limit), not
@@ -414,12 +436,13 @@ def parse_tuple_payload(
values.append(raw.decode(encoding))
continue
-# Fixed-width types
-width = FIXED_WIDTHS.get(tc)
-if width is None:
-# Phase 6+ types (DATETIME, INTERVAL, BLOBs) — fall back
-# to encoded_length and surface raw bytes.
-width = col.encoded_length
+# Unknown / unhandled type fall-through. The fast-path at the
+# top of this loop already handled all FIXED_WIDTHS-registered
+# types (INT, FLOAT, DATE, etc.); the explicit branches above
+# handle every other known wire shape. Anything reaching here
+# is a type code we don't recognize — surface ``encoded_length``
+# bytes raw and let the decoder dispatch (or its fallback) react.
+width = col.encoded_length
raw = payload[offset:offset + width]
offset += width
try:


@@ -509,6 +509,21 @@ def _decode_decimal(raw: bytes) -> decimal.Decimal | None:
# slice column values out of an SQ_TUPLE payload for fixed-width types.
# Variable-width types (CHAR, VARCHAR, DECIMAL, etc.) are length-prefixed
# on the wire and don't appear in this table.
#
# INVARIANT — every key here MUST be decodable by
# ``_decode_base(tc, raw, encoding)`` with NO per-column qualifier
# inspection (no ``col.encoded_length`` lookup, no extended_id check,
# no extended_name check). This is **load-bearing for correctness**:
# ``_resultset.parse_tuple_payload`` dispatches all ``FIXED_WIDTHS``
# types through a single fast-path branch that does not pass
# ``col.encoded_length`` to the decoder. If a new fixed-width type
# needs qualifier bits (the way DATETIME and INTERVAL do — both
# absent from this table for exactly that reason), give it its own
# explicit branch in ``parse_tuple_payload`` instead of adding it
# here. A test in ``tests/test_resultset_invariants.py`` enforces the
# disjointness of this set against every other dispatch branch's
# type set; another test enforces that every key here has a decoder
# in DECODERS.
FIXED_WIDTHS: dict[int, int] = {
IfxType.SMALLINT: 2,
IfxType.INT: 4,

File diff suppressed because it is too large


@@ -0,0 +1,98 @@
"""Phase 25 — invariant tripwires for parse_tuple_payload's fast-path dispatch.
These tests don't exercise behavior. They lock down the structural
invariants the optimized hot loop in :func:`informix_db._resultset.parse_tuple_payload`
relies on for correctness. Each test is a CI tripwire: if a future
contributor breaks an invariant, these fail at test time rather than
at a customer's wire-protocol mismatch six months later.
Lessons from Margaret Hamilton's review of Phases 23/24/25:
* The optimization is *correct* but its correctness depends on
properties of unrelated tables (DECODERS keys, FIXED_WIDTHS keys,
IfxType flag bits) staying consistent.
* A comment at the table only helps if the next contributor reads it.
* A test fails loudly the moment the invariant is broken. Prefer that.
If one of these tests fires, **do not** simply update the test to
match the new state; that defeats the purpose. Instead, read the
docstring on the failed test and the corresponding INVARIANT comment
in the source; either restore the property or refactor the
optimization to no longer depend on it.
"""
from __future__ import annotations
from informix_db._resultset import (
_COMPOSITE_UDT_TYPES,
_FIXED_WIDTH_TYPES,
_LENGTH_PREFIXED_SHORT_TYPES,
_NUMERIC_TYPES,
_TC_DATETIME,
_TC_INTERVAL,
_TC_LVARCHAR,
_TC_UDTFIXED,
_TC_UDTVAR,
)
from informix_db.converters import DECODERS, FIXED_WIDTHS
def test_fixed_width_types_disjoint_from_other_dispatch_sets() -> None:
"""parse_tuple_payload's fast path is silently wrong if the FIXED_WIDTHS
type set overlaps any other branch.
The optimization in ``parse_tuple_payload`` puts the FIXED_WIDTHS
branch FIRST. If a type is also in (e.g.) _NUMERIC_TYPES, the fast
path swallows it before the DECIMAL/MONEY-specific handler runs,
silently producing wrong values.
If this test fails, you've added a new entry somewhere that
overlaps. Either move it to FIXED_WIDTHS exclusively (and remove
its specialized branch) or remove it from FIXED_WIDTHS.
"""
other_branch_types = (
_LENGTH_PREFIXED_SHORT_TYPES
| _NUMERIC_TYPES
| _COMPOSITE_UDT_TYPES
| {_TC_LVARCHAR, _TC_DATETIME, _TC_INTERVAL, _TC_UDTFIXED, _TC_UDTVAR}
)
overlap = _FIXED_WIDTH_TYPES & other_branch_types
assert overlap == set(), (
f"FIXED_WIDTHS overlap with another parse_tuple_payload branch: {overlap}. "
f"See the INVARIANT comment on FIXED_WIDTHS in converters.py."
)
def test_every_fixed_width_type_has_a_decoder() -> None:
"""The fast path calls ``_decode_base(tc, raw, encoding)`` for every
FIXED_WIDTHS key. If a key has no entry in DECODERS, we'd raise
``NotImplementedError`` for that column, surprising the user.
If this test fails, you've added a key to FIXED_WIDTHS without
adding a corresponding decoder. Add the decoder, or remove the
key.
"""
missing = [tc for tc in FIXED_WIDTHS if tc not in DECODERS]
assert missing == [], (
f"FIXED_WIDTHS has keys without DECODERS entries: {missing}. "
f"Every fixed-width type must be decodable by _decode_base."
)
def test_decoders_keys_stay_below_0x100() -> None:
"""The Phase 24 optimization in ``_decode_base`` skips ``base_type()``
by relying on a structural guarantee: all DECODERS keys are <= 0xFF
and all flag bits in _types.py are >= 0x100, so a flagged type code
cannot coincidentally match a DECODERS key.
If this test fails, you've added a decoder for a type code with
bits >= 0x100. The collision-free guarantee weakens: re-introduce
``base_type()`` inside ``_decode_base`` (and remove the Phase 24
optimization), OR keep the new key but verify it cannot clash with
any flagged input.
"""
high_keys = [tc for tc in DECODERS if tc >= 0x100]
assert high_keys == [], (
f"DECODERS contains keys with bits >= 0x100: {high_keys}. "
f"See the INVARIANT comment on DECODERS in converters.py."
)

uv.lock (generated)

@@ -34,7 +34,7 @@ wheels = [
[[package]]
name = "informix-db"
-version = "2026.5.4.8"
+version = "2026.5.4.9"
source = { editable = "." }
[package.optional-dependencies]