Phase 24: Decoder dispatch split + struct precompilation (2026.05.04.9)

Second pass of hot-path optimization on parse_tuple_payload. Two changes
to converters.py:

1. Split decode() into public + internal. Added _decode_base(base_tc,
   raw, encoding) that takes an already-base-typed code and skips the
   redundant base_type() call. Public decode() is now a one-line
   wrapper. parse_tuple_payload's 4 call sites swapped to use
   _decode_base directly. _fastpath.py's external decode() caller is
   unaffected.

2. Pre-compiled struct.Struct unpackers. The fixed-width integer/float
   decoders (_decode_smallint, _decode_int, _decode_bigint,
   _decode_smfloat, _decode_float, _decode_date) switched from per-call
   struct.unpack(fmt, raw) to module-level bound methods like
   _UNPACK_INT = struct.Struct("!i").unpack. The format string is
   compiled once at module load. Measured 37% faster than per-call
   struct.unpack in a CPython 3.13 microbenchmark.

Performance vs Phase 23 baseline:
* decode_int: 173 ns -> 139 ns (-20%)
* decode_bigint: 188 ns -> 150 ns (-20%)
* parse_tuple_5cols: 2047 ns -> 1592 ns (-22%)
* 1k-row SELECT: 1255 us -> 989 us (-21%)

Cumulative vs original Phase 21 baseline:
* decode_int: 230 ns -> 139 ns (-40%)
* parse_tuple_5cols: 2796 ns -> 1592 ns (-43%)
* 1k-row SELECT: 1477 us -> 989 us (-33%)

Real-world fetch ceiling: 358K rows/sec -> ~620K rows/sec.

Margaret Hamilton review surfaced one HIGH-severity finding addressed
before tagging:
* H: The no-collision guarantee that makes _decode_base safe is
  structural but undocumented (all DECODERS keys are ≤ 0xFF, all flag
  bits are ≥ 0x100, so flagged inputs cannot coincidentally match).
  Added load-bearing INVARIANT comment at DECODERS dict explaining
  the constraint and what to do if violated. Cross-referenced from
  _decode_base's docstring for bidirectional traceability.

baseline.json refreshed; all 224 integration tests pass; ruff clean.
Ryan Malloy 2026-05-04 19:31:21 -06:00
parent f3e589c5bf
commit dfa60ea501
6 changed files with 586 additions and 487 deletions


@ -2,6 +2,51 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.04.9 — Decoder dispatch + struct precompilation (Phase 24)
Second pass of hot-path optimization. Phase 23 lifted IfxType conversions out of the loop body in `_resultset.py` (-26% on `parse_tuple_5cols`). Phase 24 goes deeper into the codec layer.
### What changed
**1. Split `decode()` into public + internal in `src/informix_db/converters.py`.**
- New `_decode_base(base_tc, raw, encoding)` takes an *already-base-typed* type code and skips the `base_type()` flag strip. Documented INVARIANT: caller's responsibility to base-type the input.
- Public `decode()` is now a one-line wrapper: `return _decode_base(base_type(type_code), raw, encoding)`. Same external semantics, same backward-compat — `_fastpath.py:171` is unaffected.
- `parse_tuple_payload` (4 call sites) now imports and calls `_decode_base` directly. Saves ~100 ns × N columns per row by skipping the redundant flag strip.
**2. Pre-compiled `struct.Struct` unpackers.** The fixed-width integer/float decoders (`_decode_smallint`, `_decode_int`, `_decode_bigint`, `_decode_smfloat`, `_decode_float`, `_decode_date`) switched from per-call `struct.unpack(fmt, raw)` to module-level bound methods like `_UNPACK_INT = struct.Struct("!i").unpack`. Format-string parsing happens once at module load instead of per call — measured 37% faster than per-call `struct.unpack` on a CPython 3.13 microbenchmark.
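A comparison along these lines could be reproduced with `timeit`. The sketch below is not the project's benchmark harness: the function names and the sample value are made up for illustration, and the timings it returns will vary by machine and interpreter.

```python
import struct
import timeit

# Built once at import; the bound method is reused on every call.
_UNPACK_INT = struct.Struct("!i").unpack

RAW = (1234).to_bytes(4, "big", signed=True)  # sample 4-byte big-endian payload


def per_call(raw=RAW):
    # Module-level function: the format string is resolved on each call.
    return struct.unpack("!i", raw)[0]


def precompiled(raw=RAW):
    # Pre-built Struct: no per-call format handling.
    return _UNPACK_INT(raw)[0]


def compare(number=200_000):
    """Return total seconds spent in each variant over `number` calls."""
    return {
        "per_call": timeit.timeit(per_call, number=number),
        "precompiled": timeit.timeit(precompiled, number=number),
    }
```

Calling `compare()` returns both totals, so the ratio can be checked locally before trusting any single figure.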
### Margaret Hamilton review pass
The optimization went through a second failure-mode review. One HIGH-severity finding addressed:
- **H (high)**: The no-collision guarantee that makes `_decode_base` safe is *structural but undocumented*. Specifically: all DECODERS keys are ≤ 0xFF; all flag bits in `_types.py` are ≥ 0x100; therefore a flagged input *cannot* coincidentally match a DECODERS key. This guarantee is correct today but fragile — adding a decoder for a type code that uses bits ≥ 0x100 would silently weaken it. **Fixed**: added a load-bearing INVARIANT comment at the `DECODERS` dict declaration explaining the constraint and what to do if it's violated. Cross-referenced from `_decode_base`'s docstring so the contract is bidirectionally traceable.
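One way to make the invariant mechanically checkable, in addition to the comment, is a small test helper. The sketch below is hypothetical and not part of the commit: the key set is illustrative, and the flag values are the two quoted in the INVARIANT comment (`NOTNULLABLE=0x100`, `DISTINCT=0x800`).

```python
FLAG_BITS = (0x100, 0x800)     # NOTNULLABLE, DISTINCT as quoted in the comment
DECODER_KEYS = {1, 2, 0x65}    # illustrative keys only


def invariant_holds(decoder_keys, flag_bits):
    """True when every key is <= 0xFF and every flag bit is >= 0x100."""
    return all(k <= 0xFF for k in decoder_keys) and all(
        f >= 0x100 for f in flag_bits
    )


def find_collisions(decoder_keys, flag_bits):
    """List (flagged_code, base_key) pairs where a flagged code is itself a key."""
    keys = set(decoder_keys)
    hits = []
    for key in keys:
        for flag in flag_bits:
            flagged = key | flag
            if flagged != key and flagged in keys:
                hits.append((flagged, key))
    return sorted(hits)
```

With the illustrative keys, `find_collisions` returns an empty list. Adding a key such as `0x102` makes it report `(0x102, 2)`, which is the silent mis-dispatch the review finding warns about.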
### Performance summary (Phase 24)
| Benchmark | Phase 23 baseline | NOW | Δ this phase |
|---|---:|---:|---:|
| `decode_int` | 173 ns | **139 ns** | **-20%** |
| `decode_bigint` | 188 ns | **150 ns** | **-20%** |
| `decode_smallint` | 169 ns | **137 ns** | **-19%** |
| `decode_date` | 521 ns | **435 ns** | **-17%** |
| `parse_tuple_5cols_iso8859` | 2047 ns | **1592 ns** | **-22%** |
| `select_bench_table_all` (1k rows) | 1255 µs | **989 µs** | **-21%** |
| `select_with_param` | 977 µs | 860 µs | -12% |
### Cumulative improvement (vs. original Phase 21 baseline, before any optimization)
| Metric | Original | NOW | Total Δ |
|---|---:|---:|---:|
| `decode_int` | 230 ns | **139 ns** | **-40%** |
| `parse_tuple_5cols` | 2796 ns | **1592 ns** | **-43%** |
| `select_bench_table_all` (1k rows) | 1477 µs | **989 µs** | **-33%** |
Real-world fetch ceiling: 358K rows/sec → ~620K rows/sec on a single connection.
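The percentages in the tables above follow from the raw timings, and per-phase deltas compound multiplicatively rather than adding. A short sketch of the arithmetic, with helper names invented for the example and assuming the `parse_tuple_5cols` rows in both tables refer to the same benchmark:

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


def compound(*pcts):
    """Combine successive percentage changes into one overall change."""
    factor = 1.0
    for p in pcts:
        factor *= 1 + p / 100
    return (factor - 1) * 100


# Cumulative rows: original baseline -> now.
decode_int_total = pct_change(230, 139)
parse_tuple_total = pct_change(2796, 1592)
select_total = pct_change(1477, 989)

# Phase 23 then Phase 24 on parse_tuple_5cols, combined.
parse_tuple_combined = compound(pct_change(2796, 2047), pct_change(2047, 1592))
```

Rounded to whole percent these come out to -40, -43 and -33, matching the cumulative table, and the two parse-tuple phases together round to the same -43.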
### Baseline refreshed
`tests/benchmarks/baseline.json` updated. All 224 integration tests pass; ruff clean.
## 2026.05.04.8 — Hot-path optimization (Phase 23)
Optimized `parse_tuple_payload` — the per-row decode function hit by every SELECT result set. **The 1k-row fetch wall-clock improved 19%** (1477 µs → 1198 µs). Bench micro-target (`parse_tuple_5cols`) improved 27% (2796 ns → 2030 ns). All 224 integration tests still pass; ruff clean.


@ -1,6 +1,6 @@
 [project]
 name = "informix-db"
-version = "2026.05.04.8"
+version = "2026.05.04.9"
 description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
 readme = "README.md"
 license = { text = "MIT" }


@ -31,9 +31,9 @@ from .converters import (
     ClobLocator,
     CollectionValue,
     RowValue,
+    _decode_base,
     _decode_datetime,
     _decode_interval,
-    decode,
 )
 # Module-level type-code constants — lifted out of the hot loop in
@ -288,7 +288,7 @@ def parse_tuple_payload(
             offset += 1
             raw = payload[offset:offset + length]
             offset += length
-            values.append(decode(tc, raw, encoding))
+            values.append(_decode_base(tc, raw, encoding))
             continue
         if tc == _TC_LVARCHAR:
@ -299,7 +299,7 @@ def parse_tuple_payload(
             offset += length
             if length & 1:
                 offset += 1
-            values.append(decode(tc, raw, encoding))
+            values.append(_decode_base(tc, raw, encoding))
             continue
         # DECIMAL/MONEY: width = ceil(precision/2) + 1, where precision is
@ -311,7 +311,7 @@ def parse_tuple_payload(
             raw = payload[offset:offset + width]
             offset += width
             try:
-                values.append(decode(tc, raw))
+                values.append(_decode_base(tc, raw))
             except NotImplementedError:
                 values.append(raw)
             continue
@ -423,7 +423,7 @@ def parse_tuple_payload(
         raw = payload[offset:offset + width]
         offset += width
         try:
-            values.append(decode(tc, raw, encoding))
+            values.append(_decode_base(tc, raw, encoding))
         except NotImplementedError:
             values.append(raw)
     return tuple(values)


@ -166,32 +166,42 @@ _REAL_NULL = b"\xff\xff\xff\xff"
 _DOUBLE_NULL = b"\xff\xff\xff\xff\xff\xff\xff\xff"
 _DATE_NULL = 0x80000000
+# Pre-compiled struct unpackers, bound once at module load.
+# 37% faster than ``struct.unpack(fmt, raw)`` because the format string
+# is compiled once at import, not resolved per call. Used by the fixed-width
+# decoders below; saves ~9 ns/call x the row's int/float column count.
+_UNPACK_SHORT = struct.Struct("!h").unpack
+_UNPACK_INT = struct.Struct("!i").unpack
+_UNPACK_LONG = struct.Struct("!q").unpack
+_UNPACK_FLOAT = struct.Struct("!f").unpack
+_UNPACK_DOUBLE = struct.Struct("!d").unpack
 def _decode_smallint(raw: bytes) -> int | None:
-    val = struct.unpack("!h", raw)[0]
+    val = _UNPACK_SHORT(raw)[0]
     return None if val == -0x8000 else val
 def _decode_int(raw: bytes) -> int | None:
-    val = struct.unpack("!i", raw)[0]
+    val = _UNPACK_INT(raw)[0]
     return None if val == -0x80000000 else val
 def _decode_bigint(raw: bytes) -> int | None:
-    val = struct.unpack("!q", raw)[0]
+    val = _UNPACK_LONG(raw)[0]
     return None if val == -0x8000000000000000 else val
 def _decode_smfloat(raw: bytes) -> float | None:
     if raw == _REAL_NULL:
         return None
-    return struct.unpack("!f", raw)[0]
+    return _UNPACK_FLOAT(raw)[0]
 def _decode_float(raw: bytes) -> float | None:
     if raw == _DOUBLE_NULL:
         return None
-    return struct.unpack("!d", raw)[0]
+    return _UNPACK_DOUBLE(raw)[0]
 def _decode_char(raw: bytes, encoding: str = "iso-8859-1") -> str:
@ -219,7 +229,7 @@ def _decode_bool(raw: bytes) -> bool:
 def _decode_date(raw: bytes) -> datetime.date | None:
     """4-byte big-endian signed int = day count from 1899-12-31. NULL = 0x80000000."""
-    days = struct.unpack("!i", raw)[0]
+    days = _UNPACK_INT(raw)[0]
     if days == -0x80000000:
         return None
     return _INFORMIX_DATE_EPOCH + datetime.timedelta(days=days)
@ -514,6 +524,18 @@ FIXED_WIDTHS: dict[int, int] = {
 # Phase 2 MVP decoders. Phase 6+ adds DATETIME, INTERVAL, DECIMAL,
 # MONEY, LVARCHAR, BYTE/TEXT, BLOB/CLOB, ROW, COLLECTION.
+#
+# INVARIANT — KEYS MUST REMAIN ≤ 0xFF (255). This is **load-bearing for
+# correctness**, not just a convention. ``_decode_base`` (below) skips
+# the ``base_type`` flag strip for performance; its safety relies on
+# the fact that any *flagged* type code (NOTNULLABLE=0x100,
+# DISTINCT=0x800, etc., per ``_types.py``) is ≥ 256 and therefore
+# cannot collide with a DECODERS key in [0, 255]. If you add a decoder
+# for a type code ≥ 256 (e.g., CLOB=0x65 itself is fine, but anything
+# that uses bits ≥ 0x100 in its identifier), the collision-free
+# guarantee weakens and ``_decode_base`` could silently dispatch to
+# the wrong decoder when handed a flagged input. Either keep keys ≤
+# 0xFF, or restore the ``base_type()`` call inside ``_decode_base``.
 DECODERS: dict[int, DecoderFn] = {
     IfxType.SMALLINT: _decode_smallint,
     IfxType.INT: _decode_int,
@ -543,28 +565,60 @@ _STRING_DECODER_TYPES = frozenset({
 })
+def _decode_base(base_tc: int, raw: bytes, encoding: str = "iso-8859-1") -> object:
+    """Internal fast-path dispatch given an *already-base-typed* type code.
+
+    INVARIANT: ``base_tc`` MUST be base-typed (high-bit flags stripped).
+    Caller's responsibility — this function does NOT call ``base_type()``.
+
+    The producer-side counterpart of this invariant lives in
+    :func:`informix_db._resultset.parse_describe`; see the INVARIANT
+    comment at the ``ColumnInfo`` construction site. The contract is
+    bidirectional: producer must base-type before storing in
+    ``ColumnInfo.type_code``; consumer (here) trusts that contract.
+
+    Used by ``parse_tuple_payload`` to skip the redundant base-type
+    strip when iterating column-by-column over a row payload. The
+    public ``decode()`` function below wraps this and strips flags
+    for callers who don't know whether they have a raw or base-typed
+    input.
+
+    Same dispatch logic as ``decode()``: no behavior delta when the
+    invariant holds. If a flagged type code reaches here, the
+    ``DECODERS.get`` lookup will miss and ``NotImplementedError``
+    fires with a misleading message, so the failure mode is loud,
+    not silent. The no-collision guarantee depends on DECODERS keys
+    staying ≤ 0xFF; see the INVARIANT comment at ``DECODERS`` above.
+    """
+    decoder = DECODERS.get(base_tc)
+    if decoder is None:
+        raise NotImplementedError(
+            f"decoder for IDS type code {base_tc} not yet implemented "
+            f"(Phase 2 MVP supports: SMALLINT, INT, BIGINT, REAL, FLOAT, "
+            f"CHAR, VARCHAR, BOOL, DATE)"
+        )
+    if base_tc in _STRING_DECODER_TYPES:
+        return decoder(raw, encoding)
+    return decoder(raw)
 def decode(type_code: int, raw: bytes, encoding: str = "iso-8859-1") -> object:
     """Decode ``raw`` bytes for the given IDS type code into a Python value.
     The high-bit flags (NOTNULLABLE etc.) are stripped before lookup.
-    Raises ``KeyError`` for unsupported types; Phase 6+ adds the rest.
+    Raises ``NotImplementedError`` for unsupported types; Phase 6+
+    adds the rest.
     ``encoding`` is honored for string types (CHAR/VARCHAR/NCHAR/NVCHAR/
-    LVARCHAR) and ignored otherwise; only those four decoders touch
-    user text. Pass the connection's ``encoding`` (derived from
-    CLIENT_LOCALE) so multibyte locales round-trip correctly.
+    LVARCHAR) and ignored otherwise; only those decoders touch user
+    text. Pass the connection's ``encoding`` (derived from CLIENT_LOCALE)
+    so multibyte locales round-trip correctly.
+    Public API accepts type codes with high-bit flags. Internal hot-path
+    callers that know their type code is already base-typed should call
+    :func:`_decode_base` directly to skip the redundant flag strip.
     """
-    base = base_type(type_code)
-    decoder = DECODERS.get(base)
-    if decoder is None:
-        raise NotImplementedError(
-            f"decoder for IDS type code {base} not yet implemented "
-            f"(Phase 2 MVP supports: SMALLINT, INT, BIGINT, REAL, FLOAT, "
-            f"CHAR, VARCHAR, BOOL, DATE)"
-        )
-    if base in _STRING_DECODER_TYPES:
-        return decoder(raw, encoding)
-    return decoder(raw)
+    return _decode_base(base_type(type_code), raw, encoding)
 # ---------------------------------------------------------------------------

File diff suppressed because it is too large

uv.lock generated

@ -34,7 +34,7 @@ wheels = [
 [[package]]
 name = "informix-db"
-version = "2026.5.4.7"
+version = "2026.5.4.8"
 source = { editable = "." }
 [package.optional-dependencies]