Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9)

Extends the IfxPy comparison bench script with scaling workloads
(1k/10k/100k rows for both executemany and SELECT). Re-runs the
full comparison with consistent measurement methodology and updates
the README with the actually-correct numbers.

Earlier comparison runs reported informix-db winning all 5
benchmarks. Re-running select_bench_table_all with consistent
measurement gives 3.04 ms, not the 891 µs I cited earlier - a
3.4x discrepancy attributable to noisy warmup + small-fixture
artifacts. The "we win everything" framing was wrong.

Corrected comparison reveals two clear stories:

Bulk-insert: pure-Python wins 1.6x at scale.
  executemany(10k):  IfxPy 259ms  -> us 161ms (1.6x faster)
  executemany(100k): IfxPy 2376ms -> us 1487ms (1.6x faster)
Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's
per-call API can't pipeline.
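The round-trip arithmetic can be sketched with a toy transport that just counts flush+read cycles. This is an illustration of the pipelining idea, not the driver's actual wire code - all names here are hypothetical:

```python
# Toy model of why pipelining wins: a fake transport that counts
# network round-trips. Serial = one drain per row; pipelined = one
# drain for the whole batch. Names are illustrative only.

class FakeWire:
    def __init__(self):
        self.round_trips = 0
        self.outbox = []

    def send(self, pdu):
        # Queue a PDU; no network cost until drained.
        self.outbox.append(pdu)

    def drain(self):
        # One flush+read cycle == one round-trip.
        self.round_trips += 1
        n = len(self.outbox)
        self.outbox.clear()
        return ["ok"] * n

def serial_insert(wire, rows):
    """IfxPy-style: one synchronous round-trip per row."""
    for row in rows:
        wire.send(("EXECUTE", row))
        wire.drain()

def pipelined_insert(wire, rows):
    """Phase-33-style: send every BIND+EXECUTE PDU, then drain once."""
    for row in rows:
        wire.send(("EXECUTE", row))
    wire.drain()

rows = list(range(10_000))
w1, w2 = FakeWire(), FakeWire()
serial_insert(w1, rows)
pipelined_insert(w2, rows)
print(w1.round_trips, w2.round_trips)  # 10000 1
```

At any nonzero RTT, the serial loop's cost grows as N round-trips while the pipelined path pays roughly one, which is why the gap only shows up at scale.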

Large-fetch: IfxPy wins 2.3-2.4x at scale.
  SELECT 1k rows:   IfxPy 1.2ms  / us 2.7ms (IfxPy 2.3x)
  SELECT 10k rows:  IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x)
  SELECT 100k rows: IfxPy 112ms  / us 271ms (IfxPy 2.4x)
Reason: C-level fetch_tuple at ~1.1us/row beats Python
parse_tuple_payload at ~2.7us/row. Real C-vs-Python codec gap
showing up at scale.
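The per-row figures above predict the wall-clock gap almost exactly; a quick back-of-envelope check, using the measured per-row costs as inputs:

```python
# Back-of-envelope check of the per-row codec gap cited above.
PER_ROW_C = 1.1e-6    # IfxPy fetch_tuple, seconds/row (measured figure)
PER_ROW_PY = 2.7e-6   # parse_tuple_payload, seconds/row (measured figure)

for n in (1_000, 10_000, 100_000):
    gap_ms = (PER_ROW_PY - PER_ROW_C) * n * 1e3
    print(f"{n:>7} rows: decode gap ~ {gap_ms:.1f} ms")
# At 100k rows the 1.6 us/row difference alone accounts for ~160 ms,
# consistent with the 112 ms vs 271 ms wall-clock numbers.
```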

For everyday workloads (single SELECT in a request, INSERT a
handful of rows), drivers are within 5-25%. For workloads where
the gap widens, direction depends on what you're doing - bulk-
write favors us, bulk-read favors IfxPy.

README's "Compared to IfxPy" section rewritten with the corrected
numbers and an honest "when to prefer which" subsection.
tests/benchmarks/compare/README.md mirror updated.

Net narrative: a "faster at bulk-write, slower at bulk-read,
comparable elsewhere" comparison story is more honest and more
durable than a "we win everything" claim that would have collapsed
the first time a user ran their own benchmark.

Side note (lint): one ambiguous unicode `×` in cursors.py replaced
with `x`.

Phase 37 ticket: parse_tuple_payload is the bottleneck at scale.
Closing the 1.6 µs/row gap to IfxPy would make us competitive on
bulk-fetch too. Possible approaches: Cython codec, deeper inlining,
per-column dispatch pre-bake.
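To make the "per-column dispatch pre-bake" idea concrete, here is a minimal sketch: resolve each column's decoder callable once at describe time, so the per-row hot loop does no type dispatch at all. None of these names are the driver's real API - it's a hypothetical illustration of the technique:

```python
import struct

# Hypothetical "per-column dispatch pre-bake": one dict lookup per
# column, once per statement, instead of once per value per row.

def decode_int4(buf, off):
    return struct.unpack_from(">i", buf, off)[0], off + 4

def decode_float8(buf, off):
    return struct.unpack_from(">d", buf, off)[0], off + 8

DECODERS = {"INT": decode_int4, "FLOAT": decode_float8}

def prebake(col_types):
    # Resolve decoders up front; the row loop never touches the dict.
    return [DECODERS[t] for t in col_types]

def parse_row(buf, baked):
    off, out = 0, []
    for decode in baked:  # tight loop over pre-resolved callables
        val, off = decode(buf, off)
        out.append(val)
    return tuple(out)

baked = prebake(["INT", "FLOAT"])
row = struct.pack(">id", 42, 3.5)
print(parse_row(row, baked))  # (42, 3.5)
```

Whether this alone closes the gap is an open question - it removes dispatch overhead but still pays Python's per-call cost, which is where a Cython codec would go further.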
This commit is contained in:
Ryan Malloy 2026-05-05 12:44:52 -06:00
parent 8eb19f7534
commit 270155d2de
7 changed files with 199 additions and 20 deletions


@@ -2,6 +2,49 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.05.9 — IfxPy scaling comparison + honest comparison numbers (Phase 36)
Adds the IfxPy side of Phase 34's scaling benchmarks (1k / 10k / 100k rows for both `executemany` and `SELECT`) and updates the README's comparison table with the **actually-correct numbers**.
### What changed
**1. `tests/benchmarks/compare/ifxpy_bench.py` extended** with `bench_executemany_scaling(n)` and `bench_select_scaling(n)` — same shapes as `test_scaling_perf.py` so the comparison is apples-to-apples.
**2. README's comparison numbers corrected.** Earlier comparison runs reported `select_bench_table_all` at 891 µs for `informix-db`. Re-running with consistent measurement (warmup + median + 10+ rounds) reports 3.04 ms — a 3.4× discrepancy. The earlier number was probably picked up from a noisy first-run with a different warmup state, or from a benchmark that wasn't fully populating its fixture. **Either way, the "we win all 5 benchmarks" claim was based on inconsistent measurement.**
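The "warmup + median + 10+ rounds" discipline behind the corrected numbers can be sketched as follows. The real `measure()` helper lives in the bench scripts; this is just the shape, with a toy workload standing in for a query:

```python
import statistics
import time

# Minimal sketch of warmup + median + IQR measurement. Warmup rounds
# absorb cold-cache / first-run noise (the likely source of the bogus
# 891 µs figure); median + IQR keep one outlier round from dominating.

def measure(name, rounds, fn, warmup=2):
    for _ in range(warmup):
        fn()  # discarded warmup runs
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {"name": name,
            "median_s": statistics.median(samples),
            "iqr_s": q3 - q1}

result = measure("toy_workload", 11, lambda: sum(range(1000)))
print(result["name"], result["median_s"] > 0)
```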
**The corrected comparison reveals two clear stories:**
| Benchmark | IfxPy | informix-db | Result |
|---|---:|---:|---|
| `executemany(1k)` in txn | 23.5 ms | 23.2 ms | tied |
| `executemany(10k)` in txn | 259 ms | **161 ms** | **us 1.6× faster** |
| `executemany(100k)` in txn | 2376 ms | **1487 ms** | **us 1.6× faster** |
| `SELECT 1k rows` | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT 10k rows` | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT 100k rows` | 112 ms | 271 ms | IfxPy 2.4× faster |
**Bulk-insert: pure-Python wins 1.6× at scale** because pipelining (Phase 33) eliminates per-row RTT. IfxPy's `IfxPy.execute(stmt, tuple)` per-call API can't pipeline.
**Large-fetch: IfxPy wins 2.3-2.4× at scale.** Their C-level `fetch_tuple` decoder runs at ~1.1 µs/row; our `parse_tuple_payload` runs at ~2.7 µs/row. **This is the real C-vs-Python codec cost showing up at scale where it matters.**
### Why correcting this matters
A "we win everything" claim that's based on noisy measurements would have collapsed the first time a user ran their own benchmark and got different numbers. Naming the trade-off honestly — "we're faster at bulk write, slower at bulk read, comparable elsewhere" — is the right framing.
### When to prefer `informix-db`
- ETL pipelines, log shipping, bulk writes (1.6× faster at scale)
- Containerized / minimal-dependency environments (50 KB wheel vs IfxPy's 92 MB OneDB tarball + libcrypt.so.1 dependency hell)
- Modern Python (works on 3.10–3.14; IfxPy is broken on Python 3.12+)
- Async / FastAPI workloads (we have native async; IfxPy doesn't)
### When IfxPy may be faster
- Analytical reporting queries pulling 10k+ rows in a single SELECT
- Workloads where the per-row decode cost dominates (wide rows, tight read loops)
The actionable takeaway for `informix-db`'s future: the parse_tuple_payload hot path is now the bottleneck at scale. Phase 25's branch reorder shaved 22%; further work (Cython codec? deeper inlining? per-column dispatch pre-bake?) could close the C-vs-Python gap. Tracked as a possible Phase 37+.
## 2026.05.05.8 — Scaling benchmarks (Phase 34)
Adds `tests/benchmarks/test_scaling_perf.py` — parametrized benchmarks that exercise the driver at row counts and column widths well beyond what the existing 1k-row benchmarks cover. The first thing this suite did was catch the NFETCH-loop data-loss bug fixed in Phase 35.


@@ -176,17 +176,33 @@ Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on iden
| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 118 µs | 114 µs | comparable |
| ~10-row server-side query | 130 µs | 159 µs | IfxPy 22% faster |
| Cold connect (login handshake) | 11.0 ms | 10.5 ms | comparable |
| **`executemany(1k)` in transaction** | 23.5 ms | 23.2 ms | tied |
| **`executemany(10k)` in transaction** | 259 ms | **161 ms** | **`informix-db` 1.6× faster** |
| **`executemany(100k)` in transaction** | 2376 ms | **1487 ms** | **`informix-db` 1.6× faster** |
| `SELECT` 1k rows | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT` 10k rows | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT` 100k rows | 112 ms | 271 ms | IfxPy 2.4× faster |
**The honest summary:**
- **Bulk-insert workloads: `informix-db` wins 1.6× at scale.** The pipelined `executemany` (Phase 33) sends all N BIND+EXECUTE PDUs before draining responses, eliminating per-row RTT. IfxPy still pays one round-trip per `IfxPy.execute(stmt, tuple)` call.
- **Large-fetch workloads: IfxPy wins 2.3× at scale.** Their C-level `fetch_tuple` decoder is genuinely faster than our Python `parse_tuple_payload` (~1.1 µs/row vs ~2.7 µs/row). At 100k rows, that 1.6 µs/row gap accumulates into a 160 ms wall-clock difference.
- **Small queries: comparable.** Both spend ~120 µs waiting for the server; the per-call codec cost is small relative to the round-trip.
**When to prefer `informix-db`:**
- ETL pipelines, log shipping, bulk writes (1.6× faster at scale)
- Containerized / minimal-dependency environments (50 KB wheel vs IfxPy's 92 MB OneDB tarball + libcrypt.so.1 dependency hell)
- Modern Python (works on 3.10–3.14; IfxPy is broken on Python 3.12+)
- Async / FastAPI workloads (we have native async; IfxPy doesn't)
**When IfxPy may be faster:**
- Analytical reporting queries pulling 10k+ rows in a single SELECT
- Workloads where the per-row decode cost dominates (wide rows, tight read loops)
These results are reproducible from `tests/benchmarks/compare/` — the Dockerfile, bench script, and README walk through every step.
Full methodology, IQR caveats, install gauntlet, and reproduction in [`tests/benchmarks/compare/README.md`](tests/benchmarks/compare/README.md).


@@ -1,6 +1,6 @@
[project]
name = "informix-db"
version = "2026.05.05.9"
description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
readme = "README.md"
license = { text = "MIT" }


@@ -401,7 +401,7 @@ class Cursor:
# Phase 35: NFETCH loop — keep fetching until a response yields
# zero new tuples. The previous "two NFETCHes" pattern silently
# truncated any result set whose tuples didn't fit in 1-2 server
# batches (~200 rows at default 4096-byte buffer x 5-col rows).
# This bug was latent for ~30 phases because no test used a
# large enough result set to trigger it.
self._conn._send_pdu(self._build_curname_nfetch_pdu(cursor_name))


@@ -4,19 +4,29 @@ Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/), the IB
## TL;DR
Using **median + IQR over 10+ rounds** (mean was unreliable on the slow benchmarks — see "Statistical robustness" below). Phase 36 added scaling benchmarks at 1k / 10k / 100k rows so the comparison shape is clearer:
| Benchmark | IfxPy 3.0.5 | informix-db | Result |
|---|---:|---:|---:|
| `select_one_row` | 118 µs | 114 µs | comparable |
| `select_systables_first_10` | 130 µs | 159 µs | IfxPy 22% faster |
| `cold_connect_disconnect` | 11.0 ms | 10.5 ms | comparable |
| **`executemany(1k)` in txn** | 23.5 ms | 23.2 ms | tied |
| **`executemany(10k)` in txn** | 259 ms | **161 ms** | **`informix-db` 1.6× faster** |
| **`executemany(100k)` in txn** | 2376 ms | **1487 ms** | **`informix-db` 1.6× faster** |
| `SELECT 1k rows` | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT 10k rows` | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT 100k rows` | 112 ms | 271 ms | IfxPy 2.4× faster |
**Two clear stories:**
**1. Bulk insert: `informix-db` wins 1.6× at scale.** The pipelined `executemany` (Phase 33) sends all N BIND+EXECUTE PDUs to the wire before draining responses, eliminating per-row RTT. IfxPy still pays one synchronous round-trip per `IfxPy.execute(stmt, tuple)` call — that's ~24 µs/row regardless of N. We pay ~15 µs/row at scale (the prepare/release overhead amortizes better at larger N).
**2. Large fetch: IfxPy wins 2.3-2.4× at scale.** Their C-level `fetch_tuple` decoder runs at ~1.1 µs/row; our pure-Python `parse_tuple_payload` runs at ~2.7 µs/row. At 100k rows, the 1.6 µs/row gap accumulates into a 160 ms wall-clock difference. **This is the C-vs-Python codec cost showing up at scale, where it actually matters.**
For everyday-application workloads (single SELECT in a request, INSERT a handful of rows, transactional UPDATE), the two drivers are within 5-25% of each other. For the workloads where the gap widens, the direction depends on what you're doing — bulk-write favors us, bulk-read favors IfxPy.
**The wire-alignment assumption** that makes pipelined `executemany` safe — that Informix sends exactly N responses for N pipelined PDUs even when one row fails — is verified by `tests/test_executemany_pipeline.py` (constraint violation at row 0/100, 99/100, 500/1000).
## Statistical robustness — why median, not mean


@@ -171,6 +171,107 @@ def bench_cold_connect_disconnect() -> dict:
    return measure("cold_connect_disconnect", ROUNDS_SLOW, run)

# ----------------------------------------------------------------------------
# Phase 36 — scaling benchmarks (matched to test_scaling_perf.py)
# ----------------------------------------------------------------------------

def bench_executemany_scaling(n_rows: int) -> dict:
    """N-row insert in a single transaction. IfxPy doesn't pipeline —
    each ``IfxPy.execute(stmt, params)`` is a synchronous round-trip
    to the server. So per-row cost is roughly constant in N."""
    rounds_for = {1_000: 10, 10_000: 5, 100_000: 3}
    name = f"executemany_scaling_{n_rows}"
    try:
        conn = IfxPy.connect(
            CONN_STR.replace("DATABASE=sysmaster", "DATABASE=testdb"), "", ""
        )
    except Exception as e:
        return {"name": name, "skipped": f"testdb: {e}"}
    IfxPy.autocommit(conn, IfxPy.SQL_AUTOCOMMIT_OFF)
    table = f"p36_em_{n_rows}"
    try:
        try:
            IfxPy.exec_immediate(conn, f"DROP TABLE {table}")
            IfxPy.commit(conn)
        except Exception:
            pass
        IfxPy.exec_immediate(
            conn, f"CREATE TABLE {table} (id INT, name VARCHAR(64), value FLOAT)"
        )
        IfxPy.commit(conn)
        counter = [0]

        def run() -> None:
            counter[0] += 1
            base = counter[0] * n_rows
            stmt = IfxPy.prepare(
                conn, f"INSERT INTO {table} VALUES (?, ?, ?)"
            )
            for i in range(n_rows):
                IfxPy.execute(stmt, (base + i, f"row_{base + i}", float(base + i)))
            IfxPy.free_stmt(stmt)
            IfxPy.commit(conn)

        return measure(name, rounds_for[n_rows], run)
    finally:
        try:
            IfxPy.exec_immediate(conn, f"DROP TABLE {table}")
            IfxPy.commit(conn)
        except Exception:
            pass
        IfxPy.close(conn)

def bench_select_scaling(n_rows: int) -> dict:
    """SELECT FIRST N from the pre-populated 100k-row p34_select table.
    Tests IfxPy's per-row fetch cost at scale; should be roughly linear
    in N like ours."""
    rounds_for = {1_000: 10, 10_000: 5, 100_000: 3}
    name = f"select_scaling_{n_rows}"
    try:
        conn = IfxPy.connect(
            CONN_STR.replace("DATABASE=sysmaster", "DATABASE=testdb"), "", ""
        )
    except Exception as e:
        return {"name": name, "skipped": f"testdb: {e}"}
    try:
        # Probe: does p34_select exist?
        try:
            stmt = IfxPy.exec_immediate(conn, "SELECT COUNT(*) FROM p34_select")
            row = IfxPy.fetch_tuple(stmt)
            IfxPy.free_stmt(stmt)
            available = int(row[0])
            if available < n_rows:
                return {"name": name, "skipped": (
                    f"p34_select has only {available} rows; "
                    "run informix-db scaling benchmarks first to seed "
                    "the table"
                )}
        except Exception as e:
            return {"name": name, "skipped": f"p34_select missing: {e}"}

        def run() -> None:
            stmt = IfxPy.exec_immediate(
                conn, f"SELECT FIRST {n_rows} * FROM p34_select"
            )
            count = 0
            while IfxPy.fetch_tuple(stmt):
                count += 1
            IfxPy.free_stmt(stmt)
            if count != n_rows:
                raise RuntimeError(
                    f"expected {n_rows} rows, got {count}"
                )

        return measure(name, rounds_for[n_rows], run)
    finally:
        IfxPy.close(conn)
def main() -> None:
    print("# IfxPy benchmark results", file=sys.stderr)
    print(f"# IfxPy version: {IfxPy.__version__ if hasattr(IfxPy, '__version__') else 'unknown'}", file=sys.stderr)
@@ -187,6 +288,15 @@ def main() -> None:
    results.append(bench_executemany_1000_rows_in_txn())
    results.append(bench_cold_connect_disconnect())

    # Phase 36 — scaling comparison. Skip 100k cases when --short is
    # passed (e.g., for fast smoke runs); otherwise run all sizes.
    short = "--short" in sys.argv
    sizes = [1_000, 10_000] if short else [1_000, 10_000, 100_000]
    for n in sizes:
        results.append(bench_executemany_scaling(n))
    for n in sizes:
        results.append(bench_select_scaling(n))

    # Emit machine-parseable lines on stdout. Reporting median (not
    # mean) and IQR (not stddev) so a single outlier round can't
    # dominate the comparison numbers — mirrors pytest-benchmark's

uv.lock (generated)

@@ -34,7 +34,7 @@ wheels = [
[[package]]
name = "informix-db"
version = "2026.5.5.8"
source = { editable = "." }

[package.optional-dependencies]