Phase 33: Pipelined executemany - 2.85x faster bulk insert (2026.05.05.6)

The serial-loop executemany paid one wire round-trip per row (~30 us/
row on loopback). It was the one benchmark where IfxPy beat us in
the comparison work - we were 10% slower at executemany(1000) in txn.

Phase 33 pipelines the BIND+EXECUTE PDUs: build all N PDUs, send
them back-to-back, then drain all N responses. Eliminates per-row
RTT entirely.

Performance impact:
* executemany(1000) in txn:   31.3 ms -> 11.0 ms (2.85x faster)
* executemany(100) autocommit: 173 ms -> 154 ms (11% faster)
* executemany(1000) autocommit: 1740 ms -> 1590 ms (9% faster)

(Autocommit gets smaller wins because server-side log flushes
dominate - Phase 21.1's "autocommit cliff".)

The IfxPy comparison flipped: from 10% slower to 2.05x faster on bulk
inserts. We now win all 5 head-to-head benchmarks against the C-bound
driver.

Margaret Hamilton review surfaced one CRITICAL concern (C1) - the
pipeline assumes Informix sends N responses for N pipelined PDUs
even when one fails. If the server cut the stream short, the drain
loop would deadlock on the next read.

Verified by 3 new integration tests in tests/test_executemany_pipeline.py:
* test_pipelined_executemany_mid_batch_constraint_violation (row 500/1000)
* test_pipelined_executemany_first_row_fails (row 0/100)
* test_pipelined_executemany_last_row_fails (row 99/100)

All confirm Informix sends N responses; wire stays aligned; connection
is usable after.

Plus 4 lower-priority fixes Hamilton recommended:
* H1: documented _raise_sq_err self-drains-SQ_EOT invariant + tripwire
* H2: docstring warning about O(N) lock duration; chunk for huge batches
* M1: prepend row-index to exception message rather than reformat
* M2: documented sendall-no-timeout caveat on hostile networks
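The M2 caveat can be mitigated at the OS level. A sketch of what a `keepalive=True` knob might configure (the function name, defaults, and the knob itself are illustrative assumptions, not the driver's actual API):

```python
import socket

def enable_keepalive(
    sock: socket.socket, idle: int = 60, interval: int = 10, count: int = 5
) -> None:
    """Turn on TCP keepalive so a wedged peer surfaces as a socket
    error after roughly idle + interval * count seconds, instead of
    the kernel default of ~2 hours before the first probe.

    The three tuning constants are the Linux names; macOS exposes
    only the idle knob (TCP_KEEPALIVE) and Windows tunes keepalive
    via ioctl, so each setsockopt is guarded with hasattr.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
sock.close()
```

Keepalive only bounds how long a dead peer can wedge a blocked read; it doesn't make ``sendall`` itself timeout-aware.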

77 unit + 239 integration + 33 benchmark = 349 tests; ruff clean.

Note: Phase 32 (Tier 1+2 benchmarks) was tagged without bumping
pyproject.toml's version string. .5 was git-tag-only; .6 is the next
published version increment.
Ryan Malloy 2026-05-05 12:26:15 -06:00
parent 01757415a5
commit 362ecb3d63
6 changed files with 363 additions and 22 deletions


@ -2,6 +2,59 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.05.6 — Pipelined `executemany` (Phase 33) — 2.85× faster on bulk inserts
The previous serial-loop `executemany` paid one wire round-trip per row (~30 µs/row on loopback × N rows = the dominant cost for any sizeable batch). It was the *one* benchmark where IfxPy beat us in the comparison work — 10% slower at `executemany(1000)` in transaction.
Phase 33 pipelines the BIND+EXECUTE PDUs: build all N PDUs first, send them back-to-back, then drain all N responses. Eliminates the per-row RTT entirely.
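The build-all, send-all, drain-all pattern can be sketched against any length-prefixed request/response stream. A toy sketch over a socketpair (the framing, `pipelined_send`, and echo server are illustrative, not the actual SQLI PDU layout):

```python
import socket
import struct
import threading

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; sockets may return short reads."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return buf

def serve_n(sock: socket.socket, n: int) -> None:
    """Toy server: answer every one of n length-prefixed frames with
    b'OK' — mirroring the assumption that the server sends exactly
    N responses for N pipelined requests."""
    for _ in range(n):
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        recv_exact(sock, length)  # consume the request body
        sock.sendall(struct.pack(">I", 2) + b"OK")

def pipelined_send(sock: socket.socket, payloads: list[bytes]) -> list[bytes]:
    # Build all frames first (pure Python work, no I/O)...
    frames = [struct.pack(">I", len(p)) + p for p in payloads]
    # ...send them back-to-back in one burst...
    sock.sendall(b"".join(frames))
    # ...then drain exactly N responses: one round-trip total,
    # instead of one per payload as a serial loop would pay.
    replies = []
    for _ in payloads:
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        replies.append(recv_exact(sock, length))
    return replies

client, server = socket.socketpair()
t = threading.Thread(target=serve_n, args=(server, 100))
t.start()
replies = pipelined_send(client, [f"row_{i}".encode() for i in range(100)])
t.join()
client.close()
server.close()
print(len(replies), set(replies))  # 100 {b'OK'}
```

The sketch relies on the same safety argument as the real pipeline: the batch must fit the send buffer (or the peer must drain concurrently), and the peer must answer every request even after an error.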
### Performance impact
| Benchmark | Before | After | Speedup |
|---|---:|---:|---:|
| `executemany(1000)` in transaction | 31.3 ms | **11.0 ms** | **2.85× faster** |
| `executemany(100)` in autocommit | 173 ms | 154 ms | 11% faster |
| `executemany(1000)` in autocommit | 1740 ms | 1590 ms | 9% faster |
Autocommit cases get smaller relative wins because server-side log flushes per row dominate the absolute cost (Phase 21.1's "autocommit cliff").
### IfxPy comparison: now winning all 5 benchmarks
The comparison flipped from "us 10% slower on bulk inserts" to "us 2.05× faster":
| Benchmark | IfxPy | informix-db | Result |
|---|---:|---:|---:|
| `select_one_row` | 118 µs | 114 µs | us 3% faster |
| `select_systables_first_10` | 164 µs | 159 µs | us 3% faster |
| `select_bench_table_all` (1k rows) | 984 µs | 891 µs | us 9% faster |
| **`executemany(1000)` in txn** | **21.4 ms** | **10.4 ms** | **us 2.05× faster** |
| `cold_connect_disconnect` | 11.0 ms | 10.4 ms | us 5% faster |
### Margaret Hamilton review pass
Hamilton flagged one critical concern (C1) before approving: the pipeline assumes Informix sends *exactly* N responses for N pipelined PDUs even when one row fails. If the server cut the response stream short on first error, the drain loop would block on the next read and the connection would deadlock.
**Verified by integration test** (`tests/test_executemany_pipeline.py`):
- Constraint violation at row 0/100 (first-row failure)
- Constraint violation at row 99/100 (last-row failure)
- Constraint violation at row 500/1000 (mid-batch failure)
All 3 confirm: Informix DOES send N responses for N PDUs; wire stays aligned; connection is usable after.
Plus four lower-priority fixes Hamilton recommended:
- **H1**: documented the `_raise_sq_err` self-drains-SQ_EOT invariant in the drain loop, plus the tripwire test that catches its violation.
- **H2**: docstring warning that lock-holding time scales O(N) in batch size; recommend chunking for very large batches.
- **M1**: prepend row-index annotation rather than reformat the exception message — preserves `[<sqlcode>] <text>` prefix for string-scraping callers.
- **M2**: documented that `sendall` doesn't honor a write timeout reliably on all kernels; recommend `keepalive=True` for hostile networks.
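H2's chunking recommendation works with any PEP 249 cursor. A minimal sketch (`executemany_chunked`, the fake cursor, and the 5000-row default are illustrative, not driver API):

```python
from collections.abc import Sequence
from itertools import islice

def executemany_chunked(
    cur, sql: str, rows: Sequence[tuple], chunk_size: int = 5000
) -> int:
    """Split one huge batch into several executemany calls so the
    connection's wire lock is released between chunks and other
    threads on the connection aren't starved for O(N_total)."""
    total = 0
    it = iter(rows)
    while chunk := list(islice(it, chunk_size)):
        cur.executemany(sql, chunk)
        total += cur.rowcount
    return total

class _FakeCursor:
    """Minimal PEP 249 stand-in for demonstration."""
    def __init__(self) -> None:
        self.batch_sizes: list[int] = []
        self.rowcount = -1

    def executemany(self, sql: str, rows: Sequence[tuple]) -> None:
        self.batch_sizes.append(len(rows))
        self.rowcount = len(rows)

cur = _FakeCursor()
total = executemany_chunked(
    cur, "INSERT INTO t VALUES (?)", [(i,) for i in range(12001)]
)
print(cur.batch_sizes, total)  # [5000, 5000, 2001] 12001
```

Note that chunking trades atomicity for fairness: a mid-batch failure now leaves earlier chunks applied (until rollback), so wrap the whole thing in one transaction if all-or-nothing semantics matter.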
### Tests
3 new integration tests in `tests/test_executemany_pipeline.py` validate the wire-alignment invariant. Total: **77 unit + 239 integration + 33 benchmark = 349 tests**.
### Note on version 2026.05.05.5
The Phase 32 (Tier 1+2 benchmarks) tag was applied without bumping `pyproject.toml`'s version string — that release is git-tag-only. Version 2026.05.05.6 (Phase 33) is the next published version increment.
## 2026.05.05.4 — Final hardening pass (Phase 30)
Closes the last 3 medium-severity items from Hamilton's system-wide audit. **No findings remain.**


@ -176,15 +176,17 @@ Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on iden
| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 118 µs | **114 µs** | **`informix-db` 3% faster** |
| ~10-row server-side query | 164 µs | **159 µs** | **`informix-db` 3% faster** |
| 1000-row SELECT (full fetch) | 984 µs | **891 µs** | **`informix-db` 9% faster** |
| **`executemany(1000)` in transaction** | 21.4 ms | **10.4 ms** | **`informix-db` 2.05× faster** |
| Cold connect (login handshake) | 11.0 ms | **10.4 ms** | **`informix-db` 5% faster** |

**`informix-db` wins on all 5 benchmarks against the C-bound driver, including a 2× win on bulk inserts.**

**Why pure-Python wins the round-trip-bound work:** IfxPy's code path is `Python → OneDB ODBC driver → libifdmr.so → wire`. Ours is `Python → wire`. The abstraction-layer overhead IfxPy carries on every call costs more than the C-vs-Python codec gap saves.

**Why we win bulk inserts dramatically:** `executemany` pipelines all N BIND+EXECUTE PDUs to the wire before draining responses (Phase 33), eliminating the per-row round-trip that the older serial loop incurred. IfxPy still does one synchronous round-trip per row.

Full methodology, IQR caveats, install gauntlet, and reproduction in [`tests/benchmarks/compare/README.md`](tests/benchmarks/compare/README.md).


@ -1,6 +1,6 @@
[project]
name = "informix-db"
version = "2026.05.05.6"
description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
readme = "README.md"
license = { text = "MIT" }


@ -828,12 +828,38 @@ class Cursor:
        """Execute the same SQL once per parameter set.

        Per PEP 249. Common case is batched INSERT. We PREPARE once,
        send N SQ_BIND+SQ_EXECUTE PDUs in a pipelined batch, then drain
        N responses, then RELEASE once. Phase 33 introduces the
        pipeline; earlier serial-loop implementations paid one wire
        round-trip per row (~30 us/row on loopback x N rows = the
        dominant cost for any sizeable batch).

        Phase 4 supports DML (INSERT/UPDATE/DELETE) only; SELECT in
        executemany doesn't make much sense and isn't implemented.

        **Pipelining safety** (Phase 33):

        * The Phase 27 wire lock is held for the whole executemany, so
          the entire send-batch + drain-batch is atomic against other
          threads on the connection.
        * The TCP send buffer (~16-256 KB) easily fits 1000 PDUs
          (~80-200 KB worst case); response packets are tiny (~10
          bytes per OK), so the server's send buffer can't fill before
          we drain. Note: ``sendall`` doesn't honor a write timeout
          reliably on all kernels; a wedged peer could block until TCP
          keepalive fires (default ~2 hours). For hostile-network
          deployments, set ``keepalive=True`` on connect.
        * On the first error mid-drain, remaining responses are
          drained silently (they're SQ_ERR replies for rows that the
          aborted transaction couldn't commit anyway). Wire alignment
          is verified by ``test_executemany_pipeline.py``; Informix
          does send N responses for N pipelined PDUs even when one
          fails. If a future Informix version changes that behavior,
          those tests fail loudly.
        * **Lock duration scales O(N) with batch size.** For very
          large batches (>10000 rows), other threads waiting on this
          connection will block proportionally. Prefer chunking into
          multiple ``executemany`` calls of 1000-10000 rows so other
          threads aren't starved.
        """
        self._check_open()
@ -880,18 +906,62 @@ class Cursor:
        )
        self._read_describe_response()

        # Phase 33: pipeline — build all BIND+EXECUTE PDUs first
        # (Python work, no I/O), then send them back-to-back, then
        # drain all responses. Eliminates the per-row round-trip
        # the older serial loop paid.
        pdus = [
            self._build_bind_execute_pdu(tuple(p)) for p in seq
        ]
        for pdu in pdus:
            self._conn._send_pdu(pdu)

        # Drain N responses. The first error is captured but we
        # still drain the rest (they're SQ_ERRs for the aborted
        # transaction's queued rows) so the wire stays consistent.
        #
        # Wire-framing invariant: each response — whether SQ_DONE
        # for a successful row or SQ_ERR for a failed one — ends
        # with its own SQ_EOT. ``_raise_sq_err`` self-drains the
        # SQ_ERR's trailing SQ_EOT (see connections.py:_raise_sq_err
        # drain loop). So calling ``_drain_to_eot`` exactly N times
        # consumes exactly the responses for N PDUs, regardless of
        # how many succeeded vs. failed. If ``_raise_sq_err`` is
        # ever refactored to leave its trailing EOT for the caller,
        # this loop silently desyncs — the test
        # ``test_executemany_pipeline.py`` is the tripwire.
        total_rowcount = 0
        first_error: Exception | None = None
        first_error_row: int | None = None
        for i in range(len(pdus)):
            self._rowcount = -1
            try:
                self._drain_to_eot()
            except Exception as exc:
                if first_error is None:
                    first_error = exc
                    first_error_row = i
                continue
            if self._rowcount > 0:
                total_rowcount += self._rowcount

        # RELEASE once.
        self._conn._send_pdu(self._build_release_pdu())
        self._drain_to_eot()

        if first_error is not None:
            # Annotate which row in the batch first failed by
            # PREPENDING to the existing message — preserves the
            # ``[<sqlcode>] <text>`` prefix that string-scraping
            # callers may rely on, and keeps the exception class
            # + structured fields (.sqlcode, .isamcode, .near).
            if first_error.args:
                first_error.args = (
                    f"executemany row {first_error_row}/{len(pdus)}: "
                    f"{first_error.args[0]}",
                    *first_error.args[1:],
                )
            raise first_error

        self._rowcount = total_rowcount

    def fetchone(self) -> tuple | None:


@ -6,15 +6,17 @@ Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/), the IB
Using **median + IQR over 10+ rounds** (mean was unreliable on the slow benchmarks — see "Statistical robustness" below):

| Benchmark | IfxPy 3.0.5 (C-bound) | informix-db (pure Python) | Result |
|---|---:|---:|---:|
| `select_one_row` (single-row latency) | 118 µs | **114 µs** | **`informix-db` 3% faster** |
| `select_systables_first_10` (~10 rows) | 164 µs | **159 µs** | **`informix-db` 3% faster** |
| `select_bench_table_all` (1000-row fetch) | 984 µs | **891 µs** | **`informix-db` 9% faster** |
| **`executemany(1000)` in transaction (bulk write)** | 21.4 ms | **10.4 ms** | **`informix-db` 2.05× faster** |
| `cold_connect_disconnect` (login handshake) | 11.0 ms | **10.4 ms** | **`informix-db` 5% faster** |

**`informix-db` wins all 5 benchmarks against the C-bound driver, including a 2× win on bulk inserts.**
The bulk-insert win comes from Phase 33's pipelined `executemany`: all N BIND+EXECUTE PDUs are sent to the wire before any response is drained, eliminating the per-row round-trip latency that the older serial loop (and IfxPy's per-call API) incur. The wire-alignment assumption that makes this safe — that Informix sends exactly N responses for N pipelined PDUs even when one row fails — is verified by `tests/test_executemany_pipeline.py` (constraint violation at row 0/100, 99/100, 500/1000).
## Statistical robustness — why median, not mean


@ -0,0 +1,214 @@
"""Phase 33 integration tests — pipelined ``executemany`` correctness.
The pipelined executemany sends all N BIND+EXECUTE PDUs to the wire
before draining any response. Hamilton's review of Phase 33 flagged
C1: this assumes the server sends *exactly* N responses for N
pipelined PDUs even when one row fails. If the server cuts the
response stream short on first error, the drain loop would block
reading bytes that never arrive the connection would deadlock on
the next read.
These tests verify the wire-alignment assumption holds:
1. Constraint violation at row 500 of 1000 happy-failure case.
2. Wire-alignment recovery connection is still usable after the
error (proving the RELEASE drain succeeded and we read all the
remaining error responses).
3. Subsequent operations on the same connection work proves no
stray bytes on the wire.
"""
from __future__ import annotations
import contextlib
from collections.abc import Iterator
import pytest
import informix_db
from tests.conftest import ConnParams
pytestmark = pytest.mark.integration
@pytest.fixture
def constraint_table(logged_db_params: ConnParams) -> Iterator[str]:
"""Table with a UNIQUE constraint on ``id`` so we can force a
constraint violation at a known row.
"""
table = "p33_constraint"
conn = informix_db.connect(
host=logged_db_params.host,
port=logged_db_params.port,
user=logged_db_params.user,
password=logged_db_params.password,
database=logged_db_params.database,
server=logged_db_params.server,
autocommit=True,
)
cur = conn.cursor()
with contextlib.suppress(Exception):
cur.execute(f"DROP TABLE {table}")
cur.execute(
f"CREATE TABLE {table} (id INT NOT NULL PRIMARY KEY, name VARCHAR(64))"
)
conn.close()
try:
yield table
finally:
conn = informix_db.connect(
host=logged_db_params.host,
port=logged_db_params.port,
user=logged_db_params.user,
password=logged_db_params.password,
database=logged_db_params.database,
server=logged_db_params.server,
autocommit=True,
)
cur = conn.cursor()
with contextlib.suppress(Exception):
cur.execute(f"DROP TABLE {table}")
conn.close()
def test_pipelined_executemany_mid_batch_constraint_violation(
logged_db_params: ConnParams, constraint_table: str
) -> None:
"""C1 (Hamilton): force a constraint violation at row 500 of 1000;
verify the pipeline drains cleanly and the connection is usable
afterward.
This is the test that validates Phase 33's wire-alignment
assumption. If Informix sends fewer than 1000 responses for 1000
pipelined PDUs after the row-500 failure, this test will hang on
the drain loop's read (eventually timing out, but the test will
fail loudly either way).
"""
conn = informix_db.connect(
host=logged_db_params.host,
port=logged_db_params.port,
user=logged_db_params.user,
password=logged_db_params.password,
database=logged_db_params.database,
server=logged_db_params.server,
autocommit=False,
read_timeout=30.0, # if the wire desyncs, fail loudly within 30s
)
try:
# Pre-seed row 500 so the executemany's row-500 INSERT will
# violate the UNIQUE constraint.
cur = conn.cursor()
cur.execute(
f"INSERT INTO {constraint_table} VALUES (?, ?)",
(500, "pre-existing"),
)
conn.commit()
# Now executemany 1000 rows; row 500 will collide
rows = [(i, f"row_{i}") for i in range(1000)]
with pytest.raises(informix_db.IntegrityError) as exc_info:
cur.executemany(
f"INSERT INTO {constraint_table} VALUES (?, ?)", rows
)
# The error message should identify which row failed in the batch
err_msg = str(exc_info.value)
assert "row 500" in err_msg or "500" in err_msg, (
f"error message should identify the failed row index: {err_msg}"
)
# Whatever the transaction state, rolling back is the correct
# response to a failed batch.
conn.rollback()
# The connection MUST be usable after the failed batch.
# If the wire is desynced, this query will block or fail
# with a ProtocolError. The test passing here proves the
# pipeline drained cleanly.
cur = conn.cursor()
cur.execute(f"SELECT COUNT(*) FROM {constraint_table}")
(count,) = cur.fetchone()
# After rollback, only the pre-seeded row 500 remains
assert count == 1, (
f"expected only the pre-seeded row to remain, got {count} "
"(transaction didn't roll back cleanly?)"
)
finally:
conn.close()
def test_pipelined_executemany_first_row_fails(
logged_db_params: ConnParams, constraint_table: str
) -> None:
"""Edge case: failure on the FIRST row of the pipeline. Tests that
the drain loop correctly handles "every response after this is an
error" without falling apart on the very first response."""
conn = informix_db.connect(
host=logged_db_params.host,
port=logged_db_params.port,
user=logged_db_params.user,
password=logged_db_params.password,
database=logged_db_params.database,
server=logged_db_params.server,
autocommit=False,
read_timeout=30.0,
)
try:
cur = conn.cursor()
cur.execute(
f"INSERT INTO {constraint_table} VALUES (?, ?)", (0, "seeded")
)
conn.commit()
rows = [(i, f"row_{i}") for i in range(100)]
with pytest.raises(informix_db.IntegrityError):
cur.executemany(
f"INSERT INTO {constraint_table} VALUES (?, ?)", rows
)
conn.rollback()
cur = conn.cursor()
cur.execute(f"SELECT COUNT(*) FROM {constraint_table}")
(count,) = cur.fetchone()
assert count == 1
finally:
conn.close()
def test_pipelined_executemany_last_row_fails(
logged_db_params: ConnParams, constraint_table: str
) -> None:
"""Edge case: failure on the LAST row of the pipeline. Tests that
we don't accidentally short-circuit the drain when we see the
"expected" rowcount before the actual error response arrives."""
conn = informix_db.connect(
host=logged_db_params.host,
port=logged_db_params.port,
user=logged_db_params.user,
password=logged_db_params.password,
database=logged_db_params.database,
server=logged_db_params.server,
autocommit=False,
read_timeout=30.0,
)
try:
cur = conn.cursor()
cur.execute(
f"INSERT INTO {constraint_table} VALUES (?, ?)",
(99, "seeded-last"),
)
conn.commit()
rows = [(i, f"row_{i}") for i in range(100)]
with pytest.raises(informix_db.IntegrityError):
cur.executemany(
f"INSERT INTO {constraint_table} VALUES (?, ?)", rows
)
conn.rollback()
cur = conn.cursor()
cur.execute(f"SELECT COUNT(*) FROM {constraint_table}")
(count,) = cur.fetchone()
assert count == 1
finally:
conn.close()