Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9)

Extends the IfxPy comparison bench script with scaling workloads
(1k/10k/100k rows for both executemany and SELECT). Re-runs the
full comparison with consistent measurement methodology and updates
the README with the actually-correct numbers.

Earlier comparison runs reported informix-db winning all 5
benchmarks. Re-running select_bench_table_all with consistent
measurement gives 3.04 ms, not the 891 µs I cited earlier - a
3.4x discrepancy attributable to noisy warmup + small-fixture
artifacts. The "we win everything" framing was wrong.

Corrected comparison reveals two clear stories:

Bulk-insert: pure-Python wins 1.6x at scale.
  executemany(10k):  IfxPy 259ms  -> us 161ms (1.6x faster)
  executemany(100k): IfxPy 2376ms -> us 1487ms (1.6x faster)
Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's
per-call API can't pipeline.
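The round-trip arithmetic can be sketched with a toy transport that just counts flush+read cycles. This is an illustration of the pipelining idea, not the driver's actual wire code - all names here are hypothetical:

```python
# Toy model of why pipelining wins: a fake transport that counts
# network round-trips. Serial = one drain per row; pipelined = one
# drain for the whole batch. Names are illustrative only.

class FakeWire:
    def __init__(self):
        self.round_trips = 0
        self.outbox = []

    def send(self, pdu):
        # Queue a PDU; no network cost until drained.
        self.outbox.append(pdu)

    def drain(self):
        # One flush+read cycle == one round-trip.
        self.round_trips += 1
        n = len(self.outbox)
        self.outbox.clear()
        return ["ok"] * n

def serial_insert(wire, rows):
    """IfxPy-style: one synchronous round-trip per row."""
    for row in rows:
        wire.send(("EXECUTE", row))
        wire.drain()

def pipelined_insert(wire, rows):
    """Phase-33-style: send every BIND+EXECUTE PDU, then drain once."""
    for row in rows:
        wire.send(("EXECUTE", row))
    wire.drain()

rows = list(range(10_000))
w1, w2 = FakeWire(), FakeWire()
serial_insert(w1, rows)
pipelined_insert(w2, rows)
print(w1.round_trips, w2.round_trips)  # 10000 1
```

At any nonzero RTT, the serial loop's cost grows as N round-trips while the pipelined path pays roughly one, which is why the gap only shows up at scale.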

Large-fetch: IfxPy wins 2.3-2.4x at scale.
  SELECT 1k rows:   IfxPy 1.2ms  / us 2.7ms (IfxPy 2.3x)
  SELECT 10k rows:  IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x)
  SELECT 100k rows: IfxPy 112ms  / us 271ms (IfxPy 2.4x)
Reason: C-level fetch_tuple at ~1.1us/row beats Python
parse_tuple_payload at ~2.7us/row. Real C-vs-Python codec gap
showing up at scale.
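The per-row figures above predict the wall-clock gap almost exactly; a quick back-of-envelope check, using the measured per-row costs as inputs:

```python
# Back-of-envelope check of the per-row codec gap cited above.
PER_ROW_C = 1.1e-6    # IfxPy fetch_tuple, seconds/row (measured figure)
PER_ROW_PY = 2.7e-6   # parse_tuple_payload, seconds/row (measured figure)

for n in (1_000, 10_000, 100_000):
    gap_ms = (PER_ROW_PY - PER_ROW_C) * n * 1e3
    print(f"{n:>7} rows: decode gap ~ {gap_ms:.1f} ms")
# At 100k rows the 1.6 us/row difference alone accounts for ~160 ms,
# consistent with the 112 ms vs 271 ms wall-clock numbers.
```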

For everyday workloads (single SELECT in a request, INSERT a
handful of rows), drivers are within 5-25%. For workloads where
the gap widens, direction depends on what you're doing - bulk-
write favors us, bulk-read favors IfxPy.

README's "Compared to IfxPy" section rewritten with the corrected
numbers and an honest "when to prefer which" subsection.
tests/benchmarks/compare/README.md mirror updated.

Net narrative: a "faster at bulk-write, slower at bulk-read,
comparable elsewhere" comparison story is more honest and more
durable than a "we win everything" claim that would have collapsed
the first time a user ran their own benchmark.

Side note (lint): one ambiguous unicode `×` in cursors.py replaced
with `x`.

Phase 37 ticket: parse_tuple_payload is the bottleneck at scale.
Closing the 1.6 µs/row gap to IfxPy would make us competitive on
bulk-fetch too. Possible approaches: Cython codec, deeper inlining,
per-column dispatch pre-bake.
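To make the "per-column dispatch pre-bake" idea concrete, here is a minimal sketch: resolve each column's decoder callable once at describe time, so the per-row hot loop does no type dispatch at all. None of these names are the driver's real API - it's a hypothetical illustration of the technique:

```python
import struct

# Hypothetical "per-column dispatch pre-bake": one dict lookup per
# column, once per statement, instead of once per value per row.

def decode_int4(buf, off):
    return struct.unpack_from(">i", buf, off)[0], off + 4

def decode_float8(buf, off):
    return struct.unpack_from(">d", buf, off)[0], off + 8

DECODERS = {"INT": decode_int4, "FLOAT": decode_float8}

def prebake(col_types):
    # Resolve decoders up front; the row loop never touches the dict.
    return [DECODERS[t] for t in col_types]

def parse_row(buf, baked):
    off, out = 0, []
    for decode in baked:  # tight loop over pre-resolved callables
        val, off = decode(buf, off)
        out.append(val)
    return tuple(out)

baked = prebake(["INT", "FLOAT"])
row = struct.pack(">id", 42, 3.5)
print(parse_row(row, baked))  # (42, 3.5)
```

Whether this alone closes the gap is an open question - it removes dispatch overhead but still pays Python's per-call cost, which is where a Cython codec would go further.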
This commit is contained in:
Ryan Malloy 2026-05-05 12:44:52 -06:00
parent 8eb19f7534
commit 270155d2de
7 changed files with 199 additions and 20 deletions


@@ -2,6 +2,49 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.05.9 — IfxPy scaling comparison + honest comparison numbers (Phase 36)
Adds the IfxPy side of Phase 34's scaling benchmarks (1k / 10k / 100k rows for both `executemany` and `SELECT`) and updates the README's comparison table with the **actually-correct numbers**.
### What changed
**1. `tests/benchmarks/compare/ifxpy_bench.py` extended** with `bench_executemany_scaling(n)` and `bench_select_scaling(n)` — same shapes as `test_scaling_perf.py` so the comparison is apples-to-apples.
**2. README's comparison numbers corrected.** Earlier comparison runs reported `select_bench_table_all` at 891 µs for `informix-db`. Re-running with consistent measurement (warmup + median + 10+ rounds) reports 3.04 ms — a 3.4× discrepancy. The earlier number was probably picked up from a noisy first-run with a different warmup state, or from a benchmark that wasn't fully populating its fixture. **Either way, the "we win all 5 benchmarks" claim was based on inconsistent measurement.**
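The "warmup + median + 10+ rounds" discipline behind the corrected numbers can be sketched as follows. The real `measure()` helper lives in the bench scripts; this is just the shape, with a toy workload standing in for a query:

```python
import statistics
import time

# Minimal sketch of warmup + median + IQR measurement. Warmup rounds
# absorb cold-cache / first-run noise (the likely source of the bogus
# 891 µs figure); median + IQR keep one outlier round from dominating.

def measure(name, rounds, fn, warmup=2):
    for _ in range(warmup):
        fn()  # discarded warmup runs
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {"name": name,
            "median_s": statistics.median(samples),
            "iqr_s": q3 - q1}

result = measure("toy_workload", 11, lambda: sum(range(1000)))
print(result["name"], result["median_s"] > 0)
```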
**The corrected comparison reveals two clear stories:**
| Benchmark | IfxPy | informix-db | Result |
|---|---:|---:|---|
| `executemany(1k)` in txn | 23.5 ms | 23.2 ms | tied |
| `executemany(10k)` in txn | 259 ms | **161 ms** | **us 1.6× faster** |
| `executemany(100k)` in txn | 2376 ms | **1487 ms** | **us 1.6× faster** |
| `SELECT 1k rows` | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT 10k rows` | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT 100k rows` | 112 ms | 271 ms | IfxPy 2.4× faster |
**Bulk-insert: pure-Python wins 1.6× at scale** because pipelining (Phase 33) eliminates per-row RTT. IfxPy's `IfxPy.execute(stmt, tuple)` per-call API can't pipeline.
**Large-fetch: IfxPy wins 2.3-2.4× at scale.** Their C-level `fetch_tuple` decoder runs at ~1.1 µs/row; our `parse_tuple_payload` runs at ~2.7 µs/row. **This is the real C-vs-Python codec cost showing up at scale where it matters.**
### Why correcting this matters
A "we win everything" claim that's based on noisy measurements would have collapsed the first time a user ran their own benchmark and got different numbers. Naming the trade-off honestly — "we're faster at bulk write, slower at bulk read, comparable elsewhere" — is the right framing.
### When to prefer `informix-db`
- ETL pipelines, log shipping, bulk writes (1.6× faster at scale)
- Containerized / minimal-dependency environments (50 KB wheel vs IfxPy's 92 MB OneDB tarball + libcrypt.so.1 dependency hell)
- Modern Python (works on 3.10–3.14; IfxPy is broken on Python 3.12+)
- Async / FastAPI workloads (we have native async; IfxPy doesn't)
### When IfxPy may be faster
- Analytical reporting queries pulling 10k+ rows in a single SELECT
- Workloads where the per-row decode cost dominates (wide rows, tight read loops)
The actionable takeaway for `informix-db`'s future: the parse_tuple_payload hot path is now the bottleneck at scale. Phase 25's branch reorder shaved 22%; further work (Cython codec? deeper inlining? per-column dispatch pre-bake?) could close the C-vs-Python gap. Tracked as a possible Phase 37+.
## 2026.05.05.8 — Scaling benchmarks (Phase 34)
Adds `tests/benchmarks/test_scaling_perf.py` — parametrized benchmarks that exercise the driver at row counts and column widths well beyond what the existing 1k-row benchmarks cover. The first thing this suite did was catch the NFETCH-loop data-loss bug fixed in Phase 35.


@@ -176,17 +176,33 @@ Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/) on iden
| Benchmark | IfxPy 3.0.5 (C-bound) | `informix-db` (pure Python) | Result |
|---|---:|---:|---:|
| Single-row SELECT round-trip | 118 µs | 114 µs | comparable |
| ~10-row server-side query | 130 µs | 159 µs | IfxPy 22% faster |
| Cold connect (login handshake) | 11.0 ms | 10.5 ms | comparable |
| **`executemany(1k)` in transaction** | 23.5 ms | 23.2 ms | tied |
| **`executemany(10k)` in transaction** | 259 ms | **161 ms** | **`informix-db` 1.6× faster** |
| **`executemany(100k)` in transaction** | 2376 ms | **1487 ms** | **`informix-db` 1.6× faster** |
| `SELECT` 1k rows | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT` 10k rows | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT` 100k rows | 112 ms | 271 ms | IfxPy 2.4× faster |
**The honest summary:**
- **Bulk-insert workloads: `informix-db` wins 1.6× at scale.** The pipelined `executemany` (Phase 33) sends all N BIND+EXECUTE PDUs before draining responses, eliminating per-row RTT. IfxPy still pays one round-trip per `IfxPy.execute(stmt, tuple)` call.
- **Large-fetch workloads: IfxPy wins 2.3× at scale.** Their C-level `fetch_tuple` decoder is genuinely faster than our Python `parse_tuple_payload` (~1.1 µs/row vs ~2.7 µs/row). At 100k rows, that 1.6 µs/row gap accumulates into a 160 ms wall-clock difference.
- **Small queries: comparable.** Both spend ~120 µs waiting for the server; the per-call codec cost is small relative to the round-trip.
**When to prefer `informix-db`:**
- ETL pipelines, log shipping, bulk writes (1.6× faster at scale)
- Containerized / minimal-dependency environments (50 KB wheel vs IfxPy's 92 MB OneDB tarball + libcrypt.so.1 dependency hell)
- Modern Python (works on 3.10–3.14; IfxPy is broken on Python 3.12+)
- Async / FastAPI workloads (we have native async; IfxPy doesn't)
**When IfxPy may be faster:**
- Analytical reporting queries pulling 10k+ rows in a single SELECT
- Workloads where the per-row decode cost dominates (wide rows, tight read loops)
These results are reproducible from `tests/benchmarks/compare/` — the Dockerfile, bench script, and README walk through every step.
Full methodology, IQR caveats, install gauntlet, and reproduction in [`tests/benchmarks/compare/README.md`](tests/benchmarks/compare/README.md).


@@ -1,6 +1,6 @@
[project]
name = "informix-db"
version = "2026.05.05.9"
description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
readme = "README.md"
license = { text = "MIT" }


@@ -401,7 +401,7 @@ class Cursor:
# Phase 35: NFETCH loop — keep fetching until a response yields
# zero new tuples. The previous "two NFETCHes" pattern silently
# truncated any result set whose tuples didn't fit in 1-2 server
# batches (~200 rows at default 4096-byte buffer x 5-col rows).
# This bug was latent for ~30 phases because no test used a
# large enough result set to trigger it.
self._conn._send_pdu(self._build_curname_nfetch_pdu(cursor_name))


@@ -4,19 +4,29 @@ Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/), the IB
## TL;DR
Using **median + IQR over 10+ rounds** (mean was unreliable on the slow benchmarks — see "Statistical robustness" below). Phase 36 added scaling benchmarks at 1k / 10k / 100k rows so the comparison shape is clearer:
| Benchmark | IfxPy 3.0.5 | informix-db | Result |
|---|---:|---:|---:|
| `select_one_row` | 118 µs | 114 µs | comparable |
| `select_systables_first_10` | 130 µs | 159 µs | IfxPy 22% faster |
| `cold_connect_disconnect` | 11.0 ms | 10.5 ms | comparable |
| **`executemany(1k)` in txn** | 23.5 ms | 23.2 ms | tied |
| **`executemany(10k)` in txn** | 259 ms | **161 ms** | **`informix-db` 1.6× faster** |
| **`executemany(100k)` in txn** | 2376 ms | **1487 ms** | **`informix-db` 1.6× faster** |
| `SELECT 1k rows` | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT 10k rows` | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT 100k rows` | 112 ms | 271 ms | IfxPy 2.4× faster |
**Two clear stories:**
**1. Bulk insert: `informix-db` wins 1.6× at scale.** The pipelined `executemany` (Phase 33) sends all N BIND+EXECUTE PDUs to the wire before draining responses, eliminating per-row RTT. IfxPy still pays one synchronous round-trip per `IfxPy.execute(stmt, tuple)` call — that's ~24 µs/row regardless of N. We pay ~15 µs/row at scale (the prepare/release overhead amortizes better at larger N).
**2. Large fetch: IfxPy wins 2.3-2.4× at scale.** Their C-level `fetch_tuple` decoder runs at ~1.1 µs/row; our pure-Python `parse_tuple_payload` runs at ~2.7 µs/row. At 100k rows, the 1.6 µs/row gap accumulates into a 160 ms wall-clock difference. **This is the C-vs-Python codec cost showing up at scale, where it actually matters.**
For everyday-application workloads (single SELECT in a request, INSERT a handful of rows, transactional UPDATE), the two drivers are within 5-25% of each other. For the workloads where the gap widens, the direction depends on what you're doing — bulk-write favors us, bulk-read favors IfxPy.
**The wire-alignment assumption** that makes pipelined `executemany` safe — that Informix sends exactly N responses for N pipelined PDUs even when one row fails — is verified by `tests/test_executemany_pipeline.py` (constraint violation at row 0/100, 99/100, 500/1000).
## Statistical robustness — why median, not mean


@@ -171,6 +171,107 @@ def bench_cold_connect_disconnect() -> dict:
    return measure("cold_connect_disconnect", ROUNDS_SLOW, run)

# ----------------------------------------------------------------------------
# Phase 36 — scaling benchmarks (matched to test_scaling_perf.py)
# ----------------------------------------------------------------------------

def bench_executemany_scaling(n_rows: int) -> dict:
    """N-row insert in a single transaction. IfxPy doesn't pipeline —
    each ``IfxPy.execute(stmt, params)`` is a synchronous round-trip
    to the server. So per-row cost is roughly constant in N."""
    rounds_for = {1_000: 10, 10_000: 5, 100_000: 3}
    name = f"executemany_scaling_{n_rows}"
    try:
        conn = IfxPy.connect(
            CONN_STR.replace("DATABASE=sysmaster", "DATABASE=testdb"), "", ""
        )
    except Exception as e:
        return {"name": name, "skipped": f"testdb: {e}"}
    IfxPy.autocommit(conn, IfxPy.SQL_AUTOCOMMIT_OFF)
    table = f"p36_em_{n_rows}"
    try:
        try:
            IfxPy.exec_immediate(conn, f"DROP TABLE {table}")
            IfxPy.commit(conn)
        except Exception:
            pass
        IfxPy.exec_immediate(
            conn, f"CREATE TABLE {table} (id INT, name VARCHAR(64), value FLOAT)"
        )
        IfxPy.commit(conn)
        counter = [0]

        def run() -> None:
            counter[0] += 1
            base = counter[0] * n_rows
            stmt = IfxPy.prepare(
                conn, f"INSERT INTO {table} VALUES (?, ?, ?)"
            )
            for i in range(n_rows):
                IfxPy.execute(stmt, (base + i, f"row_{base + i}", float(base + i)))
            IfxPy.free_stmt(stmt)
            IfxPy.commit(conn)

        return measure(name, rounds_for[n_rows], run)
    finally:
        try:
            IfxPy.exec_immediate(conn, f"DROP TABLE {table}")
            IfxPy.commit(conn)
        except Exception:
            pass
        IfxPy.close(conn)

def bench_select_scaling(n_rows: int) -> dict:
    """SELECT FIRST N from the pre-populated 100k-row p34_select table.
    Tests IfxPy's per-row fetch cost at scale; should be roughly linear
    in N like ours."""
    rounds_for = {1_000: 10, 10_000: 5, 100_000: 3}
    name = f"select_scaling_{n_rows}"
    try:
        conn = IfxPy.connect(
            CONN_STR.replace("DATABASE=sysmaster", "DATABASE=testdb"), "", ""
        )
    except Exception as e:
        return {"name": name, "skipped": f"testdb: {e}"}
    try:
        # Probe: does p34_select exist?
        try:
            stmt = IfxPy.exec_immediate(conn, "SELECT COUNT(*) FROM p34_select")
            row = IfxPy.fetch_tuple(stmt)
            IfxPy.free_stmt(stmt)
            available = int(row[0])
            if available < n_rows:
                return {"name": name, "skipped": (
                    f"p34_select has only {available} rows; "
                    "run informix-db scaling benchmarks first to seed "
                    "the table"
                )}
        except Exception as e:
            return {"name": name, "skipped": f"p34_select missing: {e}"}

        def run() -> None:
            stmt = IfxPy.exec_immediate(
                conn, f"SELECT FIRST {n_rows} * FROM p34_select"
            )
            count = 0
            while IfxPy.fetch_tuple(stmt):
                count += 1
            IfxPy.free_stmt(stmt)
            if count != n_rows:
                raise RuntimeError(
                    f"expected {n_rows} rows, got {count}"
                )

        return measure(name, rounds_for[n_rows], run)
    finally:
        IfxPy.close(conn)
def main() -> None:
    print("# IfxPy benchmark results", file=sys.stderr)
    print(f"# IfxPy version: {IfxPy.__version__ if hasattr(IfxPy, '__version__') else 'unknown'}", file=sys.stderr)
@@ -187,6 +288,15 @@ def main() -> None:
    results.append(bench_executemany_1000_rows_in_txn())
    results.append(bench_cold_connect_disconnect())

    # Phase 36 — scaling comparison. Skip 100k cases when --short is
    # passed (e.g., for fast smoke runs); otherwise run all sizes.
    short = "--short" in sys.argv
    sizes = [1_000, 10_000] if short else [1_000, 10_000, 100_000]
    for n in sizes:
        results.append(bench_executemany_scaling(n))
    for n in sizes:
        results.append(bench_select_scaling(n))

    # Emit machine-parseable lines on stdout. Reporting median (not
    # mean) and IQR (not stddev) so a single outlier round can't
    # dominate the comparison numbers — mirrors pytest-benchmark's

uv.lock (generated)

@@ -34,7 +34,7 @@ wheels = [
[[package]]
name = "informix-db"
version = "2026.5.5.8"
source = { editable = "." }

[package.optional-dependencies]