Ryan Malloy 270155d2de Phase 36: IfxPy scaling comparison + honest comparison numbers (2026.05.05.9)
Extends the IfxPy comparison bench script with scaling workloads
(1k/10k/100k rows for both executemany and SELECT). Re-runs the
full comparison with consistent measurement methodology and updates
the README with the actually-correct numbers.

Earlier comparison runs reported informix-db winning all 5
benchmarks. Re-running select_bench_table_all with consistent
measurement gives 3.04 ms, not the 891 us I cited earlier - a
3.4x discrepancy attributable to noisy warmup + small-fixture
artifacts. The "we win everything" framing was wrong.

Corrected comparison reveals two clear stories:

Bulk-insert: pure-Python wins 1.6x at scale.
  executemany(10k):  IfxPy 259ms  -> us 161ms (1.6x faster)
  executemany(100k): IfxPy 2376ms -> us 1487ms (1.6x faster)
Reason: Phase 33's pipelining eliminates per-row RTT. IfxPy's
per-call API can't pipeline.

Large-fetch: IfxPy wins 2.3-2.4x at scale.
  SELECT 1k rows:   IfxPy 1.2ms  / us 2.7ms (IfxPy 2.3x)
  SELECT 10k rows:  IfxPy 11.3ms / us 25.8ms (IfxPy 2.3x)
  SELECT 100k rows: IfxPy 112ms  / us 271ms (IfxPy 2.4x)
Reason: C-level fetch_tuple at ~1.1us/row beats Python
parse_tuple_payload at ~2.7us/row. Real C-vs-Python codec gap
showing up at scale.

For everyday workloads (single SELECT in a request, INSERT a
handful of rows), drivers are within 5-25%. For workloads where
the gap widens, direction depends on what you're doing - bulk-
write favors us, bulk-read favors IfxPy.

README's "Compared to IfxPy" section rewritten with the corrected
numbers and an honest "when to prefer which" subsection.
tests/benchmarks/compare/README.md mirror updated.

Net narrative: a "faster at bulk-write, slower at bulk-read,
comparable elsewhere" comparison story is more honest and more
durable than a "we win everything" claim that would have collapsed
the first time a user ran their own benchmark.

Side note (lint): one ambiguous unicode `×` in cursors.py replaced
with `x`.

Phase 37 ticket: parse_tuple_payload is the bottleneck at scale.
Closing the 1.6 us/row gap to IfxPy would make us competitive on
bulk-fetch too. Possible approaches: Cython codec, deeper inlining,
per-column dispatch pre-bake.
2026-05-05 12:44:52 -06:00

121 lines
8.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# `informix-db` vs IfxPy comparison benchmark
Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/), the IBM-published C-bound Informix driver, on identical workloads against the same Informix Developer Edition Docker container.
## TL;DR
Using **median + IQR over 10+ rounds** (mean was unreliable on the slow benchmarks — see "Statistical robustness" below). Phase 36 added scaling benchmarks at 1k / 10k / 100k rows so the comparison shape is clearer:
| Benchmark | IfxPy 3.0.5 | informix-db | Result |
|---|---:|---:|---:|
| `select_one_row` | 118 µs | 114 µs | comparable |
| `select_systables_first_10` | 130 µs | 159 µs | IfxPy 22% faster |
| `cold_connect_disconnect` | 11.0 ms | 10.5 ms | comparable |
| **`executemany(1k)` in txn** | 23.5 ms | 23.2 ms | tied |
| **`executemany(10k)` in txn** | 259 ms | **161 ms** | **`informix-db` 1.6× faster** |
| **`executemany(100k)` in txn** | 2376 ms | **1487 ms** | **`informix-db` 1.6× faster** |
| `SELECT 1k rows` | 1.2 ms | 2.7 ms | IfxPy 2.3× faster |
| `SELECT 10k rows` | 11.3 ms | 25.8 ms | IfxPy 2.3× faster |
| `SELECT 100k rows` | 112 ms | 271 ms | IfxPy 2.4× faster |
**Two clear stories:**
**1. Bulk insert: `informix-db` wins 1.6× at scale.** The pipelined `executemany` (Phase 33) sends all N BIND+EXECUTE PDUs to the wire before draining responses, eliminating per-row RTT. IfxPy still pays one synchronous round-trip per `IfxPy.execute(stmt, tuple)` call — that's ~24 µs/row regardless of N. We pay ~15 µs/row at scale (the prepare/release overhead amortizes better at larger N).
**2. Large fetch: IfxPy wins 2.3-2.4× at scale.** Their C-level `fetch_tuple` decoder runs at ~1.1 µs/row; our pure-Python `parse_tuple_payload` runs at ~2.7 µs/row. At 100k rows, the 1.6 µs/row gap accumulates into a 160 ms wall-clock difference. **This is the C-vs-Python codec cost showing up at scale, where it actually matters.**
For everyday-application workloads (single SELECT in a request, INSERT a handful of rows, transactional UPDATE), the two drivers are within 5-25% of each other. For the workloads where the gap widens, the direction depends on what you're doing — bulk-write favors us, bulk-read favors IfxPy.
**The wire-alignment assumption** that makes pipelined `executemany` safe — that Informix sends exactly N responses for N pipelined PDUs even when one row fails — is verified by `tests/test_executemany_pipeline.py` (constraint violation at row 0/100, 99/100, 500/1000).
## Statistical robustness — why median, not mean
Earlier runs of this comparison reported mean (the pytest-benchmark default) and showed wildly different per-run numbers — `executemany(1000)` was variously 14%, 30%, or 43% slower than IfxPy depending on which run we sampled. The mean was being dominated by single-round outliers (GC pauses, server scheduler hiccups).
Switching to median + IQR with 10+ rounds gives stable run-to-run results:
- **Median resists single outliers**: one 50 ms round in a sample of 10 doesn't move the median; it would move the mean by 5 ms.
- **IQR (Q3 Q1) is the noise estimator**: directly comparable across drivers. If IfxPy's IQR is 8 ms on a 28 ms median (29% spread) while ours is 3 ms on 31 ms (10% spread), our number is ~3× more reliable than theirs even though our median is higher.
- **10 rounds for slow benchmarks** (1+ second per round) costs ~1 minute of wall time but eliminates the noisy-comparison problem.
Both `tests/benchmarks/test_*_perf.py` (host-side, pytest-benchmark) and `ifxpy_bench.py` (container-side, hand-rolled `time.perf_counter` measure loop) report median + IQR for cross-comparable numbers.
## What this means
Conventional wisdom says C beats Python at I/O drivers. Here, the picture is more nuanced:
- **When the wire dominates (single round-trips, bulk fetch), `informix-db` wins** because IfxPy adds an ODBC abstraction layer (Python → OneDB ODBC driver → libifdmr.so → wire) where we go direct (Python → wire).
- **When per-row marshaling dominates (executemany, wider tuple construction), IfxPy wins** because its C-level `execute(stmt, tuple)` is faster than our Python BIND-PDU build.
- **When the wire handshake dominates (cold connect), they tie** because both drivers wait ~11 ms for the server's login response.
The takeaway is that pure-Python doesn't mean "performance compromise" — it means **different overhead distribution**. For most application workloads (web requests doing a handful of small queries), the wire round-trip is what matters, and the abstraction-layer overhead IfxPy carries means `informix-db` is typically the same speed or faster.
## Why this comparison was hard to set up
**IfxPy is genuinely difficult to install on a modern system.** Capturing the install gauntlet for the record:
| Step | Detail |
|---|---|
| 1. Pin Python 3.11 | Python 3.13 fails: IfxPy's `setup.py` uses `use_2to3`, removed from setuptools 58 (October 2021). |
| 2. Pin setuptools <58 | Same root cause. |
| 3. CFLAGS hack | GCC 11+ (default since 2021) escalates the C extension's pointer-type warnings to errors. Need `CFLAGS="-Wno-incompatible-pointer-types -Wno-error"` to demote them. |
| 4. Download OneDB ODBC drivers | A 92 MB tarball from `hcl-onedb.github.io/odbc/`. The `pip install` only fetches headers the runtime libs are a separate, undocumented download. |
| 5. Set INFORMIXDIR + LD_LIBRARY_PATH | Across four directories (`lib/`, `lib/cli/`, `lib/esql/`, `gls/dll/`). |
| 6. Install `libcrypt.so.1` | The OneDB drivers link against the libcrypt-1 ABI (deprecated in 2018, replaced by libcrypt.so.2). Modern Arch / Fedora 35+ / RHEL 9 ship only libcrypt.so.2; you need a compatibility shim (Ubuntu 20.04 still has it; modern distros need `libxcrypt-compat` or similar). |
| 7. Build runtime container | We use `Dockerfile.ifxpy` here because Ubuntu 20.04 is the most recent base distro that still ships `libcrypt.so.1` natively. |
By contrast, `informix-db`'s install is `pip install informix-db`. No external downloads, no system packages, no LD_LIBRARY_PATH, no Docker required.
## Methodology
- Both drivers ran against the **same** Informix Developer Edition 15.0.1.0.3DE Docker container (`informix-db-test` from `tests/docker-compose.yml`).
- The host runs Arch Linux on x86_64; the IfxPy container runs Ubuntu 20.04 on x86_64. Both reach the server through the loopback path (host's `127.0.0.1:9088` for `informix-db`; `--network=host` for the IfxPy container).
- Each benchmark runs 100/20/3 rounds depending on per-iteration cost; we report the mean. Stddev is small (under 5%) for all reported numbers within-run jitter doesn't affect the qualitative result.
- Workloads are matched semantically: same SQL, same row counts, same fetch patterns. Where they differ (IfxPy's `IfxPy.fetch_tuple` vs. our `cursor.fetchall`), we use whichever idiom exhausts the cursor in each driver.
## Reproduce
From the project root:
```bash
# 1. Start the dev Informix container
make ifx-up
# 2. Seed the 1k-row test table on the host (using informix-db)
uv run python -c "
import informix_db, contextlib
conn = informix_db.connect(host='127.0.0.1', port=9088,
user='informix', password='in4mix',
database='sysmaster', server='informix', autocommit=True)
cur = conn.cursor()
with contextlib.suppress(Exception): cur.execute('DROP TABLE p21_bench')
cur.execute('CREATE TABLE p21_bench (id INT, name VARCHAR(64), counter INT, value FLOAT, created DATE)')
cur.executemany('INSERT INTO p21_bench VALUES (?, ?, ?, ?, ?)',
[(i, f'row_{i:04d}', i*7, float(i)*1.5, None) for i in range(1000)])
conn.close()
"
# 3. Build + run the IfxPy benchmark container
docker build -f tests/benchmarks/compare/Dockerfile.ifxpy \
-t ifxpy-bench tests/benchmarks/compare/
docker run --rm --network=host ifxpy-bench
# 4. Run informix-db benchmarks for the matched comparison
uv run pytest tests/benchmarks/test_select_perf.py \
tests/benchmarks/test_pool_perf.py \
tests/benchmarks/test_insert_perf.py \
-m benchmark --benchmark-only --benchmark-warmup=on
```
## Files
- `Dockerfile.ifxpy` Ubuntu 20.04 container with Python 3.9, IfxPy, and OneDB drivers installed
- `ifxpy_bench.py` IfxPy benchmark workloads (mirrors `tests/benchmarks/test_*_perf.py`)
- This README
## Caveats
- IfxPy 3.0.5 is the latest PyPI version (from October 2020). It's the most actively-maintained C-bound option but hasn't shipped a release in ~5 years.
- Numbers will vary by host, distro, kernel, network stack re-run on your own hardware before drawing strong conclusions.
- The 1k-row INSERT benchmark uses different APIs (IfxPy's `prepare`+`execute` loop vs our `executemany`); the comparison is by total wall-clock time for the equivalent workload, not by per-call overhead.