Ryan Malloy 01757415a5 Phase 32: Benchmark improvements (Tier 1 + Tier 2)
Tier 1 — make existing benchmarks reliable:
* Bumped slow-bench rounds: cold_connect_disconnect 5->15, executemany
  series 3->10. Single-round outliers no longer dominate.
* Switched bench reporting to median + IQR. Mean was being moved by
  individual GC pauses / scheduler hiccups (IfxPy executemany IQR
  was 8.2 ms on a 28 ms median - 29% spread - mean was unreliable).
* Updated ifxpy_bench.py to also report median + IQR alongside mean
  for cross-comparable numbers.
* Makefile bench targets now show median, iqr, mean, stddev, ops, rounds.

The robust statistics flipped the comparison story:

  Old (mean, 3 rounds):   us 9% faster  / IfxPy 30% faster on 2 of 5
  New (median, 10+ rds):  us faster on 4 of 5 benchmarks

| Benchmark | IfxPy | informix-db | Δ |
|---|---|---|---|
| select_one_row             | 170us | 119us | us 30% faster |
| select_systables_first_10  | 186us | 142us | us 24% faster |
| select_bench_table_all 1k  | 980us | 832us | us 15% faster |
| executemany 1k in txn      | 28.3ms | 31.3ms | us 10% slower |
| cold_connect_disconnect    | 12.0ms | 10.7ms | us 11% faster |

Tier 2 — add benchmarks for claims we make but don't verify:

tests/benchmarks/test_observability_perf.py:
* test_streaming_fetch_memory_profile — RSS sampling during a
  cursor iteration. Documents memory growth shape; regression
  wall at 100 MB / 1k rows. Currently flat (in-memory cursor
  doesn't grow detectably for 278 rows).
* test_select_1_latency_percentiles — 1000-query distribution
  with p50/p90/p95/p99/max. Result: p99/p50 = 1.42x (tight tail).
  p50=108us, p99=153us.
* test_concurrent_pool_throughput[2,4,8] — N worker threads
  through pool, measures aggregate QPS + per-thread fairness.
  Plateaus at ~6K QPS (server-bound); per-thread latency scales
  ~linearly with N (server serialization expected).

README.md (project root): updated Compared-to-IfxPy table with
the median-based numbers + IQR awareness note.
tests/benchmarks/compare/README.md: added "Statistical robustness"
section explaining why median over mean for fair comparison.

236 integration tests pass; ruff clean.
2026-05-05 12:01:11 -06:00


# `informix-db` vs IfxPy comparison benchmark
Head-to-head benchmarks against [IfxPy](https://pypi.org/project/IfxPy/), the IBM-published C-bound Informix driver, on identical workloads against the same Informix Developer Edition Docker container.
## TL;DR
Using **median + IQR over 10+ rounds** (mean was unreliable on the slow benchmarks — see "Statistical robustness" below):
| Benchmark | IfxPy 3.0.5 (C-bound) | informix-db 2026.05.05.4 (pure Python) | Result |
|---|---:|---:|---:|
| `select_one_row` (single-row latency) | 170 µs | **119 µs** | **`informix-db` 30% faster** |
| `select_systables_first_10` (~10 rows) | 186 µs | **142 µs** | **`informix-db` 24% faster** |
| `select_bench_table_all` (1000-row fetch) | 980 µs | **832 µs** | **`informix-db` 15% faster** |
| `executemany(1000)` in transaction (bulk write) | 28.3 ms (IQR 29%) | 31.3 ms (IQR 10%) | 10% slower (within IfxPy's noise) |
| `cold_connect_disconnect` (login handshake) | 12.0 ms | **10.7 ms** | **`informix-db` 11% faster** |
**`informix-db` is faster on 4 of 5 benchmarks against the C-bound driver.** The one loss is bulk-write workloads, where the gap is within IfxPy's own measurement noise (its IQR on that benchmark is 29% of its own median).
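
As a rough check on the Result column, the sketch below recomputes the percentages from the medians in the table. It assumes each percentage is taken relative to IfxPy's median; the README does not state the baseline, and for `executemany(1000)` this baseline gives about 10.6% slower, so the table's 10% may reflect a different baseline or rounding. The helper names `percent_faster` and `gap_within_noise` are illustrative, not part of either driver.

```python
# Medians copied from the table above as (IfxPy, informix-db), in microseconds.
MEDIANS_US = {
    "select_one_row": (170.0, 119.0),
    "select_systables_first_10": (186.0, 142.0),
    "select_bench_table_all": (980.0, 832.0),
    "executemany_1000": (28_300.0, 31_300.0),
    "cold_connect_disconnect": (12_000.0, 10_700.0),
}


def percent_faster(ifxpy_us: float, ours_us: float) -> float:
    """Time saved relative to IfxPy's median (assumed baseline).

    A negative value means informix-db is slower on that benchmark.
    """
    return (ifxpy_us - ours_us) / ifxpy_us * 100.0


def gap_within_noise(ifxpy_us: float, ours_us: float,
                     ifxpy_iqr_fraction: float) -> bool:
    """True when the gap between medians is smaller than IfxPy's IQR."""
    return abs(ours_us - ifxpy_us) < ifxpy_iqr_fraction * ifxpy_us


for name, (ifxpy_us, ours_us) in MEDIANS_US.items():
    print(f"{name:28s} {percent_faster(ifxpy_us, ours_us):+6.1f}%")

# executemany: a 3.0 ms gap against an IfxPy IQR of 29% of 28.3 ms.
print(gap_within_noise(28_300.0, 31_300.0, ifxpy_iqr_fraction=0.29))
```

The last line is the "within IfxPy's noise" argument in numeric form: the gap between the two medians is compared against IfxPy's own interquartile range on that benchmark.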
## Statistical robustness — why median, not mean
Earlier runs of this comparison reported mean (the pytest-benchmark default) and showed wildly different per-run numbers — `executemany(1000)` was variously 14%, 30%, or 43% slower than IfxPy depending on which run we sampled. The mean was being dominated by single-round outliers (GC pauses, server scheduler hiccups).
Switching to median + IQR with 10+ rounds gives stable run-to-run results:
- **Median resists single outliers**: one round that runs 50 ms over the typical time in a sample of 10 leaves the median essentially unchanged; it would move the mean by 5 ms.
- **IQR (Q3 − Q1) is the noise estimator**: directly comparable across drivers. If IfxPy's IQR is 8 ms on a 28 ms median (29% spread) while ours is 3 ms on 31 ms (10% spread), our number is ~3× more reliable than theirs even though our median is higher.
- **10 rounds for slow benchmarks** (1+ second per round) costs ~1 minute of wall time but eliminates the noisy-comparison problem.
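
The points above can be illustrated with a small sketch using Python's standard `statistics` module. The sample timings are made up for illustration and are not measurements from either driver; `summarize` is a hypothetical helper, not a function from the benchmark suite.

```python
import statistics


def summarize(samples_ms):
    """Return mean, median, and IQR (Q3 - Q1) for a list of round timings."""
    q1, _, q3 = statistics.quantiles(samples_ms, n=4)  # three quartile cut points
    return {
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "iqr": q3 - q1,
    }


# Ten illustrative round timings in milliseconds (not real measurements).
clean = [27.0, 27.5, 28.0, 28.0, 28.5, 28.5, 29.0, 29.0, 29.5, 30.0]

# Same rounds, but the slowest one is hit by a 50 ms pause (30 ms -> 80 ms).
with_outlier = clean[:-1] + [80.0]

before = summarize(clean)
after = summarize(with_outlier)

print(f"mean:   {before['mean']:.2f} -> {after['mean']:.2f} ms")
print(f"median: {before['median']:.2f} -> {after['median']:.2f} ms")
print(f"iqr:    {before['iqr']:.2f} -> {after['iqr']:.2f} ms")
```

With these inputs the mean shifts by 5 ms while the median and IQR stay where they were, which is the behavior the comparison relies on.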
Both `tests/benchmarks/test_*_perf.py` (host-side, pytest-benchmark) and `ifxpy_bench.py` (container-side, hand-rolled `time.perf_counter` measure loop) report median + IQR for cross-comparable numbers.
## What this means
Conventional wisdom says C beats Python at I/O drivers. Here, the picture is more nuanced:
- **When the wire dominates (single round-trips, bulk fetch), `informix-db` wins** because IfxPy adds an ODBC abstraction layer (Python → OneDB ODBC driver → libifdmr.so → wire) where we go direct (Python → wire).
- **When per-row marshaling dominates (executemany, wider tuple construction), IfxPy wins** because its C-level `execute(stmt, tuple)` is faster than our Python BIND-PDU build.
- **When the wire handshake dominates (cold connect), the two are close** (`informix-db` is 11% faster here) because both drivers spend most of that time, roughly 11 ms, waiting for the server's login response.
The takeaway is that pure-Python doesn't mean "performance compromise" — it means **different overhead distribution**. For most application workloads (web requests doing a handful of small queries), the wire round-trip is what matters, and the abstraction-layer overhead IfxPy carries means `informix-db` is typically the same speed or faster.
## Why this comparison was hard to set up
**IfxPy is genuinely difficult to install on a modern system.** Capturing the install gauntlet for the record:
| Step | Detail |
|---|---|
| 1. Pin Python 3.11 | Python 3.13 fails: IfxPy's `setup.py` uses `use_2to3`, which was removed in setuptools 58 (September 2021). |
| 2. Pin setuptools <58 | Same root cause. |
| 3. CFLAGS hack | GCC 11+ (default since 2021) escalates the C extension's pointer-type warnings to errors. Need `CFLAGS="-Wno-incompatible-pointer-types -Wno-error"` to demote them. |
| 4. Download OneDB ODBC drivers | A 92 MB tarball from `hcl-onedb.github.io/odbc/`. The `pip install` only fetches headers; the runtime libs are a separate, undocumented download. |
| 5. Set INFORMIXDIR + LD_LIBRARY_PATH | Across four directories (`lib/`, `lib/cli/`, `lib/esql/`, `gls/dll/`). |
| 6. Install `libcrypt.so.1` | The OneDB drivers link against the libcrypt-1 ABI (deprecated in 2018, replaced by libcrypt.so.2). Modern Arch / Fedora 35+ / RHEL 9 ship only libcrypt.so.2; you need a compatibility shim (Ubuntu 20.04 still has it; modern distros need `libxcrypt-compat` or similar). |
| 7. Build runtime container | We use `Dockerfile.ifxpy` here because Ubuntu 20.04 is the most recent base distro that still ships `libcrypt.so.1` natively. |
By contrast, `informix-db`'s install is `pip install informix-db`. No external downloads, no system packages, no LD_LIBRARY_PATH, no Docker required.
## Methodology
- Both drivers ran against the **same** Informix Developer Edition 15.0.1.0.3DE Docker container (`informix-db-test` from `tests/docker-compose.yml`).
- The host runs Arch Linux on x86_64; the IfxPy container runs Ubuntu 20.04 on x86_64. Both reach the server through the loopback path (host's `127.0.0.1:9088` for `informix-db`; `--network=host` for the IfxPy container).
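- Each benchmark runs a number of rounds scaled to its per-iteration cost, with at least 10 rounds for the slow benchmarks; we report the median and IQR rather than the mean (see "Statistical robustness" above). Within-run jitter doesn't affect the qualitative result, with the exception of `executemany(1000)`, where the gap falls within IfxPy's IQR.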
- Workloads are matched semantically: same SQL, same row counts, same fetch patterns. Where they differ (IfxPy's `IfxPy.fetch_tuple` vs. our `cursor.fetchall`), we use whichever idiom exhausts the cursor in each driver.
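
A hand-rolled `time.perf_counter` measure loop of the kind described for `ifxpy_bench.py` might look roughly like the sketch below. This is not the actual script: `measure`, its parameters, and the stand-in workload are assumptions for illustration, and a real run would call a driver query where the placeholder lambda is.

```python
import statistics
import time


def measure(workload, rounds=10, warmup=1):
    """Time `workload` for `rounds` rounds and report robust statistics.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if rounds < 2:
        raise ValueError("need at least 2 rounds to estimate spread")

    # Warm-up rounds are run but not recorded.
    for _ in range(warmup):
        workload()

    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)

    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "rounds": rounds,
        "median_s": statistics.median(samples),
        "iqr_s": q3 - q1,
        "mean_s": statistics.fmean(samples),
        "stddev_s": statistics.stdev(samples),
    }


# Stand-in workload; a real benchmark would execute a query here.
result = measure(lambda: sum(range(10_000)), rounds=10)
print(f"median={result['median_s'] * 1e6:.1f} us  "
      f"iqr={result['iqr_s'] * 1e6:.1f} us  rounds={result['rounds']}")
```

Reporting the same fields from both the host-side and container-side harnesses is what makes the two sets of numbers directly comparable.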
## Reproduce
From the project root:
```bash
# 1. Start the dev Informix container
make ifx-up
# 2. Seed the 1k-row test table on the host (using informix-db)
uv run python -c "
import informix_db, contextlib
conn = informix_db.connect(host='127.0.0.1', port=9088,
                           user='informix', password='in4mix',
                           database='sysmaster', server='informix', autocommit=True)
cur = conn.cursor()
with contextlib.suppress(Exception): cur.execute('DROP TABLE p21_bench')
cur.execute('CREATE TABLE p21_bench (id INT, name VARCHAR(64), counter INT, value FLOAT, created DATE)')
cur.executemany('INSERT INTO p21_bench VALUES (?, ?, ?, ?, ?)',
                [(i, f'row_{i:04d}', i*7, float(i)*1.5, None) for i in range(1000)])
conn.close()
"
# 3. Build + run the IfxPy benchmark container
docker build -f tests/benchmarks/compare/Dockerfile.ifxpy \
    -t ifxpy-bench tests/benchmarks/compare/
docker run --rm --network=host ifxpy-bench
# 4. Run informix-db benchmarks for the matched comparison
uv run pytest tests/benchmarks/test_select_perf.py \
    tests/benchmarks/test_pool_perf.py \
    tests/benchmarks/test_insert_perf.py \
    -m benchmark --benchmark-only --benchmark-warmup=on
```
## Files
- `Dockerfile.ifxpy`: Ubuntu 20.04 container with Python 3.9, IfxPy, and OneDB drivers installed
- `ifxpy_bench.py`: IfxPy benchmark workloads (mirrors `tests/benchmarks/test_*_perf.py`)
- This README
## Caveats
- IfxPy 3.0.5 is the latest PyPI version (from October 2020). It's the most actively-maintained C-bound option but hasn't shipped a release in ~5 years.
- Numbers will vary by host, distro, kernel, and network stack; re-run on your own hardware before drawing strong conclusions.
- The 1k-row INSERT benchmark uses different APIs (IfxPy's `prepare`+`execute` loop vs our `executemany`); the comparison is by total wall-clock time for the equivalent workload, not by per-call overhead.