Tier 1 — make existing benchmarks reliable:
* Bumped slow-bench rounds: cold_connect_disconnect 5->15, executemany
series 3->10. Single-round outliers no longer dominate.
* Switched bench reporting to median + IQR. The mean was being skewed
by individual GC pauses / scheduler hiccups (IfxPy executemany IQR
was 8.2 ms on a 28 ms median, a 29% spread, so the mean was unreliable).
* Updated ifxpy_bench.py to also report median + IQR alongside mean
for cross-comparable numbers.
* Makefile bench targets now show median, iqr, mean, stddev, ops, rounds.
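Sketch of why median + IQR is the more stable summary here. The sample
values are made up for illustration (one slow round among ten), and
`median_iqr` is a hypothetical helper, not the code in the bench harness:

```python
import statistics


def median_iqr(samples):
    """Return (median, interquartile range) for a list of timings."""
    # quantiles(n=4) yields the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return statistics.median(samples), q3 - q1


# Illustrative round timings in ms: nine normal rounds, one GC-style outlier.
rounds_ms = [28.0, 27.5, 28.4, 29.1, 27.9, 28.2, 28.6, 27.8, 28.3, 95.0]

mean = statistics.mean(rounds_ms)
median, iqr = median_iqr(rounds_ms)

# The single outlier drags the mean well above every normal round,
# while the median and IQR stay close to the typical timings.
print(f"mean={mean:.2f} ms  median={median:.2f} ms  iqr={iqr:.2f} ms")
```

With few rounds a single outlier carries a large share of the mean, which is
the reason for raising the round counts as well.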
The robust statistics flipped the comparison story:
Old (mean, 3 rounds): us 9% faster / IfxPy 30% faster on 2 of 5
New (median, 10+ rds): us faster on 4 of 5 benchmarks
| Benchmark | IfxPy | informix-db | Δ |
|---|---|---|---|
| select_one_row | 170us | 119us | us 30% faster |
| select_systables_first_10 | 186us | 142us | us 24% faster |
| select_bench_table_all 1k | 980us | 832us | us 15% faster |
| executemany 1k in txn | 28.3ms | 31.3ms | us 10% slower |
| cold_connect_disconnect | 12.0ms | 10.7ms | us 11% faster |
Tier 2 — add benchmarks for claims we make but don't verify:
tests/benchmarks/test_observability_perf.py:
* test_streaming_fetch_memory_profile — RSS sampling during a
cursor iteration. Documents memory growth shape; regression
wall at 100 MB / 1k rows. Currently flat (in-memory cursor
doesn't grow detectably for 278 rows).
* test_select_1_latency_percentiles — 1000-query distribution
with p50/p90/p95/p99/max. Result: p99/p50 = 1.42x (tight tail).
p50=108us, p99=153us.
* test_concurrent_pool_throughput[2,4,8] — N worker threads
through pool, measures aggregate QPS + per-thread fairness.
Plateaus at ~6K QPS (server-bound); per-thread latency scales
~linearly with N (server serialization expected).
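Sketch of how a p50/p90/p95/p99/max summary like the one above can be
computed from raw samples. `time_calls` and `latency_percentiles` are
hypothetical names, and the no-op callable stands in for the real query;
this is not the code in test_observability_perf.py:

```python
import statistics
import time


def time_calls(fn, n):
    """Call fn n times and return per-call latency in microseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return samples


def latency_percentiles(samples_us):
    """Summarize a latency distribution by percentile."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_us, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_us),
    }


# Stand-in workload; a real run would execute a query on a cursor instead.
summary = latency_percentiles(time_calls(lambda: None, 1000))
tail_ratio = summary["p99"] / summary["p50"] if summary["p50"] else float("inf")
print(summary, f"p99/p50={tail_ratio:.2f}x")
```

The p99/p50 ratio is the "tail tightness" figure quoted above: 153us / 108us
is about 1.42x.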
README.md (project root): updated Compared-to-IfxPy table with
the median-based numbers + IQR awareness note.
tests/benchmarks/compare/README.md: added "Statistical robustness"
section explaining why median over mean for fair comparison.
236 integration tests pass; ruff clean.
Polish item #1: byte-for-byte regression test that asserts our
generated login PDU is structurally identical to JDBC's reference
captured in docs/CAPTURES/01-connect-only.socat.log.
The test (tests/test_pdu_match.py) immediately caught a real bug:
the capability section was misread during Phase 0 byte-decoding.
Earlier text claimed Cap_1=1, Cap_2=0x3c000000, Cap_3=0 — actually:
Cap_1 = 0x0000013c (= (capability_class << 8) | protocol_version
where protocol_version = 0x3c = PF_PROT_SQLI_0600)
Cap_2 = 0
Cap_3 = 0
The misalignment was: the 0x3c byte I attributed to Cap_2's high
byte was actually Cap_1's low byte. The dev-image server is
permissive enough to accept arbitrary capability values, so the
connection succeeded even with the wrong bytes — but the PDU wasn't
structurally identical to JDBC's reference. SERVER-ACCEPTS ≠
STRUCTURALLY-CORRECT. This is exactly why the byte-for-byte diff
was the right polish item; "it connects" was a false ceiling.
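Sketch of the misalignment, assuming the capability ints are 4-byte
big-endian values and that the byte preceding Cap_1 is zero (both are
assumptions made for this illustration; the constant names are hypothetical):

```python
import struct

PROTOCOL_VERSION = 0x3C  # PF_PROT_SQLI_0600
CAPABILITY_CLASS = 1     # implied by Cap_1 = 0x0000013c

cap_1 = (CAPABILITY_CLASS << 8) | PROTOCOL_VERSION
correct = struct.pack(">III", cap_1, 0, 0)

# Assumed: one zero byte sits immediately before the capability section.
stream = b"\x00" + correct

# Decoding one byte too early reproduces the Phase 0 misreading:
# the 0x3c byte lands in the high byte of the second int.
misread = struct.unpack_from(">III", stream, 0)

# Decoding at the right offset recovers the real values.
aligned = struct.unpack_from(">III", stream, 1)

print([hex(v) for v in misread])
print([hex(v) for v in aligned])
```

A one-byte shift is enough to turn (0x13c, 0, 0) into (1, 0x3c000000, 0),
which matches the values the earlier text claimed.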
After fix:
- 6 PDU-match tests assert byte-for-byte equality at offsets 2..280
(the structural prefix: SLheader sans length, all login markers,
capability ints, username, password, protocol IDs, env vars).
- Bytes 280+ legitimately differ per process (PID, TID, hostname,
cwd, AppName) — those are NOT asserted.
- Length field (offsets 0..1) also legitimately differs because our
PDU has a shorter env list and AppName.
- Test uses monkey-patched IfxSocket so no network is needed.
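Sketch of the comparison idea: diff only the structural prefix and report
the first differing offset. `first_mismatch` is a hypothetical helper, not
the code in tests/test_pdu_match.py, and treating offset 280 as exclusive
is an assumption (the notes above say bytes 280+ differ):

```python
STRUCT_START = 2    # skip the 2-byte length field at offsets 0..1
STRUCT_END = 280    # assumed exclusive: bytes 280+ vary per process


def first_mismatch(ours, reference, start=STRUCT_START, end=STRUCT_END):
    """Return the first offset in [start, end) where the PDUs differ, or None."""
    a = ours[start:end]
    b = reference[start:end]
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return start + i
    if len(a) != len(b):
        # One PDU ended inside the structural prefix.
        return start + min(len(a), len(b))
    return None


# Synthetic PDUs for illustration only (not real capture data).
reference = bytes(300)
ours = bytearray(reference)
ours[0:2] = b"\x01\x02"   # length field differs: ignored
ours[290] = 0xFF          # per-process tail differs: ignored
print(first_mismatch(bytes(ours), reference))

ours[100] = 0x01          # structural byte differs: reported
print(first_mismatch(bytes(ours), reference))
```

Reporting the offset rather than a bare pass/fail makes it easier to map a
failure back to a field in the capture.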
Polish item #2: Makefile per global CLAUDE.md convention. Targets:
install, lint, format, test, test-integration, test-all, test-pdu,
ifx-up/down/logs/shell/status, capture (re-run JDBC scenarios under
socat), clean. `make` (no target) prints help.
Doc updates:
- PROTOCOL_NOTES.md §12: corrected capability section with the
actual values and an explanation of the methodology lesson
- DECISION_LOG.md: new entry recording the correction with a
pointer to the regression test and the takeaway
Side artifacts:
- docs/CAPTURES/03-py-connect-only.socat.log
- docs/CAPTURES/04-py-no-database.socat.log
- docs/CAPTURES/05-py-fixed-caps.socat.log
Test counts: 40 unit + 6 integration = 46 total, all green, ruff clean.