2 Commits

Author SHA1 Message Date
5825d5c55e Extend scaling benches: 100-column case + 100k memory profile + 1M gating
Adds three things to test_scaling_perf.py:

1. 100-column wide-row SELECT - a codec stress test at extreme widths.
   1k rows x 100 cols = 19.4 ms (~19.4 us/row, ~194 ns per column decode).
   Per-column cost keeps dropping as width grows thanks to loop
   amortization (5 cols: 480 ns/col -> 100 cols: 194 ns/col).

2. 100k-row memory profile - samples RSS pre-execute, post-execute
   (materialization cost), and during iteration. Real numbers:
     pre-execute:  45.8 MB
     post-execute: 71.2 MB  (+25.4 MB = ~259 bytes/row materialization)
     iteration:    0 KB extra (just walks the existing list)

   Documents the in-memory cursor's actual cost: 100k rows = 25 MB,
   1M rows = ~250 MB. That gives a fair regression baseline (the
   check trips at 500 MB).

3. 1M-row scaling, gated behind the IFX_BENCH_1M=1 env var. Off by
   default because the dev container's rootdbs runs out of space;
   users on production-sized servers can opt in. Extrapolating
   linearly from the 100k results, 1M rows should take ~15 s for
   executemany and ~3 s for SELECT.
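   The gate itself is just an env-var check; a minimal sketch, assuming
   pytest (the test name and body here are illustrative, not the actual
   contents of test_scaling_perf.py):

   ```python
   import os

   import pytest

   # Off by default: the 1M-row benchmark only runs when IFX_BENCH_1M=1
   # is exported, since the dev container's rootdbs cannot hold 1M rows.
   RUN_1M = os.environ.get("IFX_BENCH_1M") == "1"

   @pytest.mark.skipif(not RUN_1M,
                       reason="set IFX_BENCH_1M=1 to enable 1M-row benchmarks")
   def test_bulk_insert_1m():
       ...  # executemany of 1,000,000 rows (hypothetical body)
   ```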

Note on the dev-container size limit: dev image's rootdbs is sized
for typical developer workloads, not stress testing. A 1M-row
INSERT exceeds the available pages and fails with -242 ISAM -113
(out of space). This is correct behavior - the limit is enforced
at the storage layer.

Switched RSS sampling from ru_maxrss (peak, monotonic) to
/proc/self/status VmRSS (current). Earlier runs showed flat
RSS because a peak recorded earlier in the test session masked
the current fluctuation.
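
The sampling switch boils down to reading a different source. A
Linux-only sketch (helper names are mine, not the driver's):

```python
import resource

def rss_peak_kb() -> int:
    # ru_maxrss is the *peak* RSS (kB on Linux). It is monotonic, so a
    # spike earlier in the test session hides later fluctuation.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def rss_current_kb() -> int:
    # VmRSS in /proc/self/status is the RSS *right now*; it can fall as
    # well as rise, so per-phase deltas become visible.
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB
    raise RuntimeError("VmRSS not found (non-Linux /proc?)")
```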
2026-05-05 13:10:32 -06:00
8eb19f7534 Phase 34: Scaling benchmarks (1k/10k/100k rows; 5/20/50 cols) (2026.05.05.8)
Adds tests/benchmarks/test_scaling_perf.py with parametrized
benchmarks across row-count, column-width, and type-mix axes.
Caught the NFETCH-loop bug (Phase 35) immediately on first run.
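
The axis layout follows the usual pytest parametrization pattern; a
sketch under that assumption (the real benchmark bodies, which drive
the Informix connection, are omitted):

```python
import pytest

ROW_COUNTS = [1_000, 10_000, 100_000]   # row-count axis
COL_WIDTHS = [5, 20, 50]                # column-width axis

@pytest.mark.parametrize("nrows", ROW_COUNTS)
def test_select_scaling(nrows):
    ...  # SELECT nrows rows; per-row cost should stay near-constant

@pytest.mark.parametrize("ncols", COL_WIDTHS)
def test_wide_row_select(ncols):
    ...  # 1k rows x ncols columns; per-column cost should amortize
```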

Headline numbers:

Bulk insert (executemany in transaction):
  1k rows:   23 ms (23 us/row)
  10k rows:  161 ms (16 us/row)
  100k rows: 1487 ms (15 us/row, ~67k rows/sec sustained)

SELECT (linear scaling, near-constant per-row):
  1k rows:   2.7 ms (2.7 us/row)
  10k rows:  25.8 ms (2.6 us/row)
  100k rows: 271 ms (2.7 us/row)

Wide-row SELECT (1k rows x N cols):
  5 cols:  2.4 ms
  20 cols: 5.1 ms
  50 cols: 10.1 ms

Type-mix SELECT (INT + VARCHAR + DECIMAL + DATE + FLOAT + SMALLINT):
  1000 rows: 4.7 ms (4.7 us/row, ~1.7x baseline)
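
The bulk-insert path being timed is the standard DB-API shape: one
executemany inside a single transaction, so prepare and commit costs
are paid once rather than per row. A generic sketch (table and column
names are illustrative):

```python
def bulk_insert(conn, rows):
    # One statement, one transaction: executemany binds each row against
    # the same prepared plan, and commit() runs once at the end.
    cur = conn.cursor()
    try:
        cur.executemany("INSERT INTO bench_t (id, val) VALUES (?, ?)", rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Because this is plain DB-API 2.0 with qmark paramstyle, the same code
runs against any conforming driver.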

Per-row codec cost is essentially constant from 1k to 100k rows
(2.7 us/row), confirming that the parse_tuple_payload optimizations
(Phases 23-25) hold at 100x scale with no GC-pause amplification or
memory-pressure degradation.

Per-row insert cost actually DECREASES with scale (23us at 1k to
15us at 100k) - Phase 33's pipelining amortizes prepare/release
overhead better at larger N.

10 new parametrized benchmarks. Total: 77 unit + 249 integration +
43 benchmark = 369 tests.
2026-05-05 12:38:07 -06:00