Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6)

Investigation of the Phase 21 baseline finding that executemany(N) cost
scaled linearly per-row (1.74 ms x N) regardless of batch size.

Root cause: every autocommit=True INSERT forces a server-side
transaction-log flush. Not a wire-protocol bug.

Numbers:
* executemany(1000) autocommit=True: 1.72 s (1.72 ms/row)
* executemany(1000) in single txn:    32 ms (32 us/row)

53x speedup from changing the transaction boundary, not the driver.
Pure protocol overhead is ~32 us/row -> ~31K rows/sec sustained
throughput on a single connection. Comparable to pg8000.

Added test_executemany_1000_rows_in_txn benchmark to make this
visible. Updated README headline numbers and added a "Performance
gotchas" section explaining when autocommit=False matters.

Decision: don't pipeline. The remaining 32 us is already excellent;
the autocommit gotcha is the real user-facing footgun. Docs > code.
If someone reports needing >31K rows/sec single-connection, that
becomes Phase 22.
Ryan Malloy 2026-05-04 17:26:16 -06:00
parent 90ce035a00
commit 495128c679
6 changed files with 604 additions and 454 deletions


@@ -2,6 +2,39 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.04.6 — `executemany` perf finding: it was the autocommit cliff
Investigation of the Phase 21 finding that `executemany(N)` cost scaled linearly per-row (1.74 ms × N) regardless of batch size. **Root cause: every autocommit-True INSERT forces a server-side transaction-log flush.** Not a wire-protocol bug.
### Added
- **`test_executemany_1000_rows_in_txn`** benchmark — same workload, but inside a single transaction with one COMMIT at the end. Isolates pure protocol cost from server-storage cost.
- New module-scoped `txn_conn` fixture in `tests/benchmarks/test_insert_perf.py` for autocommit-False benchmarks.
### Findings
| Mode | Total | Per row |
|------|------:|--------:|
| `executemany(1000)` autocommit=True | 1.72 s | 1.72 ms |
| `executemany(1000)` in single txn | 32 ms | **32 µs** |
**53× speedup from changing the transaction boundary, not the driver.** Pure protocol overhead is ~32 µs/row → ~31,000 rows/sec sustained throughput on a single connection. Comparable to mature pure-Python drivers (pg8000).
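The mechanism behind these numbers can be sketched in miniature with the stdlib `sqlite3` module as a stand-in DB-API driver (this is not the `informix-db` API, and the table is made up; in-memory SQLite won't reproduce the 53× ratio, but the transaction-boundary principle is the same):

```python
import sqlite3
import time

# In-memory SQLite as a stand-in DB-API connection. A real on-disk
# database shows the per-row flush cost far more starkly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, val REAL)")
rows = [(i, f"row_{i}", float(i)) for i in range(1000)]

# Autocommit-style loading: one transaction (and one log flush) per row.
t0 = time.perf_counter()
for r in rows:
    conn.execute("INSERT INTO t VALUES (?, ?, ?)", r)
    conn.commit()
per_row_commits = time.perf_counter() - t0

# Single-transaction loading: 1000 inserts, one COMMIT, one log flush.
t0 = time.perf_counter()
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
conn.commit()
one_commit = time.perf_counter() - t0

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2000 -- both loads land the same data
```

Only where the COMMIT sits changes; the rows written are identical either way.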
### Changed
- **`tests/benchmarks/README.md`** — updated headline numbers to show both modes, added a "Performance gotchas" section explaining when to use `autocommit=False` for bulk loads.
- **`tests/benchmarks/baseline.json`** — refreshed to include the new txn-mode measurement (now 29 entries, was 28).
### Decision: don't pipeline
Pipelining BIND+EXECUTE PDUs (writing N without waiting for responses between them) could potentially halve the 32 µs/row figure on loopback. Decided against:
- The remaining 32 µs is already excellent — single-connection bulk-load performance is not where users hit limits.
- Pipelining adds complexity around TCP send-buffer management, partial-failure semantics, and error reporting (which row failed when 50 are in flight).
- The autocommit gotcha is the *real* user-facing footgun. Better docs > more code.
If someone reports needing >31K rows/sec single-connection, this becomes Phase 22 work.
## 2026.05.04.5 — Performance benchmarks (Phase 21)
Adds `tests/benchmarks/` — a `pytest-benchmark` driven suite covering codec micro-benchmarks (no server required) and end-to-end SELECT/INSERT/pool/async benchmarks. Establishes a committed `baseline.json` so future PRs can be compared against the floor and regressions caught at review.


@@ -1,6 +1,6 @@
[project]
name = "informix-db"
version = "2026.05.04.6"
description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
readme = "README.md"
license = { text = "MIT" }


@@ -21,22 +21,40 @@ Performance baselines for `informix-db`. Two layers:
| **Cold connect + close** (login handshake) | **11.2 ms** | **89** |
| 1000-row SELECT * | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
| `executemany(100)` autocommit=True | 181 ms | ~550 rows/sec |
| `executemany(1000)` autocommit=True | 1.72 s | ~580 rows/sec |
| **`executemany(1000)` in single transaction** | **32 ms** | **~31,000 rows/sec** |
### What these tell you
- **Pool gives 72× speedup** over cold connect. If your app opens a
connection per request, fix that first.
- **Wrap bulk INSERTs in a transaction.** That's a **53× speedup** over
the autocommit-True default. With autocommit on, each row forces the
server to flush its transaction log; in transaction mode the flush
happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound)
to 32 µs (pure protocol). PEP 249's default `autocommit=False` was
designed for this — we just default to `False`.
- **Codec is not the bottleneck.** Per-row decode (2.9 µs) is 1000× faster
than wire round-trip (177 µs for `SELECT 1`). Network and server-side
cost dominate.
- **UTF-8 carries no measurable cost.** `decode_varchar_utf8` runs at
216 ns vs `decode_varchar_short` at 170 ns — the 27% delta is the
multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.

### Performance gotchas
- **`autocommit=True` + `executemany` is the slowest reasonable pattern.**
Use it only when each row genuinely needs to land independently. For
bulk loads, default `autocommit=False` and call `conn.commit()` at the
end of the batch.
- **Single `INSERT` in a tight loop is 1.88 ms each** — strictly worse
than `executemany` (which saves PREPARE/RELEASE overhead). If you find
yourself looping over `cur.execute("INSERT...")` hundreds of times,
switch to `executemany`.
- **Cold connect is 11 ms.** The login handshake is *expensive* compared
to anything you'll do with the connection. Pool everything in
long-lived processes.
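The recommended bulk-load shape is plain DB-API; a minimal sketch using the stdlib `sqlite3` module as a stand-in (the `items` table and `bulk_load` helper are made up for illustration; with `informix-db` the equivalent is a connection opened with `autocommit=False`):

```python
import sqlite3

def bulk_load(conn, rows):
    # One executemany call (statement prepared once, bound per row),
    # then exactly one commit(): the transaction-log flush happens once
    # for the whole batch instead of once per row.
    cur = conn.cursor()
    cur.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    cur.close()
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a driver connection
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, val REAL)")
conn.commit()
bulk_load(conn, [(i, f"row_{i}", float(i)) for i in range(1000)])
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1000
```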
## Regression policy

File diff suppressed because it is too large


@@ -3,19 +3,47 @@
The single-row vs. executemany delta is the ``executemany`` win: we
PREPARE+RELEASE once and BIND+EXECUTE per row, vs PREPARE+RELEASE per
row. On any decent network this is 10-50x.

The autocommit-True vs. autocommit-False delta is the **transaction-flush
cost**: every autocommit INSERT forces the server to flush its
transaction log per row, drowning out everything else. The benchmark
splits these so we can see protocol overhead independently.
"""
from __future__ import annotations

import contextlib
from collections.abc import Iterator

import pytest

import informix_db
from tests.conftest import ConnParams

pytestmark = [pytest.mark.benchmark, pytest.mark.integration]


@pytest.fixture(scope="module")
def txn_conn(conn_params: ConnParams) -> Iterator[informix_db.Connection]:
    """A separate connection with autocommit=False so we can wrap an
    executemany call in a single explicit transaction. Uses ``testdb``
    (the logged user DB); autocommit-off is meaningless on unlogged DBs.
    """
    conn = informix_db.connect(
        host=conn_params.host,
        port=conn_params.port,
        user=conn_params.user,
        password=conn_params.password,
        database="testdb",
        server=conn_params.server,
        autocommit=False,
    )
    try:
        yield conn
    finally:
        conn.close()


def _setup_temp_table(conn: informix_db.Connection, name: str) -> None:
    cur = conn.cursor()
    with contextlib.suppress(informix_db.Error):
@@ -82,7 +110,9 @@ def test_executemany_100_rows(
def test_executemany_1000_rows(
    benchmark, bench_conn: informix_db.Connection
) -> None:
    """1000 INSERTs via executemany under autocommit=True — every row
    forces a transaction-log flush. Worst-case protocol *plus* server
    storage cost."""
    table = "p21_ins_emany_1000"
    _setup_temp_table(bench_conn, table)
    counter = [0]
@@ -104,3 +134,37 @@ def test_executemany_1000_rows(
        benchmark.pedantic(run, rounds=3, iterations=1)
    finally:
        _drop_temp_table(bench_conn, table)


def test_executemany_1000_rows_in_txn(
    benchmark, txn_conn: informix_db.Connection
) -> None:
    """1000 INSERTs via executemany inside ONE transaction — single
    log flush at COMMIT time. Isolates the protocol cost from the
    autocommit-flush cost. The delta vs the autocommit variant is the
    server-side log-flush penalty (un-fixable from the client side)."""
    table = "p21_ins_emany_txn"
    _setup_temp_table(txn_conn, table)
    txn_conn.commit()  # Land the CREATE TABLE before timing
    counter = [0]

    def run() -> None:
        counter[0] += 1
        base = counter[0] * 1000
        rows = [
            (base + i, f"row_{base + i}", float(base + i)) for i in range(1000)
        ]
        cur = txn_conn.cursor()
        cur.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?)",
            rows,
        )
        cur.close()
        txn_conn.commit()

    try:
        benchmark.pedantic(run, rounds=3, iterations=1)
    finally:
        with contextlib.suppress(informix_db.Error):
            _drop_temp_table(txn_conn, table)
            txn_conn.commit()

uv.lock generated

@@ -34,7 +34,7 @@ wheels = [
[[package]]
name = "informix-db"
version = "2026.5.4.6"
source = { editable = "." }
[package.optional-dependencies]