Phase 21.1: executemany perf - it was the autocommit cliff (2026.05.04.6)
Investigation of the Phase 21 baseline finding that executemany(N) cost scaled linearly per-row (1.74 ms × N) regardless of batch size. Root cause: every autocommit=True INSERT forces a server-side transaction-log flush. Not a wire-protocol bug.

Numbers:

* executemany(1000) autocommit=True: 1.72 s (1.72 ms/row)
* executemany(1000) in single txn: 32 ms (32 µs/row)

53× speedup from changing the transaction boundary, not the driver. Pure protocol overhead is ~32 µs/row → ~31K rows/sec sustained throughput on a single connection. Comparable to pg8000.

Added test_executemany_1000_rows_in_txn benchmark to make this visible. Updated README headline numbers and added a "Performance gotchas" section explaining when autocommit=False matters.

Decision: don't pipeline. The remaining 32 µs is already excellent; the autocommit gotcha is the real user-facing footgun. Docs > code. If someone reports needing >31K rows/sec single-connection, that becomes Phase 22.
parent 90ce035a00
commit 495128c679
CHANGELOG.md
@@ -2,6 +2,39 @@
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.04.6 — `executemany` perf finding: it was the autocommit cliff
Investigation of the Phase 21 finding that `executemany(N)` cost scaled linearly per-row (1.74 ms × N) regardless of batch size. **Root cause: every autocommit-True INSERT forces a server-side transaction-log flush.** Not a wire-protocol bug.
### Added
- **`test_executemany_1000_rows_in_txn`** benchmark — same workload, but inside a single transaction with one COMMIT at the end. Isolates pure protocol cost from server-storage cost.
- New module-scoped `txn_conn` fixture in `tests/benchmarks/test_insert_perf.py` for autocommit-False benchmarks.
### Findings
| Mode | Total | Per row |
|-|-:|-:|
| `executemany(1000)` autocommit=True | 1.72 s | 1.72 ms |
| `executemany(1000)` in single txn | 32 ms | **32 µs** |

**53× speedup from changing the transaction boundary, not the driver.** Pure protocol overhead is ~32 µs/row → ~31,000 rows/sec sustained throughput on a single connection. Comparable to mature pure-Python drivers (pg8000).
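The arithmetic behind these headline numbers can be sanity-checked with a toy cost model. The constants below are the measured figures from the table; the function itself is purely illustrative, not driver code:

```python
# Toy cost model of the two executemany modes, built from the measured
# numbers: ~32 µs/row of pure protocol cost, plus a per-row
# transaction-log flush under autocommit=True that accounts for the rest
# of the 1.72 ms/row.
PROTOCOL_US_PER_ROW = 32          # measured in-txn (pure wire cost)
AUTOCOMMIT_EXTRA_US = 1720 - 32   # per-row log-flush penalty

def batch_ms(rows: int, autocommit: bool) -> float:
    """Predicted executemany(rows) wall time in milliseconds."""
    per_row_us = PROTOCOL_US_PER_ROW + (AUTOCOMMIT_EXTRA_US if autocommit else 0)
    return rows * per_row_us / 1000

print(batch_ms(1000, autocommit=True))   # 1720.0 — matches the 1.72 s row
print(batch_ms(1000, autocommit=False))  # 32.0 — matches the in-txn row
print(round(batch_ms(1000, True) / batch_ms(1000, False)))  # 54, i.e. the ~53× speedup
```

The model also makes the scaling story obvious: under autocommit the flush term dominates, so batch size never helps; in a single transaction only the 32 µs protocol term remains.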
### Changed
- **`tests/benchmarks/README.md`** — updated headline numbers to show both modes, added a "Performance gotchas" section explaining when to use `autocommit=False` for bulk loads.
- **`tests/benchmarks/baseline.json`** — refreshed to include the new txn-mode measurement (now 29 entries, was 28).
### Decision: don't pipeline
Pipelining BIND+EXECUTE PDUs (writing N without waiting for responses between them) could potentially halve the 32 µs/row figure on loopback. Decided against:
- The remaining 32 µs is already excellent — single-connection bulk-load performance is not where users hit limits.
- Pipelining adds complexity around TCP send-buffer management, partial-failure semantics, and error reporting (which row failed when 50 are in flight).
- The autocommit gotcha is the *real* user-facing footgun. Better docs > more code.
If someone reports needing >31K rows/sec single-connection, this becomes Phase 22 work.
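For scale: even under a generous assumption that half of the 32 µs/row is response-wait that a perfect pipeline could overlap (the "could potentially halve" scenario), the ceiling only moves from ~31K to ~62K rows/sec. The 50% wait share below is a pure assumption for illustration, not a measurement:

```python
# Ceiling arithmetic for the pipelining decision. WAIT_SHARE is an assumed
# split of the per-row budget, modeling the best case where pipelining
# overlaps all response waits.
PER_ROW_US = 32.0
WAIT_SHARE = 0.5  # assumed fraction of per-row time spent waiting on replies

serial_rows_per_sec = 1_000_000 / PER_ROW_US
pipelined_rows_per_sec = 1_000_000 / (PER_ROW_US * (1 - WAIT_SHARE))
print(int(serial_rows_per_sec), int(pipelined_rows_per_sec))  # 31250 62500
```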
## 2026.05.04.5 — Performance benchmarks (Phase 21)
Adds `tests/benchmarks/` — a `pytest-benchmark` driven suite covering codec micro-benchmarks (no server required) and end-to-end SELECT/INSERT/pool/async benchmarks. Establishes a committed `baseline.json` so future PRs can be compared against the floor and regressions caught at review.
pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "informix-db"
-version = "2026.05.04.5"
+version = "2026.05.04.6"
description = "Pure-Python driver for IBM Informix IDS — speaks the SQLI wire protocol over raw sockets. No CSDK, no JVM, no native libraries."
readme = "README.md"
license = { text = "MIT" }
tests/benchmarks/README.md
@@ -21,22 +21,40 @@ Performance baselines for `informix-db`. Two layers:
| **Cold connect + close** (login handshake) | **11.2 ms** | **89** |
| 1000-row SELECT * | 1.56 ms | 640 |
| INSERT (single, prepared) | 1.88 ms | 530 |
-| `executemany(100 rows)` | 181 ms | 5.5 (i.e. ~550 rows/sec) |
+| `executemany(100)` autocommit=True | 181 ms | ~550 rows/sec |
-| `executemany(1000 rows)` | 1.74 s | 0.57 (i.e. ~575 rows/sec) |
+| `executemany(1000)` autocommit=True | 1.72 s | ~580 rows/sec |
+| **`executemany(1000)` in single transaction** | **32 ms** | **~31,000 rows/sec** |
### What these tell you

- **Pool gives 72× speedup** over cold connect. If your app opens a
  connection per request, fix that first.
- **Wrap bulk INSERTs in a transaction.** That's a **53× speedup** over
  the autocommit-True default. With autocommit on, each row forces the
  server to flush its transaction log; in transaction mode the flush
  happens once at COMMIT. Per-row cost drops from 1.72 ms (storage-bound)
  to 32 µs (pure protocol). PEP 249's default `autocommit=False` was
  designed for this — we just default to `False`.
- **Codec is not the bottleneck.** Per-row decode (2.9 µs) is 1000× faster
  than wire round-trip (177 µs for `SELECT 1`). Network and server-side
  cost dominate.
- **UTF-8 carries no measurable cost.** `decode_varchar_utf8` runs at
  216 ns vs `decode_varchar_short` at 170 ns — the 27% delta is the
  multibyte string walk inherent in UTF-8 decoding, not Phase 20 overhead.
-- **`executemany` doesn't scale linearly.** 100 rows in 181 ms = 1.81 ms/row;
-  1000 rows in 1.74 s = 1.74 ms/row. Suggests per-row cost dominates over
-  PREPARE amortization. Worth investigating in Phase 21.x.

### Performance gotchas
- **`autocommit=True` + `executemany` is the slowest reasonable pattern.**
|
||||||
|
Use it only when each row genuinely needs to land independently. For
|
||||||
|
bulk loads, default `autocommit=False` and call `conn.commit()` at the
|
||||||
|
end of the batch.
|
||||||
|
- **Single `INSERT` in a tight loop is 1.88 ms each** — strictly worse
|
||||||
|
than `executemany` (which saves PREPARE/RELEASE overhead). If you find
|
||||||
|
yourself looping over `cur.execute("INSERT...")` hundreds of times,
|
||||||
|
switch to `executemany`.
|
||||||
|
- **Cold connect is 11 ms.** The login handshake is *expensive* compared
|
||||||
|
to anything you'll do with the connection. Pool everything in
|
||||||
|
long-lived processes.
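The bulk-load advice translates to any PEP 249 driver. Here is the pattern with stdlib `sqlite3` standing in for `informix_db` (same DB-API shape and qmark paramstyle, different connect arguments):

```python
import sqlite3

# Stand-in connection — with informix_db this would be
# informix_db.connect(..., autocommit=False) instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, val REAL)")

rows = [(i, f"row_{i}", float(i)) for i in range(1000)]

# Bulk-load pattern: one executemany, then one commit — a single
# transaction-log flush instead of one per row under autocommit.
cur = conn.cursor()
cur.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
conn.commit()

assert conn.execute("SELECT COUNT(*) FROM t").fetchone()[0] == 1000
conn.close()
```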
## Regression policy
File diff suppressed because it is too large

tests/benchmarks/test_insert_perf.py
@@ -3,19 +3,47 @@
The single-row vs. executemany delta is the ``executemany`` win — we
PREPARE+RELEASE once and BIND+EXECUTE per row, vs PREPARE+RELEASE per
row. On any decent network this is 10-50x.

The autocommit-True vs. autocommit-False delta is the **transaction-flush
cost** — every autocommit INSERT forces the server to flush its
transaction log per row, drowning out everything else. The benchmark
splits these so we can see protocol overhead independently.
"""
from __future__ import annotations

import contextlib
from collections.abc import Iterator

import pytest

import informix_db
from tests.conftest import ConnParams

pytestmark = [pytest.mark.benchmark, pytest.mark.integration]
@pytest.fixture(scope="module")
def txn_conn(conn_params: ConnParams) -> Iterator[informix_db.Connection]:
    """A separate connection with autocommit=False so we can wrap an
    executemany call in a single explicit transaction. Uses ``testdb``
    (the logged user DB) — autocommit-off is meaningless on unlogged DBs.
    """
    conn = informix_db.connect(
        host=conn_params.host,
        port=conn_params.port,
        user=conn_params.user,
        password=conn_params.password,
        database="testdb",
        server=conn_params.server,
        autocommit=False,
    )
    try:
        yield conn
    finally:
        conn.close()
def _setup_temp_table(conn: informix_db.Connection, name: str) -> None:
    cur = conn.cursor()
    with contextlib.suppress(informix_db.Error):
@@ -82,7 +110,9 @@ def test_executemany_100_rows(
def test_executemany_1000_rows(
    benchmark, bench_conn: informix_db.Connection
) -> None:
-    """1000 INSERTs via executemany — sustained-batch throughput."""
+    """1000 INSERTs via executemany under autocommit=True — every row
+    forces a transaction-log flush. Worst-case protocol *plus* server
+    storage cost."""
    table = "p21_ins_emany_1000"
    _setup_temp_table(bench_conn, table)
    counter = [0]
@@ -104,3 +134,37 @@ def test_executemany_1000_rows(
        benchmark.pedantic(run, rounds=3, iterations=1)
    finally:
        _drop_temp_table(bench_conn, table)
def test_executemany_1000_rows_in_txn(
    benchmark, txn_conn: informix_db.Connection
) -> None:
    """1000 INSERTs via executemany inside ONE transaction — single
    log flush at COMMIT time. Isolates the protocol cost from the
    autocommit-flush cost. The delta vs the autocommit variant is the
    server-side log-flush penalty (un-fixable from the client side)."""
    table = "p21_ins_emany_txn"
    _setup_temp_table(txn_conn, table)
    txn_conn.commit()  # Land the CREATE TABLE before timing
    counter = [0]

    def run() -> None:
        counter[0] += 1
        base = counter[0] * 1000
        rows = [
            (base + i, f"row_{base + i}", float(base + i)) for i in range(1000)
        ]
        cur = txn_conn.cursor()
        cur.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?)",
            rows,
        )
        cur.close()
        txn_conn.commit()

    try:
        benchmark.pedantic(run, rounds=3, iterations=1)
    finally:
        with contextlib.suppress(informix_db.Error):
            _drop_temp_table(txn_conn, table)
            txn_conn.commit()