# Changelog
All notable changes to `informix-db`. Versioning is [CalVer](https://calver.org/) — `YYYY.MM.DD` for date-based releases, `YYYY.MM.DD.N` for same-day post-releases per PEP 440.
## 2026.05.04.5 — Performance benchmarks (Phase 21)
Adds `tests/benchmarks/` — a `pytest-benchmark` driven suite covering codec micro-benchmarks (no server required) and end-to-end SELECT/INSERT/pool/async benchmarks. Establishes a committed `baseline.json` so future PRs can be compared against the floor and regressions caught at review.
### Added
- **`tests/benchmarks/test_codec_perf.py`** — 16 micro-benchmarks for the hot codec paths (`decode`, `encode_param`, `parse_tuple_payload`). Run without an Informix container; suitable for pre-merge CI.
- **`tests/benchmarks/test_select_perf.py`** — 4 SELECT round-trip benchmarks: 1-row latency floor, ~10 rows, full 1k-row table, parameterized.
- **`tests/benchmarks/test_insert_perf.py`** — 3 INSERT benchmarks: single-row, `executemany(100)`, `executemany(1000)`.
- **`tests/benchmarks/test_pool_perf.py`** — 3 pool benchmarks: cold connect (login handshake cost), pool acquire/release, pool acquire + tiny query + release.
- **`tests/benchmarks/test_async_perf.py`** — 2 async benchmarks: single async round-trip overhead, 10 concurrent SELECTs through an async pool.
- **`tests/benchmarks/conftest.py`** — `bench_conn` (long-lived autocommit connection) and `bench_table` (pre-populated 1k-row table) fixtures, both session-scoped.
- **`tests/benchmarks/baseline.json`** — committed baseline (28 measurements) for `--benchmark-compare` regression checks.
- **`tests/benchmarks/README.md`** — headline numbers, regression policy, how to update the baseline, what each benchmark measures.
- **`make bench` / `make bench-codec` / `make bench-save`** — new Makefile targets.
- **`benchmark` pytest marker** — gated, off by default; `pytest -m benchmark` to opt in.
### Changed
- **`make test-integration`** now uses `-m "integration and not benchmark"` so the integration suite stays fast (~6s) — benchmarks (~27s) are gated behind `make bench`.
- **`pytest`** defaults now exclude both the `integration` and `benchmark` markers; a plain `pytest` run is unit-only.
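The gating described above might look like this in pytest configuration — a hypothetical sketch only; the marker names come from this changelog, but whether the project keeps its config in `pyproject.toml`, `pytest.ini`, or `setup.cfg` is an assumption:

```toml
[tool.pytest.ini_options]
# Plain `pytest` runs unit tests only; opt in with -m integration / -m benchmark.
addopts = '-m "not integration and not benchmark"'
markers = [
    "integration: requires a running Informix container",
    "benchmark: performance benchmarks, run via `make bench`",
]
```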
### Headline numbers (dev container, x86_64 Linux, loopback)
| Operation | Mean |
|---|---:|
| `decode(int)` (per cell) | 181 ns |
| `parse_tuple_payload(5 cols)` (per row) | 2.87 µs |
| `SELECT 1` round-trip | 177 µs |
| Pool acquire + tiny query + release | 295 µs |
| **Cold connect + close** | **11.2 ms** |

**Pool-vs-cold delta is 72×.** UTF-8 decode carries no measurable cost over iso-8859-1 (Phase 20 didn't slow anything down).
### Tests
28 new benchmark tests. Total: **69 unit + 211 integration + 28 benchmark = 308**.
## 2026.05.04.4 — UTF-8 / multibyte locale support
Threads the connection's `CLIENT_LOCALE` through to user-data string codecs so multibyte locales (UTF-8, etc.) round-trip correctly. The driver previously hardcoded `iso-8859-1` for every string conversion — fine for Western European text, broken-by-design for CJK, Cyrillic, Arabic, emoji.
### Added
- **`Connection.encoding`** property — reports the Python codec name derived from `CLIENT_LOCALE` (e.g., `iso-8859-1`, `utf-8`, `iso-8859-15`). The default for a connection without `client_locale=` is `iso-8859-1`, compatible with the legacy default.
- **`informix_db.connections._python_encoding_from_locale(locale: str)`** — maps Informix locale strings (`en_US.utf8`, `en_US.8859-1`, `en_US.819`) to Python codec names. Falls back to `iso-8859-1` for unknown / unsuffixed forms.
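The mapping logic can be sketched as follows — a hedged re-implementation covering only the locale forms named above, not the driver's actual code (its real table is presumably larger):

```python
def python_encoding_from_locale(locale: str) -> str:
    """Map an Informix CLIENT_LOCALE string (e.g. 'en_US.utf8') to a
    Python codec name, falling back to iso-8859-1 for unknown or
    unsuffixed forms -- matching the driver's legacy default."""
    # Suffix -> Python codec. Only the forms mentioned in this changelog.
    suffix_map = {
        "utf8": "utf-8",
        "8859-1": "iso-8859-1",
        "819": "iso-8859-1",      # IBM code page 819 is Latin-1
        "8859-15": "iso-8859-15",
    }
    _, _, suffix = locale.partition(".")
    return suffix_map.get(suffix.lower(), "iso-8859-1")
```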
### Changed
- **`encode_param(value, encoding=...)`** and `_encode_str(value, encoding=...)` honor the connection's encoding instead of a hardcoded `iso-8859-1`. The cursor's `_emit_bind_params` forwards `self._conn.encoding` per parameter.
- **`decode(type_code, raw, encoding=...)`** and `parse_tuple_payload(reader, columns, encoding=...)` thread the encoding through to the string column decoders (CHAR, VARCHAR, NCHAR, NVCHAR, LVARCHAR). The cursor's `_read_fetch_response` forwards `self._conn.encoding`.
- **Smart-LOB CLOB encode/decode** (`write_blob_column`, simple-LOB TEXT fetch) honors `self._conn.encoding`.
- **Fast-path RPC** (`Connection.fast_path_call`) honors `self._encoding` for its bound parameters.
### Boundary discipline
Protocol-level strings stay `iso-8859-1` (always ASCII, never user-controlled): cursor names, function signatures, server-fabricated SQ_FILE virtual filenames, error "near tokens", SQL keywords/identifiers. Only user-data strings (column values, parameter binds) follow `CLIENT_LOCALE`.
### Error handling
A value the connection encoding cannot represent (e.g., `"你好"` on an `8859-1` connection) now raises `informix_db.DataError` instead of letting Python's `UnicodeEncodeError` leak. The cursor releases the prepared statement before propagating, so the connection survives cleanly for the next query.
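The shape of that error translation, sketched with illustrative names — `DataError` is real per this changelog's PEP 249 hierarchy, but the helper and its message are hypothetical:

```python
class DataError(Exception):
    """Stand-in for informix_db.DataError (PEP 249 hierarchy)."""

def encode_or_raise(value: str, encoding: str) -> bytes:
    # Translate the codec-level failure into the driver's PEP 249
    # exception so callers never see a raw UnicodeEncodeError.
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        raise DataError(
            f"parameter not representable in connection encoding "
            f"{encoding!r}: {exc}"
        ) from exc
```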
### Tests
9 new integration tests in `tests/test_unicode.py`:

- ASCII round-trip (regression)
- Latin-1 high-bit chars round-trip on default locale
- Full byte range 0x20-0xFE round-trip via VARCHAR
- Locale → Python codec mapping for common forms
- `Connection.encoding` exposes the resolved codec
- UTF-8 locale negotiation (server transcodes for ASCII even with 8859-1 DB)
- UTF-8 multibyte round-trip (skipped without `IFX_UTF8_DATABASE` env var pointing to a UTF-8 database)
- Non-representable char raises `DataError` cleanly; connection survives
- CLOB column round-trips Latin-1 text honoring connection encoding

Total: **69 unit + 212 integration = 281 tests**.
### Limitations
- Multibyte UTF-8 storage requires both `client_locale='en_US.utf8'` AND a database whose `DB_LOCALE` is UTF-8. The dev container's `testdb` is `8859-1`, so storing CJK chars there will continue to fail server-side regardless of the client codec. The `test_utf8_multibyte_round_trip` test is gated on the `IFX_UTF8_DATABASE` env var pointing to a UTF-8 database.
## 2026.05.04.3 — Resilience tests (fault injection)
### Added
- **`tests/_proxy.py`** — `ControlledProxy` helper: a thread-based TCP forwarder between the test client and Informix, with a `kill()` method that sends a TCP RST (via `SO_LINGER=0`) to simulate a network drop or server crash. Used as a context manager.
- **`tests/test_resilience.py`** — 12 integration tests filling the resilience gap identified in the test-coverage audit:
  - Network drop mid-SELECT raises `OperationalError` cleanly (no hang)
  - Network drop after describe but before fetch
  - Network drop during fetch iteration (already-materialized rows stay readable; a fresh execute fails)
  - Local socket close (yank-the-rug from the client side)
  - I/O error marks the connection unusable
  - Pool evicts a connection that died mid-`with` block
  - Pool revives after all idle connections died (health-check on acquire mints fresh ones)
  - Async cancellation via `asyncio.wait_for` — the pool stays usable for subsequent queries
  - Cursor reusable after a SQL error
  - Connection survives cursor close after an error
  - Pool sustained-load smoke test (50 acquire/release cycles, no leak)
  - `read_timeout` fires on a hung connection
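The `ControlledProxy.kill()` RST trick above relies on standard socket behavior: closing a socket with `SO_LINGER` enabled and a zero timeout sends RST instead of the normal FIN handshake. A minimal sketch of that mechanism (illustrative, not the helper's actual code):

```python
import socket
import struct

def close_with_rst(sock: socket.socket) -> None:
    """Abortive close: linger flag on (1) with a 0-second timeout makes
    close() emit a TCP RST rather than the graceful FIN sequence."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()
```

On the peer's side, the next `recv()` typically raises `ConnectionResetError` — which is exactly the "server crashed mid-query" condition the resilience tests provoke.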
### What this catches
- **Hangs** (waiting forever on a dead socket)
- **Silent data corruption** (treating EOF as a valid tuple)
- **Double-fault** (one error → cleanup raises a different error)
- **Pool poisoning** (returning a broken connection to the pool)
- **Stale cursor reuse** (the same cursor reused across an error boundary)
### Tests
12 new integration tests. Total: **69 unit + 203 integration = 272 tests**.

The Phase 19 work fills the highest-priority gap from the test-adequacy audit. The remaining gaps from that audit (UTF-8 locale, server-version matrix, performance benchmarks) are real but lower-severity.
## 2026.05.04.2 — Server-side scrollable cursors
### Added

- **Server-side scrollable cursors** (Phase 18): opt in via `conn.cursor(scrollable=True)`. The cursor opens with `SQ_SCROLL` (24) before `SQ_OPEN` (6), the result set stays materialized server-side, and each scroll method sends `SQ_SFETCH` (23) to fetch one row at a time. Use this for huge result sets where in-memory materialization would be wasteful.

The user-facing API is identical to Phase 17's in-memory scroll (`fetch_first`, `fetch_last`, `fetch_prior`, `fetch_absolute`, `fetch_relative`, `scroll`, `rownumber`); only the internal mechanism differs:
| | Default cursor | `scrollable=True` |
|---|---|---|
| Memory | All rows materialized | One row at a time |
| Network round-trips per fetch | 0 (after initial NFETCH) | 1 (one SFETCH per call) |
| Cursor lifetime | Closed after `execute()` | Open until `close()` |
| Best for | Moderate result sets, sequential iteration | Huge result sets, random access |

The implementation discovers the total row count lazily via SFETCH(LAST=4) when negative absolute indexing requires it; the result is cached in `_scroll_total_rows`. Position tracking is authoritative from the server's `SQ_TUPID` (25) tag, not client-computed.
### Wire-protocol details
- `SQ_SFETCH` (23): `[short SQ_ID=4][int 23][short scrolltype][int target][int bufSize=4096][short SQ_EOT]`. scrolltype values: 1=NEXT, 4=LAST, 6=ABSOLUTE.
- `SQ_SCROLL` (24): emitted between CURNAME and SQ_OPEN to mark the cursor as scrollable.
- `SQ_TUPID` (25): server response carrying the 1-indexed row position the server just delivered. `[short 25][int rowID]`.

The trap on the way: I initially used a SHORT for `bufSize` and the server hung silently — the same SHORT-vs-INT diagnostic pattern as Phase 4.x's CURNAME+NFETCH. Captured a JDBC trace, byte-diffed it against ours, found the mismatch.
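The frame layout packs straightforwardly; a sketch assuming big-endian (network-order) fields, with an illustrative `SQ_EOT` value — the actual terminator constant lives in the driver's protocol tables, not in this changelog:

```python
import struct

SQ_ID = 4
SQ_SFETCH = 23
SQ_EOT = 12        # terminator tag -- assumed value, not stated here
SCROLL_NEXT, SCROLL_LAST, SCROLL_ABSOLUTE = 1, 4, 6

def build_sfetch(scrolltype: int, target: int, buf_size: int = 4096) -> bytes:
    """[short SQ_ID=4][int 23][short scrolltype][int target]
    [int bufSize][short SQ_EOT] -- note bufSize is an INT; sending a
    SHORT here is exactly the silent-hang trap described above."""
    return struct.pack(">hihiih", SQ_ID, SQ_SFETCH,
                       scrolltype, target, buf_size, SQ_EOT)
```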
### Tests
14 new integration tests in `test_scroll_cursor_server.py`. Total: **69 unit + 191 integration = 260 tests**.
## 2026.05.04.1 — Scroll cursors
### Added

- **Scroll cursor API** on `Cursor` (Phase 17):
  - `cur.scroll(value, mode='relative'|'absolute')` — PEP 249 compatible
  - `cur.fetch_first()` / `cur.fetch_last()` — jump to either end
  - `cur.fetch_prior()` — backward step (SQL-standard semantics: from past-end yields the last row)
  - `cur.fetch_absolute(n)` — 0-indexed jump; negative `n` indexes from the end
  - `cur.fetch_relative(n)` — n-step move from the current position
  - `cur.rownumber` — current 0-indexed position (`None` if before-first or no result set)
In-memory implementation — no new wire-protocol; the existing materialized result set in `cur._rows` is now indexed rather than iterated. For server-side scroll over huge result sets, `SQ_SFETCH` (tag 23) would be needed — Phase 18 if anyone hits the in-memory ceiling.
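The position arithmetic over a materialized row list can be modeled independently of the driver — a toy sketch with hypothetical names, not the cursor's actual code:

```python
class ScrollWindow:
    """Toy model of Phase 17's in-memory scrolling: index into a
    materialized list of rows instead of iterating it once."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = None          # None = before the first row

    def fetch_absolute(self, n):
        # Negative n indexes from the end, like Python slicing.
        idx = n if n >= 0 else len(self.rows) + n
        if not 0 <= idx < len(self.rows):
            return None          # out of range: position unchanged
        self.pos = idx
        return self.rows[idx]

    def fetch_relative(self, n):
        base = -1 if self.pos is None else self.pos
        return self.fetch_absolute(base + n)

    def fetch_first(self):
        return self.fetch_absolute(0)

    def fetch_last(self):
        return self.fetch_absolute(-1)
```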
### Tests
14 new integration tests in `test_scroll_cursor.py`. Total: **69 unit + 177 integration = 246 tests**.
## 2026.05.04 — Library completion
The Phase 0 ambition — the first pure-Python Informix SQLI driver — reaches feature completeness. Adds async, TLS, connection pool, smart-LOBs, fast-path RPC, composite UDTs.
### Added
- **Async API** (`informix_db.aio`) — `AsyncConnection`, `AsyncCursor`, `AsyncConnectionPool` for FastAPI / aiohttp / asyncio. Each blocking I/O call is offloaded to a worker thread via `asyncio.to_thread`; the event loop never blocks.
- **Connection pool** (`informix_db.create_pool`) — thread-safe, with min/max sizing, lazy growth, health-check on acquire, and error-aware eviction.
- **TLS** — `tls=True` for self-signed dev servers, `tls=ssl.SSLContext` for production. Wrapping happens in `IfxSocket`, so the rest of the protocol layer is unaware of it.
- **Smart-LOBs** (BLOB / CLOB) — full read/write end-to-end via `cursor.read_blob_column()` / `cursor.write_blob_column()`, using the server's `lotofile` / `filetoblob` SQL functions intercepted at the `SQ_FILE` (98) protocol level.
- **Legacy in-row blobs** (BYTE / TEXT) — bind + read via the `SQ_BBIND` / `SQ_BLOB` / `SQ_FETCHBLOB` protocol family.
- **Fast-path RPC** (`Connection.fast_path_call`) — direct stored-procedure invocation bypassing PREPARE/EXECUTE; routine handles are cached per connection.
- **Composite UDT recognition** — `ROW`, `SET`, `MULTISET`, `LIST` columns return typed `RowValue` / `CollectionValue` wrappers exposing schema and raw bytes.
- **Type codecs** — `INTERVAL` (both DAY-TO-FRACTION and YEAR-TO-MONTH families), `DATETIME` (all qualifier ranges), `DECIMAL` / `MONEY` (BCD with a sign+exponent head byte and asymmetric base-100 complement for negatives), `DATE`, `BOOL`, all integer / float widths, `CHAR` / `VARCHAR` / `LVARCHAR`.
- **Transactions** — implicit `SQ_BEGIN` before each transaction in non-ANSI logged DBs; transparent no-ops on unlogged DBs.
- **PEP 249 exception hierarchy** — server `SQLCODE` mapped to the right exception class (`IntegrityError` for duplicate-key violations, `ProgrammingError` for syntax errors, etc.).
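The thread-offload pattern behind the async API, reduced to its core — a hypothetical wrapper, not the package's actual class:

```python
import asyncio

def blocking_query(sql: str) -> list:
    """Stand-in for a synchronous driver call that blocks on socket I/O."""
    return [(1,)]

class AsyncCursorSketch:
    async def execute_fetchall(self, sql: str) -> list:
        # Offload the blocking call so the event loop keeps running;
        # asyncio.to_thread runs it on the default thread-pool executor.
        return await asyncio.to_thread(blocking_query, sql)

async def main() -> list:
    cur = AsyncCursorSketch()
    return await cur.execute_fetchall("SELECT 1")
```

Functionally equivalent to native async for typical request/response workloads, at the cost of one worker thread per in-flight call.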
### Documentation
- [`README.md`](README.md) — overview and quick-start
- [`docs/USAGE.md`](docs/USAGE.md) — practical recipes and migration guide
- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) — byte-level wire-format reference
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) — phase-by-phase architectural decisions, with the *why* preserved
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) — index into the decompiled IBM JDBC reference
- [`docs/CAPTURES/`](docs/CAPTURES/) — annotated socat hex-dump captures
### Test coverage
232 tests total: **69 unit + 163 integration**. Unit tests run with no external dependencies; integration tests run against the IBM Informix Developer Edition Docker image.
### Known gaps (deferred)
- **Full ROW/COLLECTION recursive parsing**: Phase 12 ships type recognition + a raw-bytes wrapper. Parsing the textual representation into typed Python tuples/sets/lists is deferred — most workloads can use SQL projections (`SELECT row_col.fieldname FROM tbl`) instead.
- **UDT parameter encoding for fast-path**: scalar params/returns work; passing a 72-byte BLOB locator as a UDT param requires extending the SQ_BIND encoder with the extended_owner/extended_name preamble for type > 18.
- **Native async I/O**: Phase 16 ships a thread-pool wrapper that's functionally equivalent for typical FastAPI workloads. Native async (asyncpg-style transport abstraction) would be Phase 17 if a real workload needs it.
## 2026.05.02 — Phase 1: connection lifecycle
Initial release. `connect()` / `close()` works end-to-end. Cursor / execute / fetch arrived in Phase 2 (subsequent commits within the same session).