Changelog
All notable changes to informix-db. Versioning is CalVer — YYYY.MM.DD for date-based releases, YYYY.MM.DD.N for same-day post-releases per PEP 440.
2026.05.04.4 — UTF-8 / multibyte locale support
Threads the connection's CLIENT_LOCALE through to user-data string codecs so multibyte locales (UTF-8, etc.) round-trip correctly. The driver previously hardcoded iso-8859-1 for every string conversion — fine for Western European text, broken-by-design for CJK, Cyrillic, Arabic, emoji.
Added
- Connection.encoding property: reports the Python codec name derived from CLIENT_LOCALE (e.g., iso-8859-1, utf-8, iso-8859-15). Default for a connection without client_locale= is iso-8859-1 (compatible with the legacy default).
- informix_db.connections._python_encoding_from_locale(locale: str): maps Informix locale strings (en_US.utf8, en_US.8859-1, en_US.819) to Python codec names. Falls back to iso-8859-1 for unknown / unsuffixed forms.
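The mapping described above can be sketched as follows. This is a minimal illustration, not the driver's actual code: the function name without the leading underscore, the `FALLBACK` constant, and the exact parsing rules are assumptions, and only the locale forms listed in this entry are handled.

```python
import codecs

FALLBACK = "iso-8859-1"  # legacy default named in this changelog


def python_encoding_from_locale(locale: str) -> str:
    """Map an Informix locale string to a Python codec name (illustrative)."""
    # The code set is whatever follows the last dot: "en_US.utf8" -> "utf8".
    _, dot, codeset = locale.rpartition(".")
    if not dot:
        return FALLBACK  # unsuffixed form such as "en_US"
    codeset = codeset.strip().lower()
    if codeset in ("utf8", "utf-8"):
        return "utf-8"
    if codeset == "819":  # numeric code-set form of ISO 8859-1
        return "iso-8859-1"
    if codeset.startswith("8859-"):
        candidate = "iso-" + codeset
        try:
            codecs.lookup(candidate)  # only accept codecs Python knows
        except LookupError:
            return FALLBACK
        return candidate
    return FALLBACK  # unknown code set


print(python_encoding_from_locale("en_US.utf8"))
print(python_encoding_from_locale("en_US.8859-15"))
```

Checking the candidate with `codecs.lookup` keeps a malformed suffix from producing a codec name that would only fail later, at the first encode or decode.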
Changed
- encode_param(value, encoding=...) and _encode_str(value, encoding=...) honor the connection's encoding instead of hardcoded iso-8859-1. Cursor's _emit_bind_params forwards self._conn.encoding per parameter.
- decode(type_code, raw, encoding=...) and parse_tuple_payload(reader, columns, encoding=...) thread the encoding to string column decoders (CHAR, VARCHAR, NCHAR, NVCHAR, LVARCHAR). Cursor's _read_fetch_response forwards self._conn.encoding.
- Smart-LOB CLOB encode/decode (write_blob_column, simple-LOB TEXT fetch) honor self._conn.encoding.
- Fast-path RPC (Connection.fast_path_call) honors self._encoding for its bound parameters.
Boundary discipline
Protocol-level strings stay iso-8859-1 (always ASCII, never user-controlled): cursor names, function signatures, server-fabricated SQ_FILE virtual filenames, error "near tokens", SQL keywords/identifiers. Only user-data strings (column values, parameter binds) follow CLIENT_LOCALE.
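A small sketch of the boundary described above, under the assumption that it can be reduced to two helpers. The names `encode_protocol_str`, `encode_user_str` and `decode_user_str` are hypothetical and do not come from the driver.

```python
PROTOCOL_ENCODING = "iso-8859-1"  # fixed codec for driver-generated strings


def encode_protocol_str(text: str) -> bytes:
    """Encode a protocol-level string (cursor name, SQL keyword, ...)."""
    return text.encode(PROTOCOL_ENCODING)


def encode_user_str(value: str, encoding: str) -> bytes:
    """Encode a user-data string with the connection's codec."""
    return value.encode(encoding)


def decode_user_str(raw: bytes, encoding: str) -> str:
    """Decode a column value with the connection's codec."""
    return raw.decode(encoding)


# A cursor name is ASCII, so its bytes never depend on CLIENT_LOCALE.
print(encode_protocol_str("cur_0001"))

# User data does depend on it: the same text yields different bytes.
print(encode_user_str("Ärger", "iso-8859-1"))
print(encode_user_str("Ärger", "utf-8"))
```

Keeping the two paths separate means a locale change can only affect bytes that carry column values or bound parameters.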
Error handling
Encoding-can't-represent-this-value (e.g., "你好" on an 8859-1 connection) now raises informix_db.DataError instead of letting Python's UnicodeEncodeError leak. The cursor releases the prepared statement before propagating, so the connection survives cleanly for the next query.
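The release-then-propagate order can be illustrated with a self-contained sketch. `DataError` here is a local stand-in for informix_db.DataError, and `SketchCursor` with its `statement_open` flag is hypothetical; the real cursor talks to the server instead of flipping a flag.

```python
class DataError(Exception):
    """Local stand-in for informix_db.DataError so the sketch runs alone."""


def encode_user_str(value: str, encoding: str) -> bytes:
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        # Translate the codec failure into a DB-API style error.
        raise DataError(f"cannot represent {value!r} in {encoding}") from exc


class SketchCursor:
    """Hypothetical cursor showing cleanup before the error propagates."""

    def __init__(self, encoding: str):
        self.encoding = encoding
        self.statement_open = False

    def release(self) -> None:
        self.statement_open = False  # stands in for releasing the statement

    def execute(self, sql: str, params: list) -> list:
        self.statement_open = True  # stands in for a server-side PREPARE
        try:
            bound = [encode_user_str(p, self.encoding) for p in params]
        except DataError:
            self.release()  # free the statement first ...
            raise           # ... then let the error reach the caller
        return bound


cur = SketchCursor("iso-8859-1")
try:
    cur.execute("INSERT INTO t VALUES (?)", ["你好"])
except DataError as err:
    print("DataError:", err)
print(cur.execute("INSERT INTO t VALUES (?)", ["ok"]))
```

Because the statement is released inside the `except` block, the second `execute` starts from a clean state, which is the behaviour the entry describes for the connection.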
Tests
9 new integration tests in tests/test_unicode.py:
- ASCII round-trip (regression)
- Latin-1 high-bit chars round-trip on default locale
- Full byte range 0x20-0xFE round-trip via VARCHAR
- Locale → Python codec mapping for common forms
- Connection.encoding exposes the resolved codec
- UTF-8 locale negotiation (server transcodes for ASCII even with 8859-1 DB)
- UTF-8 multibyte round-trip (skipped without IFX_UTF8_DATABASE env var pointing to a UTF-8 database)
- Non-representable char raises DataError cleanly; connection survives
- CLOB column round-trips Latin-1 text honoring connection encoding
Total: 69 unit + 212 integration = 281 tests.
Limitations
- Multibyte UTF-8 storage requires both client_locale='en_US.utf8' AND a database whose DB_LOCALE is UTF-8. The dev container's testdb is 8859-1, so storing CJK chars there will continue to fail server-side regardless of the client codec. The test_utf8_multibyte_round_trip test is gated on the IFX_UTF8_DATABASE env var pointing to a UTF-8 database.
2026.05.04.3 — Resilience tests (fault injection)
Added
- tests/_proxy.py: ControlledProxy helper, a thread-based TCP forwarder between the test client and Informix, with a kill() method that sends TCP RST (via SO_LINGER=0) to simulate a network drop or server crash. Used as a context manager.
- tests/test_resilience.py: 12 integration tests filling the resilience gap identified in the test-coverage audit:
  - Network drop mid-SELECT raises OperationalError cleanly (not hang)
  - Network drop after describe but before fetch
  - Network drop during fetch iteration (already-materialized rows still readable, fresh execute fails)
  - Local socket close (yank-the-rug from client side)
  - I/O error marks connection unusable
  - Pool evicts a connection that died mid-with block
  - Pool revives after all idle connections died (health-check on acquire mints fresh)
  - Async cancellation via asyncio.wait_for: pool stays usable for subsequent queries
  - Cursor reusable after SQL error
  - Connection survives cursor close after error
  - Pool sustained-load smoke (50 acquire/release cycles, no leak)
  - read_timeout fires on a hung connection
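The RST trick that kill() relies on can be shown in isolation. This is a sketch of the SO_LINGER=0 technique only, not the ControlledProxy code: `close_with_rst` and `demo` are hypothetical names, the demo uses a loopback socket pair instead of a forwarder, and the two-int `struct linger` layout is an assumption that holds on Linux and macOS.

```python
import socket
import struct


def close_with_rst(sock: socket.socket) -> None:
    """Close a TCP socket so the peer sees a reset instead of a clean FIN."""
    # l_onoff=1, l_linger=0: drop unsent data and send RST on close.
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()


def demo() -> bool:
    """Return True if the client observes a connection reset."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            server_side, _ = listener.accept()
            close_with_rst(server_side)  # simulate the crash / network drop
            try:
                client.recv(1)
            except ConnectionResetError:
                return True
            return False


print("client saw reset:", demo())
```

A plain close() would deliver an orderly end-of-stream, which a client can mistake for a normal end of data; the reset forces the error path that these tests are meant to exercise.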
What this catches
- Hangs (waiting forever on a dead socket)
- Silent data corruption (treating EOF as a valid tuple)
- Double-fault (one error → cleanup raises a different error)
- Pool poisoning (returning a broken connection to the pool)
- Stale cursor reuse (same cursor reused across an error boundary)
Tests
12 new integration tests. Total: 69 unit + 203 integration = 272 tests.
The Phase 19 work fills the highest-priority gap from the test-adequacy audit. Remaining gaps from that audit (UTF-8 locale, server-version matrix, performance benchmarks) are real but lower-severity.
2026.05.04.2 — Server-side scrollable cursors
Added
- Server-side scrollable cursors (Phase 18): opt in via conn.cursor(scrollable=True). The cursor opens with SQ_SCROLL (24) before SQ_OPEN (6), the result set stays materialized server-side, and each scroll method sends SQ_SFETCH (23) to fetch one row at a time. Use this for huge result sets where in-memory materialization would be wasteful.

  The user-facing API is identical to Phase 17's in-memory scroll (fetch_first, fetch_last, fetch_prior, fetch_absolute, fetch_relative, scroll, rownumber); only the internal mechanism differs:

  | | Default cursor | scrollable=True |
  |---|---|---|
  | Memory | All rows materialized | One row at a time |
  | Network round-trips per fetch | 0 (after initial NFETCH) | 1 (one SFETCH per call) |
  | Cursor lifetime | Closed after execute() | Open until close() |
  | Best for | Moderate result sets, sequential iteration | Huge result sets, random access |

  Implementation discovers total row count lazily via SFETCH(LAST=4) when negative absolute indexing requires it; result is cached in _scroll_total_rows. Position tracking is authoritative from the server's SQ_TUPID (25) tag, not client-computed.
Wire-protocol details
- SQ_SFETCH (23): [short SQ_ID=4][int 23][short scrolltype][int target][int bufSize=4096][short SQ_EOT]. scrolltype values: 1=NEXT, 4=LAST, 6=ABSOLUTE.
- SQ_SCROLL (24): emitted between CURNAME and SQ_OPEN to mark the cursor as scrollable.
- SQ_TUPID (25): server response carrying the 1-indexed row position the server just delivered. [short 25][int rowID].
The trap on the way: I initially used SHORT for bufSize and the server hung silently — same SHORT-vs-INT diagnostic pattern as Phase 4.x's CURNAME+NFETCH. Captured a JDBC trace, byte-diffed against ours, found the mismatch.
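The SQ_SFETCH and SQ_TUPID layouts listed in this section can be sketched with `struct`. Two things here are assumptions not stated in this changelog: big-endian byte order and the numeric value used for SQ_EOT. The helper names `build_sfetch` and `parse_tupid` are hypothetical.

```python
import struct

SQ_ID = 4
SQ_SFETCH = 23
SQ_TUPID = 25
SQ_EOT = 12  # assumption: the value is not given in this changelog
SCROLL_NEXT, SCROLL_LAST, SCROLL_ABSOLUTE = 1, 4, 6


def build_sfetch(scrolltype: int, target: int, buf_size: int = 4096) -> bytes:
    """Pack one SQ_SFETCH request following the field list above."""
    # ">" = big-endian (assumption), h = 2-byte short, i = 4-byte int.
    # buf_size is deliberately a 4-byte int, not a short.
    return struct.pack(
        ">hihiih", SQ_ID, SQ_SFETCH, scrolltype, target, buf_size, SQ_EOT
    )


def parse_tupid(payload: bytes) -> int:
    """Read [short 25][int rowID] and return the 1-indexed row position."""
    tag, row_id = struct.unpack(">hi", payload[:6])
    if tag != SQ_TUPID:
        raise ValueError(f"expected SQ_TUPID ({SQ_TUPID}), got tag {tag}")
    return row_id


msg = build_sfetch(SCROLL_ABSOLUTE, target=10)
print(len(msg), msg.hex())
print(parse_tupid(struct.pack(">hi", SQ_TUPID, 10)))
```

Packing bufSize with `h` instead of `i` would shorten the message by two bytes, which is the kind of mismatch a byte-level diff against a reference trace makes visible.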
Tests
14 new integration tests in test_scroll_cursor_server.py. Total: 69 unit + 191 integration = 260 tests.
2026.05.04.1 — Scroll cursors
Added
- Scroll cursor API on Cursor (Phase 17):
  - cur.scroll(value, mode='relative'|'absolute'): PEP 249 compatible
  - cur.fetch_first() / cur.fetch_last(): jump to ends
  - cur.fetch_prior(): backward step (SQL-standard semantics: from past-end yields the last row)
  - cur.fetch_absolute(n): 0-indexed jump; negative n indexes from the end
  - cur.fetch_relative(n): n-step from current position
  - cur.rownumber: current 0-indexed position (None if before-first or no result set)

  In-memory implementation: no new wire-protocol; the existing materialized result set in cur._rows is now indexed rather than iterated. For server-side scroll over huge result sets, SQ_SFETCH (tag 23) would be needed; that is Phase 18 if anyone hits the in-memory ceiling.
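The positional semantics listed above can be sketched over a plain list. `InMemoryScroll` is a hypothetical class written for illustration, not the driver's Cursor; in particular, reporting `rownumber` as None when the position is past the end is an assumption of this sketch.

```python
class InMemoryScroll:
    """Illustrative position tracking over an already-fetched row list."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = -1  # -1 means "before first"

    @property
    def rownumber(self):
        # Assumption: only a position that sits on a row reports a number.
        return self._pos if 0 <= self._pos < len(self._rows) else None

    def _goto(self, index):
        n = len(self._rows)
        if 0 <= index < n:
            self._pos = index
            return self._rows[index]
        # Out of range: park before-first (-1) or past-end (n), return no row.
        self._pos = -1 if index < 0 else n
        return None

    def fetch_first(self):
        return self._goto(0)

    def fetch_last(self):
        return self._goto(len(self._rows) - 1)

    def fetch_prior(self):
        # From past-end (pos == n) this lands on the last row.
        return self._goto(self._pos - 1)

    def fetch_relative(self, n):
        return self._goto(self._pos + n)

    def fetch_absolute(self, n):
        if n < 0:
            n += len(self._rows)  # -1 addresses the last row
        return self._goto(n)


cur = InMemoryScroll(["a", "b", "c"])
print(cur.fetch_absolute(-1), cur.rownumber)
print(cur.fetch_relative(1), cur.fetch_prior())
```

Parking the position at -1 or at the row count, rather than raising, is what lets a step back from past the end yield the last row.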
Tests
14 new integration tests in test_scroll_cursor.py. Total: 69 unit + 177 integration = 246 tests.
2026.05.04 — Library completion
The Phase 0 ambition — first pure-Python Informix SQLI driver — reaches feature completeness. Adds async, TLS, connection pool, smart-LOBs, fast-path RPC, composite UDTs.
Added
- Async API (informix_db.aio): AsyncConnection, AsyncCursor, AsyncConnectionPool for FastAPI / aiohttp / asyncio. Each blocking I/O call is offloaded to a worker thread via asyncio.to_thread; event loop never blocks.
- Connection pool (informix_db.create_pool): thread-safe with min/max sizing, lazy growth, health-check on acquire, error-aware eviction.
- TLS: tls=True for self-signed dev servers, tls=ssl.SSLContext for production. Wrapping happens in IfxSocket so the rest of the protocol layer is unaware.
- Smart-LOBs (BLOB / CLOB): full read/write end-to-end via cursor.read_blob_column() / cursor.write_blob_column() using the server's lotofile/filetoblob SQL functions intercepted at the SQ_FILE (98) protocol level.
- Legacy in-row blobs (BYTE / TEXT): bind + read via the SQ_BBIND/SQ_BLOB/SQ_FETCHBLOB protocol family.
- Fast-path RPC (Connection.fast_path_call): direct stored-procedure invocation bypassing PREPARE/EXECUTE; routine handles cached per-connection.
- Composite UDT recognition: ROW, SET, MULTISET, LIST columns return typed RowValue / CollectionValue wrappers exposing schema and raw bytes.
- Type codecs: INTERVAL (both DAY-TO-FRACTION and YEAR-TO-MONTH families), DATETIME (all qualifier ranges), DECIMAL / MONEY (BCD with sign+exp head byte and asymmetric base-100 complement for negatives), DATE, BOOL, all integer / float widths, CHAR / VARCHAR / LVARCHAR.
- Transactions: implicit SQ_BEGIN before each transaction in non-ANSI logged DBs; transparent no-ops on unlogged DBs.
- PEP 249 exception hierarchy: server SQLCODE mapped to the right exception class (IntegrityError for duplicate-key violations, ProgrammingError for syntax errors, etc.).
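The thread-offload approach named in the Async API item can be sketched as follows. `BlockingCursor` and `AsyncCursorSketch` are hypothetical stand-ins, not classes from informix_db.aio; the sleep simulates a blocking socket round-trip, and the ticker task only exists to show that the event loop keeps running meanwhile.

```python
import asyncio
import time


class BlockingCursor:
    """Hypothetical stand-in for a synchronous cursor doing socket I/O."""

    def execute(self, sql: str) -> str:
        time.sleep(0.05)  # simulates a blocking network round-trip
        return f"done: {sql}"


class AsyncCursorSketch:
    def __init__(self, cursor: BlockingCursor):
        self._cursor = cursor

    async def execute(self, sql: str) -> str:
        # Run the blocking call in a worker thread; the loop stays free.
        return await asyncio.to_thread(self._cursor.execute, sql)


async def main():
    cur = AsyncCursorSketch(BlockingCursor())
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1  # only advances if the loop is not blocked

    task = asyncio.create_task(ticker())
    result = await cur.execute("SELECT 1")
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return result, ticks


result, ticks = asyncio.run(main())
print(result, "| loop ticks while waiting:", ticks)
```

Calling `BlockingCursor.execute` directly inside the coroutine would freeze the ticker for the whole round-trip; `asyncio.to_thread` (Python 3.9+) is what keeps the tick counter moving.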
Documentation
- README.md: overview and quick-start
- docs/USAGE.md: practical recipes and migration guide
- docs/PROTOCOL_NOTES.md: byte-level wire-format reference
- docs/DECISION_LOG.md: phase-by-phase architectural decisions, with the why preserved
- docs/JDBC_NOTES.md: index into the decompiled IBM JDBC reference
- docs/CAPTURES/: annotated socat hex-dump captures
Test coverage
232 tests total: 69 unit + 163 integration. Unit tests run with no external dependencies; integration tests run against the IBM Informix Developer Edition Docker image.
Known gaps (deferred)
- Full ROW/COLLECTION recursive parsing: Phase 12 ships type recognition + raw-bytes wrapper. Parsing the textual representation into typed Python tuples/sets/lists is deferred; most workloads can use SQL projections (SELECT row_col.fieldname FROM tbl) instead.
- UDT parameter encoding for fast-path: scalar params/returns work; passing a 72-byte BLOB locator as a UDT param requires extending the SQ_BIND encoder with the extended_owner/extended_name preamble for type > 18.
- Native async I/O: Phase 16 ships a thread-pool wrapper that's functionally equivalent for typical FastAPI workloads. Native async (asyncpg-style transport abstraction) would be Phase 17 if a real workload needs it.
2026.05.02 — Phase 1: connection lifecycle
Initial release. connect() / close() works end-to-end. Cursor / execute / fetch arrived in Phase 2 (subsequent commits within the same session).