informix-db

Author	SHA1	Message	Date
Ryan Malloy	f3e589c5bf	Phase 23: Hot-path optimization for parse_tuple_payload (2026.05.04.8) Per-row decode is hit on every row of every SELECT. The original code had three forms of waste in the inner loop: 1. Redundant base_type() call. ColumnInfo.type_code is already base-typed by parse_describe at construction; calling base_type() again per column per row was pure waste. Single largest savings. 2. IntFlag->int conversions inline (~10x per iteration). Lifted to module-level _TC_X constants. 3. Lazy imports inside the loop body (_decode_datetime, _decode_interval, BlobLocator, ClobLocator, RowValue, CollectionValue). Moved to top. Plus three precomputed frozensets (_LENGTH_PREFIXED_SHORT_TYPES, _COMPOSITE_UDT_TYPES, _NUMERIC_TYPES) replace inline tuple-membership checks. _COLLECTION_KIND_MAP is now MappingProxyType (actually frozen). Performance: * parse_tuple_5cols: 2796 ns -> 2030 ns (-27%) * select_bench_table_all (1k rows): 1477 us -> 1198 us (-19%) * Codec micro-bench, cold connect, executemany: unchanged Real-world fetch ceiling on a single connection: 350K rows/sec -> 490K rows/sec. Margaret Hamilton review surfaced four cleanup items, all addressed before tagging: * H1: cursor._dereference_blob_columns had the same redundant base_type() call - stripped for consistency. * M1: documented the load-bearing invariant at parse_describe (the single producer site) so future contributors have a grep target. * M2: _COLLECTION_KIND_MAP wrapped in MappingProxyType. * L1: stale line-number comment fixed to point at the INVARIANT comment instead. baseline.json refreshed; all 224 integration tests pass; ruff clean.	2026-05-04 17:52:20 -06:00
Ryan Malloy	bea1a1cd0c	Phase 20: UTF-8/multibyte locale support (2026.05.04.4) Thread CLIENT_LOCALE through to user-data string codecs. Driver previously hardcoded iso-8859-1 for all string conversions, which broke any locale outside Western European code points. * Connection.encoding property derived from client_locale via _python_encoding_from_locale (en_US.utf8 -> utf-8, en_US.8859-1 -> iso-8859-1, etc.) * encode_param / decode / parse_tuple_payload accept an encoding parameter; cursor and fast-path call sites forward conn.encoding * Smart-LOB CLOB encode/decode and TEXT decode honor connection encoding * DataError raised for non-representable chars; cursor releases the prepared statement before propagating so connection state stays clean Boundary discipline: protocol-level strings (cursor names, function signatures, SQ_FILE fnames, error near-tokens, SQL text) stay iso-8859-1 (always ASCII, never user-controlled). 9 new integration tests in tests/test_unicode.py covering ASCII round-trip, Latin-1 high-bit, full byte range, locale-mapping, encoding property, UTF-8 negotiation, multibyte (skipped without IFX_UTF8_DATABASE), DataError on non-representable, CLOB round-trip. Total: 69 unit + 212 integration = 281 tests.	2026-05-04 17:13:19 -06:00
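The locale-to-codec mapping can be sketched roughly as follows. This is an illustration only: the function name mirrors the one in the commit message, but the body, the fallback for a locale with no code set, and the handling of unlisted code sets are assumptions.

```python
import codecs


def python_encoding_from_locale(client_locale: str) -> str:
    """Map a locale such as 'en_US.utf8' to a Python codec name (sketch)."""
    # The text after the last '.' names the code set.
    _, sep, codeset = client_locale.rpartition(".")
    if not sep:
        # Assumed fallback: the encoding the driver hardcoded before Phase 20.
        return "iso-8859-1"
    codeset = codeset.lower()
    if codeset in ("utf8", "utf-8"):
        return "utf-8"
    if codeset.startswith("8859-"):
        return "iso-" + codeset
    # Assumption: defer to Python's codec registry for anything else.
    # codecs.lookup raises LookupError for an unknown name.
    return codecs.lookup(codeset).name
```

Both examples from the commit message map as described: `en_US.utf8` gives `utf-8` and `en_US.8859-1` gives `iso-8859-1`.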
Ryan Malloy	9048335462	Phase 12: ROW / COLLECTION type recognition Composite UDTs (ROW=22, COLLECTION=23, SET=19, MULTISET=20, LIST=21) now decode into typed wrapper objects (informix_db.RowValue, informix_db.CollectionValue) that expose schema + raw payload bytes. The wire format is the now-familiar [byte ind][int length][bytes] pattern (same as UDTVAR(lvarchar) from Phase 10). The bytes are a TEXTUAL representation of the value when selected without the extended-binary opt-in JDBC uses: ROW value: b"ROW('Alice',30 )" SET value: b"SET{'red','green','blue'}" LIST value: b"LIST{10 ,20 ,30 }" JDBC's binary-with-schema format runs ~30x larger (1420 bytes for a 2-field ROW vs. our 24). We don't request it — the textual form is what the server returns by default and is sufficient for type recognition. Phase 12 ships type recognition only. Full recursive parsing into Python tuples/lists/sets is deferred to Phase 13 (would require a SQL-literal lexer + recursive type-driven decoding). Production workloads that need typed field access today can project via SQL: cur.execute("SELECT id, r.name, r.age FROM tbl") Tests: 8 integration tests in test_composite_types.py covering ROW recognition, NULL, sub-field projection workaround, long values (>255 bytes — verifies 4-byte length prefix), SET/MULTISET/LIST recognition, and null collections. Total: 64 unit + 134 integration = 198 tests. Lesson reinforced: once one UDT-shaped type is implemented (UDTVAR in Phase 10, smart-LOB in Phase 9), every subsequent UDT-shaped type is mostly a copy of the existing decoder branch. The hard part is payload semantics, not framing.	2026-05-04 14:30:44 -06:00
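The `[byte ind][int length][bytes]` framing named above can be read with a few lines of `struct`. This sketch assumes a 4-byte signed big-endian length, which the commit message does not state, and it returns the indicator byte uninterpreted because its semantics are not given here. The function name is hypothetical.

```python
import struct
from typing import Tuple


def read_udt_payload(buf: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Return (indicator, payload, next_offset) for one framed field."""
    indicator = buf[offset]
    # Assumption: the length is a 4-byte signed big-endian integer.
    (length,) = struct.unpack_from(">i", buf, offset + 1)
    if length < 0:
        raise ValueError("negative payload length")
    start = offset + 5
    payload = buf[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return indicator, payload, start + length
```

For the SET example in the message, the payload returned would be the textual bytes `b"SET{'red','green','blue'}"`, which a caller could wrap without parsing.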
Ryan Malloy	a9a3cfc38e	Phase 10: smart-LOB BLOB read via SQ_FILE / lotofile Implements end-to-end BLOB reading by leveraging the server's lotofile() function and intercepting the SQ_FILE protocol with in-memory file emulation. Avoids implementing the heavier SQ_FPROUTINE + SQ_LODATA stack initially planned for Phase 10. Strategy: SELECT lotofile(blob_col, '/path', 'client') causes the server to orchestrate a SQ_FILE (98) protocol round-trip — it tells the client to "open file X, write these bytes, close". Our handler buffers the writes in memory keyed by filename instead of touching disk. The bytes appear in cursor.blob_files dict. Wire protocol (per IfxSqli.receiveSQFILE line 4980): * SQ_FILE optype 0 (open): server sends filename + mode/flags/offset * SQ_FILE optype 3 (write): chunked SQ_FILE_WRITE (107) blocks of data, terminated by SQ_EOT. Client responds with total size. * SQ_FILE optype 1 (close): bare SQ_EOT both ways. API: * Low-level: cur.execute("SELECT lotofile(col, '/tmp/X', 'client') ...") followed by cur.blob_files[returned_filename] for the bytes. * High-level: cur.read_blob_column("SELECT col FROM ... WHERE ...", params) returns bytes directly, wrapping the user's SQL with lotofile. Bonus: row decoder now handles UDTVAR (type 40) with extended_name="lvarchar" — the wire format that lotofile() returns its result as. Format: [byte indicator][int length][bytes]. Tests: 6 integration tests in test_smart_lob_read.py covering low-level + high-level paths, NULL/no-match, multi-chunk (30KB), and validation. Test data seeded via JDBC reference client since smart-LOB writes still need Phase 11. Total: 64 unit + 117 integration = 181 tests. Strategic insight from this phase: don't estimate protocol-implementation cost from JDBC's class hierarchy alone. JDBC's IfxSmBlob is 600+ lines but the wire-level READ path reduces to one SQL function call + one new tag handler. The wire is often simpler than the SDK suggests.
Deferred to Phase 11+: * Smart-LOB write (still needs SQ_FPROUTINE + SQ_LODATA) * BlobLocator.read() OO API (requires locator-to-source mapping) * SQ_FILE optype 2 (filetoblob client→server path)	2026-05-04 13:46:18 -06:00
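The in-memory file emulation described in this commit might look roughly like the sketch below. The class and method names are hypothetical, and no wire I/O is shown. Only the open / write / close sequence and the "keyed by filename" buffering come from the message.

```python
from typing import Dict, List, Optional


class InMemoryFileSink:
    """Hypothetical stand-in for the client side of SQ_FILE open/write/close."""

    def __init__(self) -> None:
        self.blob_files: Dict[str, bytes] = {}
        self._name: Optional[str] = None
        self._chunks: List[bytes] = []

    def open(self, filename: str) -> None:
        # optype 0: remember the server-supplied name instead of opening a file.
        self._name = filename
        self._chunks = []

    def write(self, chunk: bytes) -> int:
        # optype 3: buffer the chunk and return the running total, which is
        # the figure the client reports back according to the message.
        if self._name is None:
            raise RuntimeError("write before open")
        self._chunks.append(bytes(chunk))
        return sum(len(c) for c in self._chunks)

    def close(self) -> None:
        # optype 1: publish the assembled bytes under the filename.
        if self._name is None:
            raise RuntimeError("close before open")
        self.blob_files[self._name] = b"".join(self._chunks)
        self._name = None
        self._chunks = []
```

Keeping the buffer keyed by filename is what lets a caller look up the bytes with the name the query returned, without anything touching disk.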
Ryan Malloy	389c32434c	Phase 9: smart-LOB BLOB/CLOB locator decoding (Phase 10 deferred for fetch) SELECT on BLOB or CLOB columns no longer requires raw byte interpretation. The 72-byte server-side locator is wrapped in a typed BlobLocator or ClobLocator (frozen dataclass) so the column is recognizable as "server-side reference, not actual bytes". Wire-protocol findings: * Smart-LOB columns DON'T appear with their nominal type codes (102/101) in SQ_DESCRIBE. They surface as UDTFIXED (41) with extended_id 10 (BLOB) or 11 (CLOB) and encoded_length=72 (locator size). * Retrieving the actual bytes requires SQ_FPROUTINE (103) RPC to invoke ifx_lo_open, plus SQ_LODATA (97) for chunked transfer, plus another SQ_FPROUTINE for ifx_lo_close. That's a Phase 10 lift — roughly 2x the protocol surface of Phase 8. Server config needed (added to Phase 7 setup): * sbspace: onspaces -c -S sbspace1 ... * default sbspace: onmode -wm SBSPACENAME=sbspace1 What ships in Phase 9: * informix_db.BlobLocator(raw: bytes) — 72-byte frozen wrapper * informix_db.ClobLocator(raw: bytes) — distinct type, same shape * Row decoder branch in _resultset.parse_tuple_payload * Wire constants SQ_LODATA=97, SQ_FPROUTINE=103, SQ_FPARAM=104 Tests: * 11 unit tests in test_blob_locator_unit.py (no Informix needed) — construction, immutability, equality, hash, repr safety, size validation. * 4 integration tests in test_smart_lob.py — fixture seeds via JDBC reference client (smart-LOB writes also need deferred protocols). * RefBlob.java helper in tests/reference/ for seeding via JDBC. Total: 64 unit + 111 integration = 175 tests. Locator design note: __repr__ omits the raw bytes (they're opaque to the client). Same-bytes locators of different families compare unequal — BlobLocator(x) != ClobLocator(x) — to avoid silent type confusion.	2026-05-04 13:26:15 -06:00
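The locator design note can be illustrated with a small sketch. This is not the driver's code: the shared base class and the exact repr text are assumptions, chosen to show the three stated properties (fixed 72-byte size, repr without raw bytes, and cross-family inequality).

```python
from dataclasses import dataclass

LOCATOR_SIZE = 72  # locator size quoted in the commit message


@dataclass(frozen=True, repr=False)
class _Locator:
    raw: bytes

    def __post_init__(self) -> None:
        if len(self.raw) != LOCATOR_SIZE:
            raise ValueError(
                f"expected {LOCATOR_SIZE} bytes, got {len(self.raw)}"
            )

    def __repr__(self) -> str:
        # Deliberately omit the opaque bytes.
        return f"<{type(self).__name__} ({len(self.raw)} bytes)>"


class BlobLocator(_Locator):
    """Reference to a server-side BLOB."""


class ClobLocator(_Locator):
    """Reference to a server-side CLOB; same shape, distinct type."""
```

The dataclass-generated `__eq__` only compares fields when both operands are of exactly the same class, so `BlobLocator(x) == ClobLocator(x)` is false without any extra code.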
Ryan Malloy	4dafbf8ce9	Phase 6.d: INTERVAL decoding (both qualifier families) Implements row-decoding for IDS INTERVAL, the last common temporal type. The qualifier short bisects the type at the wire level: start_TU >= DAY maps to datetime.timedelta (day-fraction), start_TU <= MONTH maps to a new informix_db.IntervalYM (year-month). Wire format mirrors DATETIME exactly — `[head byte][digit pairs in base-100]`, with the qualifier dictating field interpretation. The fraction-to-nanoseconds scaling (`scale_exp = 18 - end_TU`, forced odd) is the JDBC pattern from `Decimal.fromIfxToArray`. IntervalYM is a frozen dataclass holding signed total months, with `years` and `remainder_months` as derived properties. Matches JDBC's `IntervalYM.months` shape rather than a (years, months) tuple — avoids ambiguity around what "negative" means for a multi-field tuple. Tests: 13 unit (synthetic byte streams covering all decoder branches) + 9 integration (real Informix queries spanning DAY TO SECOND, HOUR TO SECOND, YEAR TO MONTH, negatives, NULL, and mixed-family rows). Total test count: 53 unit + 82 integration = 135. Encoder for INTERVAL parameter binding is deferred to a later phase (same arc as DECIMAL/DATETIME — decode lands first).	2026-05-04 12:22:07 -06:00
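A sketch of the year-month value type and the qualifier split described above. The TU codes are the ones listed in the Phase 6.b commit. The sign convention for `years` and `remainder_months` (truncation toward zero, both carrying the sign of the total) is an assumption, since the message only says they are derived from signed total months. `interval_python_type` is a hypothetical helper name.

```python
import datetime
from dataclasses import dataclass

_TU_MONTH = 2
_TU_DAY = 4


@dataclass(frozen=True)
class IntervalYM:
    months: int  # signed total months

    @property
    def years(self) -> int:
        # Assumption: truncate toward zero, so -14 months reads as -1 year.
        sign = -1 if self.months < 0 else 1
        return sign * (abs(self.months) // 12)

    @property
    def remainder_months(self) -> int:
        # Assumption: the remainder carries the same sign as the total.
        sign = -1 if self.months < 0 else 1
        return sign * (abs(self.months) % 12)


def interval_python_type(start_tu: int) -> type:
    """Pick the result type from the qualifier's start time unit."""
    if start_tu >= _TU_DAY:
        return datetime.timedelta
    if start_tu <= _TU_MONTH:
        return IntervalYM
    raise ValueError(f"unexpected start time unit: {start_tu}")
```

Storing one signed integer keeps the sign in a single place, which is the ambiguity the commit says a (years, months) tuple would introduce.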
Ryan Malloy	6819dd4cb0	Phase 6.b: DATETIME decoding for all qualifier ranges Before: cur.execute("SELECT CURRENT YEAR TO SECOND ...") cur.fetchone() # → (b'\xc7\x14\x1a\x05\x04...',) raw BCD bytes After: cur.execute("SELECT CURRENT YEAR TO SECOND ...") cur.fetchone() # → (datetime.datetime(2026, 5, 4, 12, 34, 56),) Decoder picks the right Python type by qualifier: YEAR/MONTH/DAY-only → datetime.date HOUR/MIN/SEC-only → datetime.time spans across both → datetime.datetime Wire format (per IfxToJavaDateTime + Decimal.init treating as packed BCD): byte[0] = sign + biased exponent (in base-100 digit pairs) byte[1..] = BCD digit pairs: YYYY (2 bytes) + MM + DD + HH + MI + SS + FFFFF Qualifier extraction from column descriptor: encoded_length = (digit_count << 8) | (start_TU << 4) | end_TU TU codes: YEAR=0, MONTH=2, DAY=4, HOUR=6, MIN=8, SEC=10, FRAC1=11..FRAC5=15 Verified against four DATETIME columns of different qualifiers in one tuple — see test_datetime_multiple_columns_in_one_row: YEAR TO SECOND → datetime.datetime(2026, 5, 4, 12, 34, 56) YEAR TO DAY → datetime.date(2026, 5, 4) HOUR TO SECOND → datetime.time(12, 34, 56) YEAR TO FRACTION(3) → datetime.datetime(...) Module changes: src/informix_db/converters.py: + _decode_datetime(raw, encoded_length) — qualifier-driven BCD walk + TU constants (_TU_YEAR, _TU_MONTH, ..., _TU_SECOND) src/informix_db/_resultset.py: + DATETIME row-decoder branch — computes width from digit_count in encoded_length high byte, calls _decode_datetime with the packed qualifier so it can pick the right Python type Tests: 40 unit + 70 integration (7 new DATETIME tests) = 110 total, all green, ruff clean.
Tests cover: - YEAR TO SECOND → datetime.datetime - YEAR TO DAY → datetime.date - HOUR TO SECOND → datetime.time - CURRENT YEAR TO FRACTION(3) → datetime.datetime - Mixed qualifiers in one row - DATETIME stored in a real table column (round-trip via SELECT) - NULL DATETIME → Python None DATETIME parameter binding (encoder) is Phase 6.x — same status as DECIMAL encoder.	2026-05-04 12:02:40 -06:00
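The qualifier packing and the type-selection rule from this commit can be sketched as below. The bit layout and TU codes are taken from the message. The function names are hypothetical, and the digit counts used in the usage note are illustrative values rather than captured ones.

```python
import datetime
from typing import Tuple

_TU_DAY = 4
_TU_HOUR = 6


def unpack_qualifier(encoded_length: int) -> Tuple[int, int, int]:
    """Split encoded_length into (digit_count, start_TU, end_TU)."""
    digit_count = (encoded_length >> 8) & 0xFF
    start_tu = (encoded_length >> 4) & 0x0F
    end_tu = encoded_length & 0x0F
    return digit_count, start_tu, end_tu


def python_type_for(encoded_length: int) -> type:
    """Choose date, time, or datetime from the qualifier range."""
    _, start_tu, end_tu = unpack_qualifier(encoded_length)
    if end_tu <= _TU_DAY:
        return datetime.date       # YEAR/MONTH/DAY only
    if start_tu >= _TU_HOUR:
        return datetime.time       # HOUR and finer only
    return datetime.datetime       # range spans both halves
```

For example, a YEAR TO DAY qualifier packed as `(8 << 8) | (0 << 4) | 4` selects `datetime.date`, while HOUR TO SECOND packed as `(6 << 8) | (6 << 4) | 10` selects `datetime.time`.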
Ryan Malloy	2bacbc4e53	Phase 6.a: DECIMAL/MONEY row decoding works (COUNT/SUM/AVG return Decimal) Before: cur.execute('SELECT COUNT(*) FROM systables') cur.fetchone() # → (b'\xc2\x02\x00\x00\x00\x00\x00\x00\x00',) raw bytes After: cur.execute('SELECT COUNT(*) FROM systables') cur.fetchone() # → (Decimal('276'),) The trickiest decode of the project so far. IDS DECIMAL/MONEY wire format: byte[0] = (sign << 7) | biased_exponent_base100 bit 7 = sign (1=positive, 0=negative) bits 0-6 = (exponent + 64), XOR'd with 0x7F if negative byte[1..] = digit-pair bytes (each 0..99 = two BCD digits) if negative: asymmetric base-100 complement applied: walk digits right→left, trailing zeros stay zero, first non-zero subtracts from 100, rest from 99 Initial naive "99 - d for all digits" decoder gave artifacts like -1234.559999 instead of -1234.56. The asymmetric complement rule (from Decimal.decComplement line 447) is what makes negatives round-trip exactly. Width on the wire: per-column encoded_length packed as (precision << 8) | scale; byte width = ceil(precision/2) + 1. parse_tuple_payload uses this to slice DECIMAL columns correctly. Tested cases all decode correctly: COUNT(*) → Decimal('276') SUM(tabid) → Decimal('55') AVG(tabid) → Decimal('5.5') 1234.56::DECIMAL → Decimal('1234.56') -1234.56::DECIMAL → Decimal('-1234.56') -0.5::DECIMAL → Decimal('-0.5') -99.99::DECIMAL → Decimal('-99.99') -12345678.9::DECIMAL → Decimal('-12345678.9') NULL → None Encoder (_encode_decimal) is implemented but disabled — server rejects the produced bytes (precision packing not quite right). Phase 6.x will fix. Workaround: cast Decimal to float, or pass via SQL literal.
Module changes: src/informix_db/converters.py: + decimal module import + _decode_decimal — full BCD decoder with asymmetric complement + _encode_decimal (Phase 6.x stub — present but unreached) + DECIMAL/MONEY added to DECODERS dispatch src/informix_db/_resultset.py: + DECIMAL/MONEY width computation from encoded_length Tests: 40 unit + 55 integration (8 new DECIMAL) = 95 total, all green, ruff clean.	2026-05-04 11:17:59 -06:00
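The decode rule in this commit can be sketched as follows. Two points are assumptions rather than statements from the message: each digit-pair byte is taken as a plain value 0..99, and the exponent is read as scaling a base-100 fraction of the form 0.d1 d2 d3 ... by 100**exponent. NULL handling is left out. The function names are hypothetical.

```python
from decimal import Decimal
from typing import List


def decimal_wire_width(precision: int) -> int:
    """Byte width on the wire: ceil(precision / 2) + 1."""
    return (precision + 1) // 2 + 1


def undo_complement(digits: List[int]) -> List[int]:
    """Reverse the asymmetric base-100 complement used for negatives."""
    out = list(digits)
    seen_nonzero = False
    for i in range(len(out) - 1, -1, -1):   # walk right to left
        if not seen_nonzero:
            if out[i] == 0:
                continue                    # trailing zeros stay zero
            out[i] = 100 - out[i]           # first non-zero: from 100
            seen_nonzero = True
        else:
            out[i] = 99 - out[i]            # the rest: from 99
    return out


def decode_decimal(raw: bytes) -> Decimal:
    head = raw[0]
    positive = bool(head & 0x80)
    biased = head & 0x7F
    digits = list(raw[1:])
    if not positive:
        biased ^= 0x7F
        digits = undo_complement(digits)
    exponent = biased - 64
    coefficient = 0
    for d in digits:
        coefficient = coefficient * 100 + d
    # Assumption: digits are a base-100 fraction scaled by 100**exponent.
    value = Decimal(coefficient).scaleb(2 * (exponent - len(digits)))
    return value if positive else -value
```

Under these assumptions, head byte `0xC2` followed by digit bytes 12, 34, 56 decodes to 1234.56, and head byte `0x3D` followed by 87, 65, 44 decodes to -1234.56. A uniform "99 - d" rule would instead turn the last digit pair into 55, which is where artifacts like the one quoted in the message come from.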
Ryan Malloy	34ad04a872	Phase 2.x: VARCHAR row decoding works — three byte-level fixes Three findings, each caught by a different debugging technique, documented in DECISION_LOG.md: 1. CURNAME+NFETCH PDU: trailing reserved field is SHORT not INT. Caught by byte-diffing our 44-byte PDU against JDBC's 42-byte reference under socat. The server tolerated the longer version for INT-only SELECTs (silently consuming extra zeros) but rejected it for VARCHAR queries. Lesson: server tolerance varies by query type — always match JDBC byte-for-byte. 2. SQ_TUPLE payload pads to even byte alignment. An 11-byte "syscolumns" VARCHAR payload had a trailing 0x00 between it and the next SQ_TUPLE tag. JDBC's IfxRowColumn.readTuple consumes this pad silently; we weren't, so any odd-length variable-width row desynced the parser. 3. VARCHAR/NCHAR/NVCHAR in tuple data use a SINGLE-byte length prefix (max 255 chars — IDS VARCHAR's hard limit). NOT a 2-byte short as I'd initially assumed. CHAR is fixed-width per encoded_length. LVARCHAR uses a 4-byte int prefix for >255 byte values. Module changes: src/informix_db/_resultset.py — _LENGTH_PREFIXED_SHORT_TYPES set, branched VARCHAR/NCHAR/NVCHAR (1-byte prefix) vs CHAR (fixed) vs LVARCHAR (4-byte prefix); even-byte alignment pad consumed after each SQ_TUPLE payload. src/informix_db/cursors.py — CURNAME+NFETCH and standalone NFETCH PDUs now write_short(0) for the reserved trailing field. Tests: 40 unit + 18 integration (3 new VARCHAR tests) = 58 total, all green, ruff clean. New tests cover: - VARCHAR single-column SELECT - Odd-length VARCHAR row (regression for the pad-byte bug) - Mixed INT + VARCHAR + FLOAT three-column SELECT Sample output: SELECT FIRST 5 tabname FROM systables → ('systables',), ('syscolumns',), ('sysindices',), ('systabauth',), ('syscolauth',) SELECT FIRST 3 tabname, tabid, nrows → ('systables', 1, 276.0), ... VARCHAR was the last known gap from the Phase 2 commit. 
Phase 2 now reads INT, BIGINT, REAL, FLOAT, CHAR, VARCHAR end-to-end. Phase 6+ types (DATETIME, INTERVAL, DECIMAL, BLOBs) remain.	2026-05-04 07:55:13 -06:00
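Findings 2 and 3 above can be sketched in a few lines. The helper names are hypothetical, and the default encoding reflects the driver's behavior at the time of this commit as described in the later Phase 20 message.

```python
from typing import Tuple


def read_short_varchar(
    payload: bytes, offset: int, encoding: str = "iso-8859-1"
) -> Tuple[str, int]:
    """Read one VARCHAR/NCHAR/NVCHAR value with a single-byte length prefix."""
    length = payload[offset]          # one byte, so at most 255
    start = offset + 1
    end = start + length
    if end > len(payload):
        raise ValueError("truncated VARCHAR value")
    return payload[start:end].decode(encoding), end


def pad_to_even(payload_size: int) -> int:
    """Number of pad bytes to skip after an SQ_TUPLE payload (0 or 1)."""
    return payload_size & 1
```

With the "syscolumns" case from the message, the payload is one prefix byte plus ten characters, 11 bytes in total, so `pad_to_even(11)` reports the single trailing byte that has to be consumed before the next tag.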
Ryan Malloy	a1bd52788d	Phase 2: SELECT works end-to-end — pure-Python Informix fully reads data cursor.execute("SELECT 1 FROM systables WHERE tabid = 1") cursor.fetchone() == (1,) To my knowledge, this is the first time a pure-Python implementation has read data from Informix without wrapping IBM's CSDK or JDBC. Three breakthroughs in this commit: 1. Login PDU's database field is BROKEN. Passing a database name there makes the server reject subsequent SQ_DBOPEN with sqlcode -759 ("database not available"). JDBC always sends NULL in the login PDU's database slot — we now do the same. The user-supplied database opens via SQ_DBOPEN in _init_session. 2. Post-login session init dance: SQ_PROTOCOLS (8-byte feature mask replayed verbatim from JDBC) → SQ_INFO with INFO_ENV + env vars (48-byte PDU replayed verbatim — DBTEMP=/tmp, SUBQCACHESZ=10) → SQ_DBOPEN. Without all three steps in this exact order, the server silently ignores SELECTs. 3. SQ_DESCRIBE per-column block has 10 fields per column (not the simple "name + type" my best-effort parser assumed): fieldIndex, columnStartPos, columnType, columnExtendedId, ownerName, extendedName, reference, alignment, sourceType, encodedLength. The string table at the end is offset-indexed (fieldIndex points into it), which is how JDBC handles disambiguation. Cursor lifecycle implementation in cursors.py mirrors JDBC exactly: PREPARE+NDESCRIBE+WANTDONE → DESCRIBE+DONE+COST+EOT CURNAME+NFETCH(4096) → TUPLE*+DONE+COST+EOT NFETCH(4096) → DONE+COST+EOT (drain) CLOSE → EOT RELEASE → EOT Five round trips per SELECT — same as JDBC. Module changes: src/informix_db/connections.py — added _init_session(), _send_protocols(), _send_dbopen(), _drain_to_eot(), _raise_sq_err(); login PDU now forces database=None always; SQ_INFO PDU replayed verbatim from JDBC capture (offsets-indexed env-var format too gnarly to derive in MVP). 
src/informix_db/cursors.py — full rewrite: real PDU builders for PREPARE/CURNAME+NFETCH/NFETCH/CLOSE/RELEASE; tag-dispatched response readers; cursor-name generator matching JDBC's "_ifxc" convention. src/informix_db/_resultset.py — proper SQ_DESCRIBE parser per JDBC's receiveDescribe (USVER mode); offset-indexed string table with name lookup by fieldIndex; ColumnInfo dataclass with raw type-code preserved for null-flag extraction. src/informix_db/_messages.py — added SQ_NDESCRIBE=22, SQ_WANTDONE=49. Test coverage: 40 unit + 15 integration tests (7 smoke + 8 new SELECT) = 55 total, all green, ruff clean. New tests cover: - SELECT 1 returns (1,) - cursor.description shape per PEP 249 - Multi-row INT SELECT - Multi-column mixed types (INT + FLOAT) - Iterator protocol (for row in cursor) - fetchmany(n) - Re-executing on same cursor resets state - Two cursors on one connection (sequential) Known gap: VARCHAR row decoding doesn't yet handle the variable-width on-wire encoding correctly. Phase 2.x will address — for now NotImpl errors surface raw bytes in the row tuple.	2026-05-03 15:37:10 -06:00
Ryan Malloy	e2c48f855e	Phase 2 progress: cursor scaffolding + protocol findings (SELECT path WIP) Cursor class scaffolded with full PEP 249 surface: src/informix_db/cursors.py — Cursor with execute, fetchone, fetchmany, fetchall, description, rowcount, arraysize, close, iterator, context manager. Sends SQ_COMMAND chains for parameterless SQL (Phase 4 adds SQ_BIND/SQ_EXECUTE for params). src/informix_db/_resultset.py — ColumnInfo, parse_describe, parse_tuple_payload. Best-effort SQ_DESCRIBE parser; refines in Phase 2.1. src/informix_db/connections.py — Connection.cursor() now returns a real Cursor; new _send_pdu() lets Cursor share the connection's socket without violating encapsulation. Protocol findings landed in PROTOCOL_NOTES.md §6: §6a — SQ_PREPARE format with named tags (the "trailing 22, 49" are SQ_NDESCRIBE and SQ_WANTDONE chained into the same PDU). Confirmed against IfxSqli.sendPrepare line 1062. §6c — Server requires post-login init sequence (SQ_PROTOCOLS → SQ_INFO → SQ_ID(env vars) → SQ_DBOPEN) BEFORE any PREPARE works. Discovered the hard way: PREPARE without this sequence gets no response; SQ_DBOPEN without SQ_PROTOCOLS gets sqlcode=-759 ("Database not available"). The login PDU's database field is a hint, not an open. §6e — SQ_TUPLE corrected: [short warn][int size][bytes payload] (not [int 0][short payloadLen] as earlier draft claimed). Two more constants added to _messages.MessageType: SQ_NDESCRIBE = 22, SQ_WANTDONE = 49 Tests: 40 unit + 7 integration (added 2 new — cursor() returns a Cursor, parameter binding raises NotSupportedError). All green, ruff clean. Removed obsolete "cursor() raises NotImplementedError" test. What works end-to-end now: connect, cursor(), close, parameter-attempt gating. What doesn't yet: cursor.execute("SELECT 1") — server requires the post-login init sequence we don't yet send. 
Discovered captures (kept for next session's analysis): docs/CAPTURES/06-py-select1-attempt.socat.log docs/CAPTURES/07-py-replay-jdbc-prepare.socat.log docs/CAPTURES/08-py-with-dbopen.socat.log docs/CAPTURES/09-py-full-replay.socat.log Three new tasks created tracking the remaining Phase 2 blockers: post-login init sequence, proper SQ_DESCRIBE parser, SQ_ID action vocabulary helpers.	2026-05-02 21:04:30 -06:00

11 Commits