7 Commits

a42dc5c5de Phase 18: server-side scrollable cursors via SQ_SFETCH (v2026.05.04.2)
Opt-in via conn.cursor(scrollable=True). Opens the cursor with
SQ_SCROLL (24) before SQ_OPEN (6), keeps it open server-side, and
sends SQ_SFETCH (23) per scroll call instead of materializing the
result set up-front.

User-facing API is identical to Phase 17's in-memory scroll
(fetch_first/last/prior/absolute/relative, scroll, rownumber).
Only the internal mechanism differs:

  | feature           | default           | scrollable=True
  |-------------------|-------------------|----------------------------------
  | memory            | all rows          | one row at a time
  | round-trips/fetch | 0 (after NFETCH)  | 1 per call
  | cursor lifetime   | closed after exec | open until close()
  | best for          | sequential iter   | random access on huge result sets

Wire format (verified against JDBC ScrollProbe capture):
* SQ_SFETCH: [short SQ_ID=4][int 23][short scrolltype]
  [int target][int bufSize=4096][short SQ_EOT]
  scrolltype: 1=NEXT, 4=LAST, 6=ABSOLUTE
* SQ_SCROLL (24): emitted between CURNAME and SQ_OPEN
* SQ_TUPID (25): response tag with 1-indexed row position;
  authoritative source for client-side position tracking
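For concreteness, a sketch of packing the SQ_SFETCH request per the layout above. Tag values are the ones listed in this message; SQ_EOT's numeric value is not stated here and is assumed purely for illustration:

```python
import struct

# Values from the wire-format notes above; SQ_EOT is an assumption.
SQ_ID = 4
SQ_SFETCH = 23
SQ_EOT = 12  # framing tag; exact value assumed for illustration

SCROLL_NEXT, SCROLL_LAST, SCROLL_ABSOLUTE = 1, 4, 6  # scrolltype codes

def build_sfetch_pdu(scrolltype: int, target: int, buf_size: int = 4096) -> bytes:
    # [short SQ_ID][int SQ_SFETCH][short scrolltype][int target]
    # [int bufSize][short SQ_EOT], all big-endian. Note bufSize is an
    # INT ("!i"); packing it as a SHORT is the silent-hang trap below.
    return struct.pack(
        "!hihiih", SQ_ID, SQ_SFETCH, scrolltype, target, buf_size, SQ_EOT
    )

pdu = build_sfetch_pdu(SCROLL_ABSOLUTE, target=50)  # 18 bytes total
```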

Position tracking uses the server's SQ_TUPID rather than client-
computed indexes. Total row count discovered lazily via SFETCH(LAST)
when negative absolute indexing requires it; cached in
_scroll_total_rows.

Trap along the way: the initial SFETCH packed bufSize as a SHORT → the
server hung silently. Same SHORT-vs-INT diagnostic pattern as Phase
4.x's CURNAME+NFETCH. Captured a JDBC trace, byte-diffed it against
ours, and found the mismatch (bufSize is INT in modern Informix per
isXPSVER8_40 / is2GBFetchBufferSupported).

Tests: 14 integration tests in test_scroll_cursor_server.py
covering lifecycle, sequential fetch, fetch_first/last/prior/
absolute/relative, negative indexing, scroll, empty result sets,
past-end, and random-access on a 100-row result set.

Total: 69 unit + 191 integration = 260 tests.
2026-05-04 16:41:25 -06:00
fa3ab751f9 Phase 13: SQ_FPROUTINE / SQ_EXFPROUTINE fast-path RPC
Implements direct stored-procedure invocation via the parallel
fast-path protocol family. Three new wire messages:
* SQ_GETROUTINE (101) — handle resolution by signature
* SQ_EXFPROUTINE (102) — execute by handle with bound params
* SQ_FPROUTINE (103) — response with return values

API: Connection.fast_path_call(signature, *params) -> list

Routine handles cached per-connection in a dict[signature -> (db_name,
handle)] — first call resolves and caches, subsequent calls skip
GETROUTINE.
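The caching shape, as a minimal sketch (class name hypothetical; `resolve_fn` stands in for the SQ_GETROUTINE round-trip):

```python
class RoutineHandleCache:
    """Per-connection signature -> (db_name, handle) cache."""

    def __init__(self, resolve_fn):
        self._resolve = resolve_fn
        self._handles: dict[str, tuple[str, int]] = {}

    def get(self, signature: str) -> tuple[str, int]:
        # First call resolves and caches; later calls skip GETROUTINE.
        if signature not in self._handles:
            self._handles[signature] = self._resolve(signature)
        return self._handles[signature]
```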

Why this matters even though Phase 10/11 already do most smart-LOB
work via SQL: ifx_lo_close(int) can't be invoked via "EXECUTE FUNCTION"
(returns -674). Without the fast-path, opened locators leak server-side
until the session ends. The fast-path also enables tighter UDF-in-loop
workloads — no PREPARE→DESCRIBE→EXECUTE overhead, just GETROUTINE+
EXFPROUTINE (one round-trip after caching).

Wire format examples (verified against JDBC):
* GETROUTINE request:
    [short 101][byte 0][int sigLen][sig bytes][pad if odd]
    [short 0][short SQ_EOT]
* EXFPROUTINE request:
    [short 102][short dbNameLen][dbName][pad if odd][int handle]
    [short paramCount][short fparamFlag][SQ_BIND data][short SQ_EOT]
* FPROUTINE response:
    [short numReturns][per-return: type/UDT-info/ind/prec/data]
    + drain SQ_DONE/SQ_COST/SQ_XACTSTAT until SQ_EOT
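A sketch of packing the GETROUTINE request per the layout above (SQ_EOT's numeric value is not given in this log and is assumed here; the odd-length pad keeps the stream 16-bit aligned, matching the framing primitives from Phase 1):

```python
import struct

SQ_GETROUTINE = 101
SQ_EOT = 12  # framing tag; numeric value assumed for illustration

def build_getroutine(signature: str) -> bytes:
    # [short 101][byte 0][int sigLen][sig bytes][pad if odd][short 0][short SQ_EOT]
    sig = signature.encode("ascii")
    pdu = struct.pack("!hbi", SQ_GETROUTINE, 0, len(sig)) + sig
    if len(sig) % 2:              # pad odd-length signatures
        pdu += b"\x00"
    return pdu + struct.pack("!hh", 0, SQ_EOT)
```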

MVP scope:
* Scalar params/returns only (int/float/str/bool/None/etc.)
* UDT params (e.g., 72-byte BLOB locator) deferred to Phase 13.x
* SQ_LODATA chunked I/O deferred — Phase 10/11 already cover read/write

Tests: 5 integration tests covering error paths, success paths,
handle caching, and multiple cycles.

Total: 64 unit + 139 integration = 203 tests.

Architectural milestone: with Phase 13 complete, the project now
covers every wire-message family JDBC uses for ordinary database
work. Only TLS handshake and cluster-redirect (replication failover)
remain unimplemented — neither is needed for a single-instance
driver.
2026-05-04 14:36:21 -06:00
389c32434c Phase 9: smart-LOB BLOB/CLOB locator decoding (Phase 10 deferred for fetch)
SELECT on BLOB or CLOB columns no longer requires raw byte interpretation.
The 72-byte server-side locator is wrapped in a typed BlobLocator or
ClobLocator (frozen dataclass) so the column is recognizable as
"server-side reference, not actual bytes".

Wire-protocol findings:
* Smart-LOB columns DON'T appear with their nominal type codes (102/101)
  in SQ_DESCRIBE. They surface as UDTFIXED (41) with extended_id 10
  (BLOB) or 11 (CLOB) and encoded_length=72 (locator size).
* Retrieving the actual bytes requires SQ_FPROUTINE (103) RPC to
  invoke ifx_lo_open, plus SQ_LODATA (97) for chunked transfer, plus
  another SQ_FPROUTINE for ifx_lo_close. That's a Phase 10 lift —
  roughly 2x the protocol surface of Phase 8.

Server config needed (added to Phase 7 setup):
* sbspace: onspaces -c -S sbspace1 ...
* default sbspace: onmode -wm SBSPACENAME=sbspace1

What ships in Phase 9:
* informix_db.BlobLocator(raw: bytes) — 72-byte frozen wrapper
* informix_db.ClobLocator(raw: bytes) — distinct type, same shape
* Row decoder branch in _resultset.parse_tuple_payload
* Wire constants SQ_LODATA=97, SQ_FPROUTINE=103, SQ_FPARAM=104

Tests:
* 11 unit tests in test_blob_locator_unit.py (no Informix needed) —
  construction, immutability, equality, hash, repr safety, size
  validation.
* 4 integration tests in test_smart_lob.py — fixture seeds via JDBC
  reference client (smart-LOB writes also need deferred protocols).
* RefBlob.java helper in tests/reference/ for seeding via JDBC.

Total: 64 unit + 111 integration = 175 tests.

Locator design note: __repr__ omits the raw bytes (they're opaque to
the client). Same-bytes locators of different families compare
unequal — BlobLocator(x) != ClobLocator(x) — to avoid silent type
confusion.
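A minimal sketch of the two wrappers with the behaviors described above (frozen, size-checked, raw bytes kept out of repr); field names beyond `raw` are not specified in this log, so the shape is illustrative:

```python
from dataclasses import dataclass, field

_LOCATOR_SIZE = 72  # server-side smart-LOB locator length

def _check_size(raw: bytes) -> None:
    if len(raw) != _LOCATOR_SIZE:
        raise ValueError(f"expected {_LOCATOR_SIZE}-byte locator, got {len(raw)}")

@dataclass(frozen=True)
class BlobLocator:
    raw: bytes = field(repr=False)  # opaque to the client; omitted from repr

    def __post_init__(self):
        _check_size(self.raw)

@dataclass(frozen=True)
class ClobLocator:
    # Distinct type, same shape. Defining it independently (not as a
    # subclass) makes dataclass __eq__ class-strict, so same-bytes
    # locators of different families compare unequal.
    raw: bytes = field(repr=False)

    def __post_init__(self):
        _check_size(self.raw)
```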
2026-05-04 13:26:15 -06:00
1c19c71cb6 Phase 7: real transaction semantics on logged databases
Introduces driver-managed transactions that work seamlessly across
logged and unlogged databases. The user calls commit() and rollback()
without needing to know which kind they're hitting — the connection
tracks transaction state internally.

Three protocol facts came out of integration testing:

1. Logged DBs in non-ANSI mode require an explicit SQ_BEGIN before
   the first DML — the server doesn't auto-open a transaction.
   Connection._ensure_transaction() sends SQ_BEGIN lazily and is
   idempotent within an open txn. After commit/rollback, the next
   DML triggers a fresh BEGIN.

2. SQ_RBWORK has a [short savepoint=0] payload before the SQ_EOT
   framing tag — sending SQ_RBWORK alone causes the server to hang
   silently (waiting for the missing 2 bytes). SQ_CMMTWORK has no
   payload. This is the same pattern as the SHORT-vs-INT bug from
   Phase 4.x and the 2-byte length prefix from Phase 6.c — when the
   server hangs, it's an incomplete PDU body.

3. SQ_XACTSTAT (tag 99) is a logged-DB-only message that's
   interleaved with normal responses. Now drained in all four
   response-reading paths: cursor _drain_to_eot,
   _read_describe_response, _read_fetch_response, and connection
   _drain_to_eot.

For unlogged DBs (e.g., sysmaster), SQ_BEGIN returns -201 and we
cache that result so subsequent DML doesn't re-probe. commit() and
rollback() are silent no-ops in that case — same client code works
across both DB modes.

Tests:
* New tests/test_transactions.py — 10 integration tests covering
  commit visibility, rollback isolation, multi-row rollback, partial
  commit-then-rollback, autocommit behavior, cross-connection
  durability, UPDATE/DELETE rollback, implicit per-statement txn.
* conftest.py auto-creates testdb (logged) for the suite.
* Two old tests rewritten to assert new no-op behavior on unlogged
  DBs (test_commit_rollback_in_unlogged_db_is_noop,
  test_commit_in_unlogged_db_is_noop).

Total: 53 unit + 98 integration = 151 tests.

The Phase 3 "gate test" (test_rollback_hides_insert) — a rolled-back
INSERT must be invisible to subsequent SELECTs in the same session —
now passes against a real logged database for the first time.
2026-05-04 12:54:02 -06:00
a1bd52788d Phase 2: SELECT works end-to-end — pure-Python Informix fully reads data
  cursor.execute("SELECT 1 FROM systables WHERE tabid = 1")
  cursor.fetchone() == (1,)

To my knowledge, this is the first time a pure-Python implementation
has read data from Informix without wrapping IBM's CSDK or JDBC.

Three breakthroughs in this commit:

1. Login PDU's database field is BROKEN. Passing a database name there
   makes the server reject subsequent SQ_DBOPEN with sqlcode -759
   ("database not available"). JDBC always sends NULL in the login
   PDU's database slot — we now do the same. The user-supplied database
   opens via SQ_DBOPEN in _init_session.

2. Post-login session init dance: SQ_PROTOCOLS (8-byte feature mask
   replayed verbatim from JDBC) → SQ_INFO with INFO_ENV + env vars
   (48-byte PDU replayed verbatim — DBTEMP=/tmp, SUBQCACHESZ=10) →
   SQ_DBOPEN. Without all three steps in this exact order, the server
   silently ignores SELECTs.

3. SQ_DESCRIBE per-column block has 10 fields per column (not the
   simple "name + type" my best-effort parser assumed): fieldIndex,
   columnStartPos, columnType, columnExtendedId, ownerName,
   extendedName, reference, alignment, sourceType, encodedLength.
   The string table at the end is offset-indexed (fieldIndex points
   into it), which is how JDBC handles disambiguation.
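The offset-indexed lookup in point 3 amounts to this (a sketch under the stated layout, with hypothetical example data; real entries may use a different encoding):

```python
def column_name(string_table: bytes, field_index: int) -> str:
    # fieldIndex is an OFFSET into the trailing table of nul-terminated
    # names, not an ordinal position.
    end = string_table.index(b"\x00", field_index)
    return string_table[field_index:end].decode("ascii")

# Hypothetical string table for two columns:
table = b"tabid\x00tabname\x00"
```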

Cursor lifecycle implementation in cursors.py mirrors JDBC exactly:
  PREPARE+NDESCRIBE+WANTDONE → DESCRIBE+DONE+COST+EOT
  CURNAME+NFETCH(4096) → TUPLE*+DONE+COST+EOT
  NFETCH(4096) → DONE+COST+EOT (drain)
  CLOSE → EOT
  RELEASE → EOT

Five round trips per SELECT — same as JDBC.

Module changes:
  src/informix_db/connections.py — added _init_session(), _send_protocols(),
    _send_dbopen(), _drain_to_eot(), _raise_sq_err(); login PDU now
    forces database=None always; SQ_INFO PDU replayed verbatim from
    JDBC capture (offsets-indexed env-var format too gnarly to derive
    in MVP).
  src/informix_db/cursors.py — full rewrite: real PDU builders for
    PREPARE/CURNAME+NFETCH/NFETCH/CLOSE/RELEASE; tag-dispatched
    response readers; cursor-name generator matching JDBC's "_ifxc"
    convention.
  src/informix_db/_resultset.py — proper SQ_DESCRIBE parser per
    JDBC's receiveDescribe (USVER mode); offset-indexed string table
    with name lookup by fieldIndex; ColumnInfo dataclass with raw
    type-code preserved for null-flag extraction.
  src/informix_db/_messages.py — added SQ_NDESCRIBE=22, SQ_WANTDONE=49.

Test coverage: 40 unit + 15 integration tests (7 smoke + 8 new SELECT)
= 55 total, all green, ruff clean. New tests cover:
  - SELECT 1 returns (1,)
  - cursor.description shape per PEP 249
  - Multi-row INT SELECT
  - Multi-column mixed types (INT + FLOAT)
  - Iterator protocol (for row in cursor)
  - fetchmany(n)
  - Re-executing on same cursor resets state
  - Two cursors on one connection (sequential)

Known gap: VARCHAR row decoding doesn't yet handle the variable-width
on-wire encoding correctly. Phase 2.x will address this; for now the
raw, undecoded bytes surface in the row tuple.
2026-05-03 15:37:10 -06:00
e2c48f855e Phase 2 progress: cursor scaffolding + protocol findings (SELECT path WIP)
Cursor class scaffolded with full PEP 249 surface:
  src/informix_db/cursors.py — Cursor with execute, fetchone, fetchmany,
    fetchall, description, rowcount, arraysize, close, iterator,
    context manager. Sends SQ_COMMAND chains for parameterless SQL
    (Phase 4 adds SQ_BIND/SQ_EXECUTE for params).
  src/informix_db/_resultset.py — ColumnInfo, parse_describe,
    parse_tuple_payload. Best-effort SQ_DESCRIBE parser; refines in
    Phase 2.1.
  src/informix_db/connections.py — Connection.cursor() now returns a
    real Cursor; new _send_pdu() lets Cursor share the connection's
    socket without violating encapsulation.

Protocol findings landed in PROTOCOL_NOTES.md §6:
  §6a — SQ_PREPARE format with named tags (the "trailing 22, 49"
    are SQ_NDESCRIBE and SQ_WANTDONE chained into the same PDU).
    Confirmed against IfxSqli.sendPrepare line 1062.
  §6c — Server requires post-login init sequence (SQ_PROTOCOLS →
    SQ_INFO → SQ_ID(env vars) → SQ_DBOPEN) BEFORE any PREPARE works.
    Discovered the hard way: PREPARE without this sequence gets no
    response; SQ_DBOPEN without SQ_PROTOCOLS gets sqlcode=-759
    ("Database not available"). The login PDU's database field is
    a hint, not an open.
  §6e — SQ_TUPLE corrected: [short warn][int size][bytes payload]
    (not [int 0][short payloadLen] as earlier draft claimed).
  Two more constants added to _messages.MessageType:
    SQ_NDESCRIBE = 22, SQ_WANTDONE = 49
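The corrected §6e layout parses as follows (a sketch; `body` is the PDU body after the tag, which may be followed by further chained data):

```python
import struct

def parse_sq_tuple(body: bytes) -> tuple[int, bytes]:
    # SQ_TUPLE body: [short warn][int size][bytes payload], big-endian.
    warn, size = struct.unpack_from("!hi", body)
    return warn, body[6:6 + size]
```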

Tests: 40 unit + 7 integration (added 2 new — cursor() returns a
Cursor, parameter binding raises NotSupportedError). All green, ruff
clean. Removed obsolete "cursor() raises NotImplementedError" test.

What works end-to-end now: connect, cursor(), close, parameter-attempt
gating. What doesn't yet: cursor.execute("SELECT 1") — server requires
the post-login init sequence we don't yet send.

Discovery captures (kept for next session's analysis):
  docs/CAPTURES/06-py-select1-attempt.socat.log
  docs/CAPTURES/07-py-replay-jdbc-prepare.socat.log
  docs/CAPTURES/08-py-with-dbopen.socat.log
  docs/CAPTURES/09-py-full-replay.socat.log

Three new tasks created tracking the remaining Phase 2 blockers:
post-login init sequence, proper SQ_DESCRIBE parser, SQ_ID action
vocabulary helpers.
2026-05-02 21:04:30 -06:00
9b1fd8af2c Phase 1: pure-Python SQLI login works end-to-end
This commit takes informix-db from documentation-only (Phase 0 spike)
to a functional connect() / close() against a real Informix server.
To our knowledge, this is the first pure-socket Informix client in any
language — no CSDK, no JVM, no native libraries.

Layered architecture per the plan, mirroring PyMySQL's shape:

  src/informix_db/
    __init__.py        — PEP 249 surface (connect, exceptions, paramstyle="numeric")
    exceptions.py      — full PEP 249 hierarchy declared up front
    _socket.py         — raw socket I/O (read_exact, write_all, timeouts)
    _protocol.py       — IfxStreamReader / IfxStreamWriter framing primitives
                         (big-endian, 16-bit-aligned variable payloads,
                         length-prefixed nul-terminated strings)
    _messages.py       — SQ_* tags from IfxMessageTypes + ASF/login markers
    _auth.py           — pluggable auth handlers; plain-password is the
                         only Phase-1 implementation
    connections.py     — Connection class: builds the binary login PDU
                         (SLheader + PFheader byte-for-byte per
                         PROTOCOL_NOTES.md §3), sends it, parses the
                         server response, wires up close()

Phase 1 design decisions locked in DECISION_LOG.md:
  - paramstyle = "numeric" (matches Informix ESQL/C convention)
  - Python >= 3.10
  - autocommit defaults to off (PEP 249 implicit)
  - License: MIT
  - Distribution name: informix-db (verified PyPI-available)

Test coverage: 34 unit tests (codec round-trips against synthetic byte
streams; observed login-PDU values from the spike captures asserted as
exact byte literals) + 6 integration tests (connect, idempotent close,
context manager, bad-password → OperationalError, bad-host →
OperationalError, cursor() raises NotImplementedError).

  pytest                 — runs 34 unit tests, no Docker needed
  pytest -m integration  — runs 6 integration tests against the
                           Developer Edition container (pinned by digest
                           in tests/docker-compose.yml)
  pytest -m ""           — runs everything

ruff is clean across src/ and tests/.

One bug found during smoke testing: threading.get_ident() can exceed
signed 32-bit on some processes, overflowing struct.pack("!i"). Fixed
the same way the JDBC reference does — clamp to signed 32-bit, fall
back to 0 if out of range. The field is diagnostic only.
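The clamp is a one-liner; a sketch (function name hypothetical):

```python
import struct

_INT32_MIN, _INT32_MAX = -(2**31), 2**31 - 1

def clamp_thread_id(ident: int) -> int:
    # Diagnostic-only field: clamp to signed 32-bit, fall back to 0
    # when out of range, so struct.pack("!i", ...) can never overflow.
    return ident if _INT32_MIN <= ident <= _INT32_MAX else 0

packed = struct.pack("!i", clamp_thread_id(2**40))  # no struct.error now
```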

One protocol-level observation that AMENDED the JDBC source reading:
the "capability section" in the login PDU is three independently
negotiated 4-byte ints (Cap_1=1, Cap_2=0x3c000000, Cap_3=0), not one
int + 8 reserved zero bytes as my CFR decompile read suggested. The
server echoes them back identically. Trust the wire over the
decompiler.
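In code, the corrected reading is just three big-endian ints (values as observed above):

```python
import struct

# Three independently negotiated 4-byte ints, echoed back by the server.
CAP_1, CAP_2, CAP_3 = 1, 0x3C000000, 0
capability_section = struct.pack("!iii", CAP_1, CAP_2, CAP_3)  # 12 bytes
```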

Phase 1 verification matrix (from PROTOCOL_NOTES.md §12):
  - Login byte layout: confirmed (server accepts our pure-Python PDU)
  - Disconnection: confirmed (SQ_EXIT round-trip works)
  - Framing primitives: confirmed (34 unit tests)
  - Error path: bad password → OperationalError, bad host → OperationalError

Phase 2 (Cursor / SELECT / basic types) is the next phase. The hard
unknowns there — exact column-descriptor layout, statement-time error
format — were called out as bounded gaps in Phase 0 and have existing
captures (02-select-1.socat.log, 02-dml-cycle.socat.log) to characterize
against.
2026-05-02 19:10:24 -06:00