18 Commits

Author SHA1 Message Date
dc91084d71 Phase 11: smart-LOB BLOB/CLOB write via SQ_FILE / filetoblob
Mirrors Phase 10's read implementation in the opposite direction —
extends the SQ_FILE (98) handler with optype 2 (read-from-client)
support. Users register bytes in cursor.virtual_files; the server's
filetoblob('path', 'client') call streams them up via SQ_FILE_READ
(106) chunks. Same architectural pivot as Phase 10 — avoids the
heavy SQ_FPROUTINE+SQ_LODATA stack.

Wire protocol (per IfxSqli.receiveSQFILE case 2 line 5103+):
* Server sends [short SQ_FILE=98][short optype=2][short bufSize]
  [int readAmount][short SQ_EOT]
* Client responds [short 106][int totalAmount] then chunks
  [short 106][short chunkSize][padded data]... terminated by SQ_EOT
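The client-side framing above can be sketched with a small chunker. This is a hypothetical helper: the SQ_EOT tag value and the even-byte pad are assumptions not spelled out in this entry.

```python
import struct

SQ_FILE_READ = 106
SQ_EOT = 12  # assumed tag value; this entry doesn't state it

def build_file_read_response(data: bytes, chunk_size: int = 1024) -> bytes:
    """Frame client bytes as the reply described above:
    [short 106][int total], then [short 106][short size][padded data]
    per chunk, terminated by SQ_EOT. Even-byte padding is assumed."""
    out = struct.pack('>hi', SQ_FILE_READ, len(data))
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        out += struct.pack('>hh', SQ_FILE_READ, len(chunk)) + chunk
        if len(chunk) % 2:          # even-byte alignment pad
            out += b'\x00'
    return out + struct.pack('>h', SQ_EOT)
```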

API:
* Low-level: cur.virtual_files['/sentinel'] = data, then SQL with
  filetoblob('/sentinel', 'client')
* High-level: cur.write_blob_column(sql, blob_data, params, clob=False)
  — substitutes BLOB_PLACEHOLDER token in the SQL with filetoblob()
  (or filetoclob for CLOB columns) and registers the bytes
  automatically. Cleans up virtual_files after the call.
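The placeholder substitution write_blob_column performs can be sketched like this. The helper and the sentinel path are hypothetical illustrations, not the driver's actual code:

```python
SENTINEL = '/sentinel'  # hypothetical virtual-file path

def substitute_blob_placeholder(sql: str, clob: bool = False) -> str:
    """Swap the BLOB_PLACEHOLDER token for a filetoblob()/filetoclob()
    call against the registered virtual file, rejecting SQL that
    lacks the token."""
    if 'BLOB_PLACEHOLDER' not in sql:
        raise ValueError('SQL must contain BLOB_PLACEHOLDER')
    func = 'filetoclob' if clob else 'filetoblob'
    return sql.replace('BLOB_PLACEHOLDER', f"{func}('{SENTINEL}', 'client')")
```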

The BLOB_PLACEHOLDER design was chosen over magic ?-binding because:
* bytes already maps to BYTE type (legacy in-row blobs) for ?-params
* Method on BlobLocator doesn't work for inserts (no locator yet)
* PLACEHOLDER is unmistakable at the call site

Closes the smart-LOB loop in pure Python — Phase 9's tests and
Phase 10's read fixtures previously used JDBC to seed test data.
Phase 11 eliminated that dependency: tests/test_smart_lob.py and
tests/test_smart_lob_read.py now self-seed via write_blob_column.

Bonus: integration test runtime 5.78s → 2.78s (no more per-fixture
JVM spawns). Project goal "pure Python, no native deps" now true
for the test suite too.

Tests: 9 integration tests in test_smart_lob_write.py covering
* BLOB short, multichunk (51KB), empty, binary-safe (256 values)
* BLOB UPDATE
* BLOB multi-row INSERTs
* CLOB via filetoclob
* validation (rejects SQL without BLOB_PLACEHOLDER)
* virtual_files cleanup

Total: 64 unit + 126 integration = 190 tests.
2026-05-04 14:14:37 -06:00
a9a3cfc38e Phase 10: smart-LOB BLOB read via SQ_FILE / lotofile
Implements end-to-end BLOB reading by leveraging the server's
lotofile() function and intercepting the SQ_FILE protocol with
in-memory file emulation. Avoids implementing the heavier
SQ_FPROUTINE + SQ_LODATA stack initially planned for Phase 10.

Strategy: SELECT lotofile(blob_col, '/path', 'client') causes the
server to orchestrate a SQ_FILE (98) protocol round-trip — it tells
the client to "open file X, write these bytes, close". Our handler
buffers the writes in memory keyed by filename instead of touching
disk. The bytes appear in cursor.blob_files dict.

Wire protocol (per IfxSqli.receiveSQFILE line 4980):
* SQ_FILE optype 0 (open): server sends filename + mode/flags/offset
* SQ_FILE optype 3 (write): chunked SQ_FILE_WRITE (107) blocks of
  data, terminated by SQ_EOT. Client responds with total size.
* SQ_FILE optype 1 (close): bare SQ_EOT both ways.
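The three optypes map naturally onto a small in-memory sink. A sketch of the buffering strategy only, not the driver's actual handler:

```python
class VirtualFileSink:
    """Buffer lotofile()'s client-side writes in memory, keyed by
    filename, instead of touching disk."""

    def __init__(self):
        self.blob_files = {}     # filename -> completed bytes
        self._open = None        # (filename, buffer) between open/close

    def handle_open(self, filename):   # SQ_FILE optype 0
        self._open = (filename, bytearray())

    def handle_write(self, chunk):     # one SQ_FILE_WRITE (107) block
        self._open[1].extend(chunk)

    def handle_close(self):            # SQ_FILE optype 1
        name, buf = self._open
        self.blob_files[name] = bytes(buf)
        self._open = None
```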

API:
* Low-level: cur.execute("SELECT lotofile(col, '/tmp/X', 'client') ...")
  followed by cur.blob_files[returned_filename] for the bytes.
* High-level: cur.read_blob_column("SELECT col FROM ... WHERE ...", params)
  returns bytes directly, wrapping the user's SQL with lotofile.

Bonus: row decoder now handles UDTVAR (type 40) with extended_name=
"lvarchar" — the wire type in which lotofile() returns its result.
Format: [byte indicator][int length][bytes].
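A sketch of that decode; the indicator byte's semantics aren't spelled out here, so it's returned raw rather than interpreted:

```python
import struct

def decode_lvarchar_udt(buf: bytes, pos: int = 0):
    """Walk one UDTVAR/lvarchar value: [byte indicator][int length]
    [bytes]. Returns (indicator, value, next_position)."""
    indicator = buf[pos]
    (length,) = struct.unpack_from('>i', buf, pos + 1)
    start = pos + 5
    return indicator, buf[start:start + length], start + length
```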

Tests: 6 integration tests in test_smart_lob_read.py covering
low-level + high-level paths, NULL/no-match, multi-chunk (30KB),
and validation. Test data seeded via JDBC reference client since
smart-LOB writes still need Phase 11.

Total: 64 unit + 117 integration = 181 tests.

Strategic insight from this phase: don't estimate protocol-
implementation cost from JDBC's class hierarchy alone. JDBC's
IfxSmBlob is 600+ lines but the wire-level READ path reduces to one
SQL function call + one new tag handler. The wire is often simpler
than the SDK suggests.

Deferred to Phase 11+:
* Smart-LOB write (still needs SQ_FPROUTINE + SQ_LODATA)
* BlobLocator.read() OO API (requires locator-to-source mapping)
* SQ_FILE optype 2 (filetoblob client→server path)
2026-05-04 13:46:18 -06:00
389c32434c Phase 9: smart-LOB BLOB/CLOB locator decoding (Phase 10 deferred for fetch)
SELECT on BLOB or CLOB columns no longer requires raw byte interpretation.
The 72-byte server-side locator is wrapped in a typed BlobLocator or
ClobLocator (frozen dataclass) so the column is recognizable as
"server-side reference, not actual bytes".

Wire-protocol findings:
* Smart-LOB columns DON'T appear with their nominal type codes (102/101)
  in SQ_DESCRIBE. They surface as UDTFIXED (41) with extended_id 10
  (BLOB) or 11 (CLOB) and encoded_length=72 (locator size).
* Retrieving the actual bytes requires SQ_FPROUTINE (103) RPC to
  invoke ifx_lo_open, plus SQ_LODATA (97) for chunked transfer, plus
  another SQ_FPROUTINE for ifx_lo_close. That's a Phase 10 lift —
  roughly 2x the protocol surface of Phase 8.

Server config needed (added to Phase 7 setup):
* sbspace: onspaces -c -S sbspace1 ...
* default sbspace: onmode -wm SBSPACENAME=sbspace1

What ships in Phase 9:
* informix_db.BlobLocator(raw: bytes) — 72-byte frozen wrapper
* informix_db.ClobLocator(raw: bytes) — distinct type, same shape
* Row decoder branch in _resultset.parse_tuple_payload
* Wire constants SQ_LODATA=97, SQ_FPROUTINE=103, SQ_FPARAM=104

Tests:
* 11 unit tests in test_blob_locator_unit.py (no Informix needed) —
  construction, immutability, equality, hash, repr safety, size
  validation.
* 4 integration tests in test_smart_lob.py — fixture seeds via JDBC
  reference client (smart-LOB writes also need deferred protocols).
* RefBlob.java helper in tests/reference/ for seeding via JDBC.

Total: 64 unit + 111 integration = 175 tests.

Locator design note: __repr__ omits the raw bytes (they're opaque to
the client). Same-bytes locators of different families compare
unequal — BlobLocator(x) != ClobLocator(x) — to avoid silent type
confusion.
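The shape described can be sketched with frozen dataclasses. This is an illustration — deriving ClobLocator from BlobLocator is a convenience here, not necessarily how the driver defines them:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BlobLocator:
    """72-byte server-side locator; raw bytes excluded from repr
    because they're opaque to the client."""
    raw: bytes = field(repr=False)

    def __post_init__(self):
        if len(self.raw) != 72:
            raise ValueError('smart-LOB locator must be 72 bytes')

@dataclass(frozen=True)
class ClobLocator(BlobLocator):
    pass  # distinct family: same-bytes locators compare unequal
```

The dataclass-generated __eq__ checks the exact class, which gives the BlobLocator(x) != ClobLocator(x) behavior for free.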
2026-05-04 13:26:15 -06:00
52259f0152 Phase 8: BYTE/TEXT bind+read via SQ_BBIND/SQ_BLOB/SQ_FETCHBLOB
Implements end-to-end round-trip for BYTE (type 11) and TEXT (type 12)
columns. Python bytes/bytearray map to BYTE; str is auto-encoded as
ISO-8859-1 for TEXT.

Wire protocol — write side:
* SQ_BIND payload carries a 56-byte blob descriptor with size at offset
  [16..19] (per IfxBlob.toIfx). NULL is byte 39=1.
* After all per-param blocks, SQ_BBIND (41) declares blob count, then
  chunked SQ_BLOB (39) messages stream the actual bytes (max 1024
  bytes/chunk per JDBC), terminated by zero-length SQ_BLOB.
* Then SQ_EXECUTE proceeds normally.
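The chunked stream can be sketched like this. The per-frame header layout [short tag][short length] is an assumption; only the 1024-byte cap and the zero-length terminator come from this entry:

```python
import struct

SQ_BLOB = 39
MAX_CHUNK = 1024   # JDBC's per-chunk ceiling, per the entry above

def iter_sq_blob_frames(data: bytes):
    """Yield SQ_BLOB frames of at most 1024 payload bytes, then the
    zero-length terminator frame."""
    for off in range(0, len(data), MAX_CHUNK):
        chunk = data[off:off + MAX_CHUNK]
        yield struct.pack('>hh', SQ_BLOB, len(chunk)) + chunk
    yield struct.pack('>hh', SQ_BLOB, 0)  # zero-length terminator
```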

Wire protocol — read side:
* SQ_TUPLE returns only the 56-byte descriptor; actual bytes live in
  the blobspace.
* For each BYTE/TEXT column in each row, send SQ_FETCHBLOB with the
  descriptor and read SQ_BLOB chunks until zero-length terminator.
* The locator is only valid while the cursor is open — must dereference
  BEFORE sending CLOSE. Doing it after returns -602 (Cannot open blob).

Server-side prerequisites (one-time setup):
1. blobspace: onspaces -c -b blobspace1 -p /path -o 0 -s 50000
2. logged DB: CREATE DATABASE testdb WITH LOG
3. config + archive:
     onmode -wm LTAPEDEV=/dev/null
     onmode -wm TAPEDEV=/dev/null
     onmode -l
     ontape -s -L 0 -t /dev/null

Without #3, JDBC fails identically to our driver with "BLOB pages can't
be allocated from a chunk until chunk add is logged". This identical
failure was the diagnostic confirmation that our protocol bytes were
correct — same server response = byte-for-byte parity.

Tests: 9 integration tests in tests/test_blob.py — single-chunk,
multi-chunk (5120 bytes), NULL, multi-row, binary-safe, TEXT roundtrip,
ISO-8859-1, NULL TEXT, mixed columns. Plus the Phase 4
test_unsupported_param_type_raises was updated since bytes is no longer
the canonical unsupported type — switched to a custom class.

Total: 53 unit + 107 integration = 160 tests.

The smart-LOB family (BLOB/CLOB) is a separate state-machine extension
deferred to Phase 9 — it uses IfxLocator + LO_OPEN/LO_READ session
protocol against sbspace, not the BBIND/BLOB stream.
2026-05-04 13:13:55 -06:00
1c19c71cb6 Phase 7: real transaction semantics on logged databases
Introduces driver-managed transactions that work seamlessly across
logged and unlogged databases. The user calls commit() and rollback()
without needing to know which kind they're hitting — the connection
tracks transaction state internally.

Three protocol facts came out of integration testing:

1. Logged DBs in non-ANSI mode require an explicit SQ_BEGIN before
   the first DML — the server doesn't auto-open a transaction.
   Connection._ensure_transaction() sends SQ_BEGIN lazily and is
   idempotent within an open txn. After commit/rollback, the next
   DML triggers a fresh BEGIN.

2. SQ_RBWORK has a [short savepoint=0] payload before the SQ_EOT
   framing tag — sending SQ_RBWORK alone causes the server to hang
   silently (waiting for the missing 2 bytes). SQ_CMMTWORK has no
   payload. This is the same pattern as the SHORT-vs-INT bug from
   Phase 4.x and the 2-byte length prefix from Phase 6.c — when the
   server hangs, it's an incomplete PDU body.

3. SQ_XACTSTAT (tag 99) is a logged-DB-only message that's
   interleaved with normal responses. Now drained in all four
   response-reading paths: cursor _drain_to_eot, _read_describe_
   response, _read_fetch_response, and connection _drain_to_eot.

For unlogged DBs (e.g., sysmaster), SQ_BEGIN returns -201 and we
cache that result so subsequent DML doesn't re-probe. commit() and
rollback() are silent no-ops in that case — same client code works
across both DB modes.
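The lazy-BEGIN bookkeeping reduces to a small state machine. A sketch with the wire send stubbed out — not the real Connection:

```python
class LazyTransactionState:
    """Track whether a txn is open and whether the DB is logged;
    send_begin is a callable returning the server's sqlcode."""

    def __init__(self, send_begin):
        self._send_begin = send_begin
        self._in_txn = False
        self._logged = None         # None = not yet probed

    def ensure_transaction(self):
        if self._logged is False or self._in_txn:
            return                  # unlogged DB, or txn already open
        if self._send_begin() == -201:
            self._logged = False    # cache: don't re-probe on later DML
        else:
            self._logged = True
            self._in_txn = True

    def end_transaction(self):      # called after commit()/rollback()
        self._in_txn = False
```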

Tests:
* New tests/test_transactions.py — 10 integration tests covering
  commit visibility, rollback isolation, multi-row rollback, partial
  commit-then-rollback, autocommit behavior, cross-connection
  durability, UPDATE/DELETE rollback, implicit per-statement txn.
* conftest.py auto-creates testdb (logged) for the suite.
* Two old tests rewritten to assert new no-op behavior on unlogged
  DBs (test_commit_rollback_in_unlogged_db_is_noop,
  test_commit_in_unlogged_db_is_noop).

Total: 53 unit + 98 integration = 151 tests.

The Phase 3 "gate test" (test_rollback_hides_insert) — a rolled-back
INSERT must be invisible to subsequent SELECTs in the same session —
now passes against a real logged database for the first time.
2026-05-04 12:54:02 -06:00
f546f951c8 Phase 6.f: BYTE/TEXT/BLOB/CLOB protocol research (deferred to Phase 8+)
Empirical and source-level investigation of the LOB type families.
Findings:

* BYTE/TEXT (type 11/12) cannot be inserted via SQL literals — even
  dbaccess with `INSERT INTO t VALUES (1, "0x...")` returns -617
  "A blob data type must be supplied within this context". The server
  requires a binary BBIND wire path. Hard restriction.

* BYTE/TEXT wire protocol: SQ_BIND sends a 56-byte descriptor as the
  inline placeholder, then a separate SQ_BBIND (41) PDU declares blob
  count, then chunked SQ_BLOB (39) tags stream the actual bytes (max
  1024 bytes/chunk per JDBC's sendStreamBlob).

* BLOB/CLOB (type 101/102) are even more involved — smart-LOBs use an
  LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against sbspace,
  with locators carried inline in SQ_TUPLE.

* Server-side setup confirmed working: blobspace1 + sbspace1 + logged
  database (testdb) are now available in the dev container for future
  Phase 8/9 implementation.

Both LOB families require materially more state-machine work than the
single-PDU codec types (DECIMAL/DATETIME/INTERVAL). Splitting into
Phase 8 (BYTE/TEXT) and Phase 9 (BLOB/CLOB) lets each get focused
attention rather than half-implementing both.

The SQ_BBIND, SQ_BLOB, SQ_FETCHBLOB, SQ_SBBIND, SQ_FILE_READ,
SQ_FILE_WRITE constants are already declared in _messages.py from
Phase 1 scaffolding — protocol layer is ready when implementation
lands.

For users who need binary data <32K today: LVARCHAR via str encoded
with ISO-8859-1 is a viable interim path.
2026-05-04 12:37:46 -06:00
888b8079d3 Phase 6.e: INTERVAL parameter encoding
Implements encoders for datetime.timedelta → INTERVAL DAY(9) TO FRACTION(5)
and IntervalYM → INTERVAL YEAR(9) TO MONTH. Both follow the 2-byte-length-
prefixed BCD wire format established in Phase 6.c (DECIMAL/DATETIME).

The default qualifier choice is generous: DAY(9) covers any timedelta,
YEAR(9) handles ±1B years. JDBC defaults to smaller widths (DAY(2)/YEAR(4))
trading safety for compactness — we make the opposite trade.

FRACTION(5) is the Informix precision ceiling — sub-10us intervals can't
round-trip cleanly. Same limitation JDBC has.
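Splitting a timedelta into DAY TO FRACTION(5) fields can be sketched as below — the field math only, not the BCD encoder itself. FRACTION(5) means 10-microsecond units, so the last microsecond digit is dropped:

```python
import datetime

def timedelta_to_fields(td: datetime.timedelta):
    """Return (sign, days, hours, minutes, seconds, frac5) where
    frac5 counts 10-microsecond units (the FRACTION(5) ceiling)."""
    total_us = (td.days * 86400 + td.seconds) * 1_000_000 + td.microseconds
    sign = -1 if total_us < 0 else 1
    total_us = abs(total_us)
    secs, us = divmod(total_us, 1_000_000)
    days, rem = divmod(secs, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return sign, days, hours, minutes, seconds, us // 10
```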

Six integration tests, all green on first run against live Informix —
the synthetic round-trip in the test framework caught every framing bug
locally, before integration tests even started. This is the dividend from
owning both decoder and encoder.

Total: 53 unit + 88 integration = 141 tests.

Type matrix update: INTERVAL now has both decode + encode. Only BLOB/CLOB
and BYTE/TEXT remain among the common types.
2026-05-04 12:30:48 -06:00
4dafbf8ce9 Phase 6.d: INTERVAL decoding (both qualifier families)
Implements row-decoding for IDS INTERVAL, the last common temporal type.
The qualifier short bisects the type at the wire level: start_TU >= DAY
maps to datetime.timedelta (day-fraction), start_TU <= MONTH maps to a
new informix_db.IntervalYM (year-month).

Wire format mirrors DATETIME exactly — `[head byte][digit pairs in
base-100]`, with the qualifier dictating field interpretation. The
fraction-to-nanoseconds scaling (`scale_exp = 18 - end_TU`, forced odd)
is the JDBC pattern from `Decimal.fromIfxToArray`.

IntervalYM is a frozen dataclass holding signed total months, with
`years` and `remainder_months` as derived properties. Matches JDBC's
`IntervalYM.months` shape rather than a (years, months) tuple — avoids
ambiguity around what "negative" means for a multi-field tuple.
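A sketch of that shape; the truncation-toward-zero split for negatives is an assumption, since only "signed total months plus derived properties" comes from this entry:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntervalYM:
    """Year-month interval as a single signed month count."""
    months: int

    @property
    def years(self) -> int:
        # split truncates toward zero: -26 months -> -2 years (assumption)
        q = abs(self.months) // 12
        return -q if self.months < 0 else q

    @property
    def remainder_months(self) -> int:
        return self.months - self.years * 12
```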

Tests: 13 unit (synthetic byte streams covering all decoder branches)
+ 9 integration (real Informix queries spanning DAY TO SECOND, HOUR TO
SECOND, YEAR TO MONTH, negatives, NULL, and mixed-family rows).

Total test count: 53 unit + 82 integration = 135.

Encoder for INTERVAL parameter binding is deferred to a later phase
(same arc as DECIMAL/DATETIME — decode lands first).
2026-05-04 12:22:07 -06:00
10863a9337 Phase 6.c: DATE / DATETIME / DECIMAL parameter encoding
Now you can pass Python datetime/date/Decimal values directly:

  cur.execute('INSERT INTO t VALUES (?, ?, ?)',
              (1, datetime.datetime(2026, 5, 4, 12, 34, 56), Decimal('1234.56')))
  cur.execute('SELECT id FROM t WHERE d > ?', (datetime.date(2025, 1, 1),))

The 2-byte length-prefix discovery: both my Phase 6.a DECIMAL encoder
and the new Phase 6.c DATETIME encoder produced "correct" BCD bytes
but the server silently dropped the SQ_BIND PDU (no response, just
timeout). Captured the wire, diffed against JDBC, and found that
DECIMAL/DATETIME bind data has a 2-byte length PREFIX wrapping the
BCD payload (per Decimal.javaToIfx line 457). With the prefix added,
both encoders work. DATE doesn't need the prefix — it's a fixed
4-byte int.

Per-type wire format:
  date     → DATE(7),     [4-byte BE int = days since 1899-12-31]
  datetime → DATETIME(10), [short total_len][byte 0xc7][7 BCD pairs]
  Decimal  → DECIMAL(5),  [short total_len][byte exp][BCD digit pairs]
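The DATE case is simple enough to show whole — a sketch consistent with the format line above:

```python
import datetime
import struct

_DATE_EPOCH = datetime.date(1899, 12, 31)

def encode_date(d: datetime.date) -> bytes:
    """DATE wire form: fixed 4-byte big-endian day count since
    1899-12-31 — no length prefix needed."""
    return struct.pack('>i', (d - _DATE_EPOCH).days)
```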

For DATETIME the encoder always emits YEAR TO SECOND form (no
microseconds) — covers the common case. Phase 6.x can add YEAR TO
FRACTION(N) variants if microsecond precision is needed.

For DECIMAL the encoder uses the asymmetric base-100 complement
(mirror of decoder) for negatives. Tested with positive, negative,
and fractional values.

Lesson for the protocol playbook: when the server silently drops a
PDU, it's almost always an envelope/framing issue rather than the
inner-value bytes being wrong. Same pattern as the SHORT-vs-INT
reserved field in CURNAME+NFETCH and the even-byte alignment pad.

Module changes:
  src/informix_db/converters.py:
    + _encode_date — 4-byte BE int day count
    + _encode_datetime — YEAR TO SECOND form with 2-byte length prefix
    + _encode_decimal — re-enabled (was Phase 6.x stub) with the same
      length-prefix fix
    + encode_param() dispatches on datetime.datetime BEFORE
      datetime.date (since datetime is a subclass of date in Python)

Tests: 40 unit + 73 integration (3 new date/datetime param tests + 1
updated decimal param test) = 113 total, all green, ruff clean. New
tests cover:
  - date as INSERT parameter via executemany — 3 dates round-trip
  - datetime as INSERT parameter via executemany — 3 timestamps
  - date as parameter in a WHERE clause filter (created_at > ?)
  - Decimal round trip (was: NotImplementedError check; now: real
    INSERT + SELECT verification)

Type support matrix updates:
  DATE       — encode ✓ + decode ✓ (was decode-only)
  DATETIME   — encode ✓ + decode ✓ (was decode-only)
  DECIMAL    — encode ✓ + decode ✓ (was decode-only)
2026-05-04 12:09:16 -06:00
6819dd4cb0 Phase 6.b: DATETIME decoding for all qualifier ranges
Before:
  cur.execute("SELECT CURRENT YEAR TO SECOND ...")
  cur.fetchone()  # → (b'\xc7\x14\x1a\x05\x04...',) raw BCD bytes

After:
  cur.execute("SELECT CURRENT YEAR TO SECOND ...")
  cur.fetchone()  # → (datetime.datetime(2026, 5, 4, 12, 34, 56),)

Decoder picks the right Python type by qualifier:
  YEAR/MONTH/DAY-only → datetime.date
  HOUR/MIN/SEC-only   → datetime.time
  spans across both   → datetime.datetime

Wire format (per IfxToJavaDateTime + Decimal.init treating as packed BCD):
  byte[0] = sign + biased exponent (in base-100 digit pairs)
  byte[1..] = BCD digit pairs: YYYY (2 bytes) + MM + DD + HH + MI + SS + FFFFF

Qualifier extraction from column descriptor:
  encoded_length = (digit_count << 8) | (start_TU << 4) | end_TU
  TU codes: YEAR=0, MONTH=2, DAY=4, HOUR=6, MIN=8, SEC=10,
            FRAC1=11..FRAC5=15
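The extraction is two shifts and a mask — a direct transcription of the formula above:

```python
def unpack_datetime_qualifier(encoded_length: int):
    """Split encoded_length = (digit_count << 8) | (start_TU << 4)
    | end_TU into its three fields."""
    return (encoded_length >> 8,
            (encoded_length >> 4) & 0x0F,
            encoded_length & 0x0F)
```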

Verified against four DATETIME columns of different qualifiers in
one tuple — see test_datetime_multiple_columns_in_one_row:
  YEAR TO SECOND       → datetime.datetime(2026, 5, 4, 12, 34, 56)
  YEAR TO DAY          → datetime.date(2026, 5, 4)
  HOUR TO SECOND       → datetime.time(12, 34, 56)
  YEAR TO FRACTION(3)  → datetime.datetime(...)

Module changes:
  src/informix_db/converters.py:
    + _decode_datetime(raw, encoded_length) — qualifier-driven BCD walk
    + TU constants (_TU_YEAR, _TU_MONTH, ..., _TU_SECOND)
  src/informix_db/_resultset.py:
    + DATETIME row-decoder branch — computes width from digit_count
      in encoded_length high byte, calls _decode_datetime with the
      packed qualifier so it can pick the right Python type

Tests: 40 unit + 70 integration (7 new DATETIME tests) = 110 total,
all green, ruff clean. Tests cover:
  - YEAR TO SECOND → datetime.datetime
  - YEAR TO DAY → datetime.date
  - HOUR TO SECOND → datetime.time
  - CURRENT YEAR TO FRACTION(3) → datetime.datetime
  - Mixed qualifiers in one row
  - DATETIME stored in a real table column (round-trip via SELECT)
  - NULL DATETIME → Python None

DATETIME parameter binding (encoder) is Phase 6.x — same status as
DECIMAL encoder.
2026-05-04 12:02:40 -06:00
2bacbc4e53 Phase 6.a: DECIMAL/MONEY row decoding works (COUNT/SUM/AVG return Decimal)
Before:
  cur.execute('SELECT COUNT(*) FROM systables')
  cur.fetchone()  # → (b'\xc2\x02\x00\x00\x00\x00\x00\x00\x00',) raw bytes

After:
  cur.execute('SELECT COUNT(*) FROM systables')
  cur.fetchone()  # → (Decimal('276'),)

The trickiest decode of the project so far. IDS DECIMAL/MONEY wire format:

  byte[0] = (sign << 7) | biased_exponent_base100
    bit 7 = sign (1=positive, 0=negative)
    bits 0-6 = (exponent + 64), XOR'd with 0x7F if negative
  byte[1..] = digit-pair bytes (each 0..99 = two BCD digits)
    if negative: asymmetric base-100 complement applied:
      walk digits right→left, trailing zeros stay zero,
      first non-zero subtracts from 100, rest from 99

Initial naive "99 - d for all digits" decoder gave artifacts like
-1234.559999 instead of -1234.56. The asymmetric complement rule
(from Decimal.decComplement line 447) is what makes negatives
round-trip exactly.
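The asymmetric complement as code — it's an involution, so the same walk both applies and undoes the complement:

```python
def dec_complement(digit_pairs):
    """Asymmetric base-100 complement: right-to-left, trailing zeros
    stay zero, the first non-zero pair subtracts from 100, every
    remaining pair subtracts from 99."""
    out = list(digit_pairs)
    i = len(out) - 1
    while i >= 0 and out[i] == 0:
        i -= 1                      # trailing zeros stay zero
    if i >= 0:
        out[i] = 100 - out[i]       # first non-zero: from 100
        for j in range(i):
            out[j] = 99 - out[j]    # the rest: from 99
    return out
```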

Width on the wire: per-column encoded_length packed as
(precision << 8) | scale; byte width = ceil(precision/2) + 1.
parse_tuple_payload uses this to slice DECIMAL columns correctly.

Tested cases all decode correctly:
  COUNT(*)             → Decimal('276')
  SUM(tabid)           → Decimal('55')
  AVG(tabid)           → Decimal('5.5')
  1234.56::DECIMAL     → Decimal('1234.56')
  -1234.56::DECIMAL    → Decimal('-1234.56')
  -0.5::DECIMAL        → Decimal('-0.5')
  -99.99::DECIMAL      → Decimal('-99.99')
  -12345678.9::DECIMAL → Decimal('-12345678.9')
  NULL                 → None

Encoder (_encode_decimal) is implemented but disabled — server rejects
the produced bytes (precision packing not quite right). Phase 6.x will
fix. Workaround: cast Decimal to float, or pass via SQL literal.

Module changes:
  src/informix_db/converters.py:
    + decimal module import
    + _decode_decimal — full BCD decoder with asymmetric complement
    + _encode_decimal (Phase 6.x stub — present but unreached)
    + DECIMAL/MONEY added to DECODERS dispatch
  src/informix_db/_resultset.py:
    + DECIMAL/MONEY width computation from encoded_length

Tests: 40 unit + 55 integration (8 new DECIMAL) = 95 total, all
green, ruff clean.
2026-05-04 11:17:59 -06:00
d508a489fd Phase 4.x: parameterized SELECT, NULL row decoding, executemany()
Three Phase 4 follow-ups in one push, all with empirical wire analysis:

1. PARAMETERIZED SELECT
   cur.execute('SELECT tabname FROM systables WHERE tabid = ?', (1,))
   → ('systables',)
   Wire flow: PREPARE → DESCRIBE → SQ_BIND-only (no EXECUTE) →
   CURNAME+NFETCH → TUPLE+DONE → drain → CLOSE+RELEASE.
   The cursor open is what executes the prepared query; SQ_BIND just
   binds values into scope. No need for the IDESCRIBE handshake JDBC
   does for type discovery — server accepts our typed bind directly.

2. NULL ROW DECODING — per-type sentinel detection
   Each IDS type has its own NULL sentinel in tuple data:
     INT     → 0x80000000 (INT_MIN)
     BIGINT  → 0x8000000000000000 (LONG_MIN)
     SMALLINT→ 0x8000 (SHORT_MIN)
     REAL    → all 0xFF (NaN bit pattern)
     FLOAT   → all 0xFF
     DATE    → 0x80000000 (same as INT)
     VARCHAR → [byte 1][byte 0]  (length=1, single nul) — distinguishable
                from empty '' which is [byte 0] (length=0)
   Verified by wire capture against the dev container — see
   docs/CAPTURES/19-py-null-vs-onechar.socat.log and
   docs/CAPTURES/20-py-int-null.socat.log.

   The VARCHAR null marker is the trickiest because it LOOKS like a
   1-byte string of nul, but VARCHAR can't contain embedded nuls
   anyway, so the byte-0 within length-1 is unambiguous.
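Two of those sentinel checks as code (names are illustrative; the driver keeps the detection inside each decoder):

```python
import struct

INT_NULL_SENTINEL = 0x80000000   # INT_MIN on the wire

def int_is_null(raw4: bytes) -> bool:
    return struct.unpack('>I', raw4)[0] == INT_NULL_SENTINEL

def varchar_is_null(raw: bytes) -> bool:
    # [len=1][0x00] marks NULL; the empty string '' is just [len=0]
    return len(raw) == 2 and raw[0] == 1 and raw[1] == 0
```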

3. executemany(sql, seq_of_params) — PEP 249 batched DML
   PREPARE once, loop SQ_BIND+SQ_EXECUTE per param set, RELEASE once.
   Performance: only ~1.06x faster than execute() loop for 200 INSERTs
   (dominated by per-row round trips). Phase 4.x optimization opportunity:
   chain BIND+EXECUTE in one PDU without intermediate flush+read for
   true bulk performance (would likely give 5-10x). Documented in
   DECISION_LOG.md as a follow-up.

Module changes:
  src/informix_db/converters.py:
    + Per-type NULL sentinel constants and detection in each decoder
    + Decoders now return None for sentinel values
  src/informix_db/cursors.py:
    + _execute_select_with_params() — SQ_BIND alone, then cursor open
    + _build_bind_only_pdu() — SQ_BIND without trailing SQ_EXECUTE
    + executemany() — loop BIND+EXECUTE, accumulate rowcount
    + execute() now dispatches to _execute_select_with_params for
      parameterized SELECT (was: NotSupportedError)

Tests: 40 unit + 47 integration (was 32; added 15 new) = 87 total,
all green, ruff clean. New test files / cases:
  tests/test_nulls.py (7) — NULL decoding for INT, BIGINT, FLOAT,
    REAL, VARCHAR, empty-vs-null, mixed columns
  tests/test_params.py — added 4 parameterized SELECT tests, 5
    executemany tests
  tests/test_smoke.py — updated cursor-with-params test (was Phase 1
    "raises", now Phase 4 "works")

Discovered captures kept for next-session debugging:
  docs/CAPTURES/18-py-null-rows.socat.log
  docs/CAPTURES/19-py-null-vs-onechar.socat.log
  docs/CAPTURES/20-py-int-null.socat.log
2026-05-04 11:11:50 -06:00
509af9efa4 Phase 4: parameter binding (SQ_BIND) — int, float, str, bool, None
cur.execute("INSERT INTO t VALUES (?, ?, ?)", (42, "hello", 3.14))
cur.execute("INSERT INTO t VALUES (:1, :2)", (99, "world"))
cur.execute("UPDATE t SET name = ? WHERE id = ?", ("new", 2))
cur.execute("DELETE FROM t WHERE id = ?", (5,))
# all work end-to-end against a real Informix server

Two breakthroughs decoded from JDBC:

1. SQ_BIND PDU shape (chained with SQ_EXECUTE in one PDU, no separate
   round trip):
     [short SQ_ID=4][int SQ_BIND=5][short numparams]
     for each param:
       [short type][short indicator][short prec_or_encLen]
       writePadded(rawbytes)
     [short SQ_EXECUTE=7][short SQ_EOT]

2. Strings are sent as CHAR (type=0) not VARCHAR (type=13). The server
   handles conversion to the actual column type via internal CIDESCRIBE
   — we don't need to do it explicitly.

Per-type encoding (Phase 4 MVP):
  int (32-bit) → IDS INT (type=2), prec=0x0a00 (packed width=10/scale=0),
                  4-byte BE
  int (64-bit) → IDS BIGINT (type=52), prec=0x1300, 8-byte BE
  str          → IDS CHAR (type=0), prec=0, [short len][bytes][pad]
  float        → IDS FLOAT (type=3), prec=0, 8-byte IEEE 754
  bool         → IDS BOOL (type=45), prec=0, 1 byte
  None         → indicator=-1, no data

The integer "precision" field is PACKED — initially looked like a bug
(why would precision be 2560?) until I realized 0x0a00 = (10 << 8) | 0
= packed display-width and scale. Captured this surprise in
DECISION_LOG.md.

Critical fix to execute-path branching: parameterized INSERT also
returns nfields > 0 (server describes the would-be inserted row).
Switched from "branch on nfields" to "branch on SQL keyword" — JDBC
does the same via its IfxStatement / IfxPreparedStatement subclassing.

Numeric paramstyle support: cur.execute("... :1 ...", (val,)) works
by rewriting :N → ? before sending PREPARE. Trivial regex (doesn't
skip string literals or comments — Phase 5 can add a proper SQL
tokenizer).
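The rewrite really is a one-liner — a stand-in for the driver's pattern; as noted, :N inside a string literal would also (wrongly) be rewritten:

```python
import re

_NUMERIC_PLACEHOLDER = re.compile(r':\d+')

def rewrite_numeric_params(sql: str) -> str:
    """Convert paramstyle='numeric' SQL to qmark form before PREPARE."""
    return _NUMERIC_PLACEHOLDER.sub('?', sql)
```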

Module changes:
  src/informix_db/converters.py:
    + encode_param() dispatcher
    + _encode_int / _encode_bigint / _encode_str / _encode_float / _encode_bool
  src/informix_db/cursors.py:
    + _build_bind_execute_pdu() — chains SQ_BIND + SQ_EXECUTE in one PDU
    + _execute_dml_with_params() — sends bind PDU, drains, releases
    + execute() now accepts parameters; rewrites :N → ?; branches by
      SQL keyword (SELECT vs DML)
    + _NUMERIC_PLACEHOLDER_RE for paramstyle="numeric" support

Tests: 40 unit + 32 integration (8 new parameter tests + 1 updated
smoke) = 72 total, all green, ruff clean. New tests cover:
  - INSERT with ? params
  - INSERT with :N params
  - INT + FLOAT + str round trip via INSERT then SELECT
  - UPDATE with params in SET and WHERE
  - DELETE with parameter in WHERE
  - Unsupported param type (bytes) raises NotImplementedError
  - Parameterized SELECT raises NotSupportedError (Phase 4.x)
  - Dict/named params raise NotSupportedError

Known gaps (Phase 4.x / Phase 5):
  - Parameterized SELECT (needs SQ_BIND before CURNAME+NFETCH)
  - NULL row decoding for VARCHAR (currently surfaces empty string)
  - Proper SQL tokenizer (so :N inside string literals is preserved)
  - bytes/datetime/Decimal parameter types
2026-05-04 10:54:32 -06:00
92c4fdbcbf Phase 3: DDL + DML + commit/rollback wire machinery
Cursor.execute now branches on DESCRIBE response's nfields:
  - nfields > 0 → SELECT path (cursor lifecycle: CURNAME+NFETCH+...)
  - nfields == 0 → DDL/DML path (just SQ_EXECUTE then SQ_RELEASE)

Examples that work end-to-end against the dev container:

  cur.execute('CREATE TEMP TABLE t (id INTEGER, name VARCHAR(50))')
  cur.execute("INSERT INTO t VALUES (1, 'hello')")  # rowcount=1
  cur.execute("UPDATE t SET name = 'new' WHERE id = 1")
  cur.execute('DELETE FROM t WHERE id = 1')

Plus full mix: CREATE → 5 INSERTs → SELECT WHERE → DELETE WHERE → SELECT
(see tests/test_dml.py::test_full_dml_cycle_in_one_connection).

Three protocol findings during this push, documented in DECISION_LOG.md:

1. SQ_INSERTDONE (=94) is METADATA, not execution. It arrives in BOTH
   the DESCRIBE response (PREPARE phase) AND the EXECUTE response for
   literal-value INSERTs. The PREPARE-phase SQ_INSERTDONE carries the
   serial values that WILL be assigned IF you execute. The EXECUTE-
   phase SQ_INSERTDONE confirms execution. My initial assumption was
   "PREPARE-phase INSERTDONE means already-executed" — wrong. Skipping
   SQ_EXECUTE made the row not persist (SELECT returned []). Lesson:
   optimization-looking responses may not be what they look like —
   always verify with a follow-up SELECT.

2. SQ_INSERTDONE wire format: 18 bytes (10 byte longint serial8 + 8
   byte bigint bigserial). Per IfxSqli.receiveInsertDone line 2347.
   We read-and-discard for now; Phase 5+ surfaces as Cursor.lastrowid.

3. Transactions: commit() and rollback() are 2-byte messages.
   SQ_CMMTWORK=19 + SQ_EOT for commit; SQ_RBWORK=20 + SQ_EOT for
   rollback. Server responds with SQ_DONE+SQ_EOT in logged databases,
   or SQ_ERR sqlcode=-255 ("Not in transaction") in unlogged databases
   like sysmaster. Wire machinery is implemented; full transaction
   testing needs a logged DB (use `stores_demo` from the dev image).

Module changes:
  src/informix_db/cursors.py:
    - execute() branches on nfields (SELECT path vs DDL/DML path)
    - new _execute_dml() does just EXECUTE + RELEASE
    - new _build_execute_pdu() emits the 8-byte SQ_ID(EXECUTE)+EOT
    - _read_describe_response() and _drain_to_eot() handle SQ_INSERTDONE
  src/informix_db/connections.py:
    - commit() / rollback() now functional — send the SQ_CMMTWORK /
      SQ_RBWORK PDU and drain the response

Tests: 40 unit + 24 integration (6 new DML tests) = 64 total, all
green, ruff clean. New tests cover:
  - CREATE TEMP TABLE
  - INSERT (rowcount=1, persists, SELECT shows it)
  - UPDATE WHERE (specific row changed)
  - DELETE WHERE (specific row removed)
  - Full mixed cycle (CREATE + 5 INSERTs + SELECT + DELETE + SELECT)
  - commit() in unlogged DB raises OperationalError sqlcode=-255

Captured wire artifacts kept for future debugging:
  docs/CAPTURES/16-py-insert-literal.socat.log
  docs/CAPTURES/17-py-insert-select.socat.log
2026-05-04 08:02:48 -06:00
34ad04a872 Phase 2.x: VARCHAR row decoding works — three byte-level fixes
Three findings, each caught by a different debugging technique,
documented in DECISION_LOG.md:

1. CURNAME+NFETCH PDU: trailing reserved field is SHORT not INT.
   Caught by byte-diffing our 44-byte PDU against JDBC's 42-byte
   reference under socat. The server tolerated the longer version
   for INT-only SELECTs (silently consuming extra zeros) but
   rejected it for VARCHAR queries. Lesson: server tolerance varies
   by query type — always match JDBC byte-for-byte.

2. SQ_TUPLE payload pads to even byte alignment. An 11-byte
   "syscolumns" VARCHAR payload had a trailing 0x00 between it and
   the next SQ_TUPLE tag. JDBC's IfxRowColumn.readTuple consumes
   this pad silently; we weren't, so any odd-length variable-width
   row desynced the parser.

3. VARCHAR/NCHAR/NVCHAR in tuple data use a SINGLE-byte length
   prefix (max 255 chars — IDS VARCHAR's hard limit). NOT a 2-byte
   short as I'd initially assumed. CHAR is fixed-width per
   encoded_length. LVARCHAR uses a 4-byte int prefix for >255 byte
   values.
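The three prefix rules as a single slicer — a sketch; the real decoder also consumes the even-byte pad from finding 2:

```python
import struct

def slice_char_column(buf: bytes, pos: int, type_name: str,
                      encoded_length: int):
    """Return (value, next_pos) for one character column in tuple
    data, per the prefix rules above."""
    if type_name in ('VARCHAR', 'NCHAR', 'NVCHAR'):
        n, pos = buf[pos], pos + 1                  # 1-byte length prefix
    elif type_name == 'LVARCHAR':
        (n,) = struct.unpack_from('>i', buf, pos)   # 4-byte int prefix
        pos += 4
    else:                                           # CHAR: fixed width
        n = encoded_length
    return buf[pos:pos + n], pos + n
```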

Module changes:
  src/informix_db/_resultset.py — _LENGTH_PREFIXED_SHORT_TYPES set,
    branched VARCHAR/NCHAR/NVCHAR (1-byte prefix) vs CHAR (fixed)
    vs LVARCHAR (4-byte prefix); even-byte alignment pad consumed
    after each SQ_TUPLE payload.
  src/informix_db/cursors.py — CURNAME+NFETCH and standalone NFETCH
    PDUs now write_short(0) for the reserved trailing field.

Tests: 40 unit + 18 integration (3 new VARCHAR tests) = 58 total,
all green, ruff clean. New tests cover:
  - VARCHAR single-column SELECT
  - Odd-length VARCHAR row (regression for the pad-byte bug)
  - Mixed INT + VARCHAR + FLOAT three-column SELECT

Sample output:
  SELECT FIRST 5 tabname FROM systables → ('systables',),
    ('syscolumns',), ('sysindices',), ('systabauth',), ('syscolauth',)
  SELECT FIRST 3 tabname, tabid, nrows → ('systables', 1, 276.0), ...

VARCHAR was the last known gap from the Phase 2 commit. Phase 2
now reads INT, BIGINT, REAL, FLOAT, CHAR, VARCHAR end-to-end. Phase
6+ types (DATETIME, INTERVAL, DECIMAL, BLOBs) remain.
2026-05-04 07:55:13 -06:00
ea00990774 Phase 1 polish: PDU match test catches a real capability-int bug
Polish item #1: byte-for-byte regression test that asserts our
generated login PDU is structurally identical to JDBC's reference
captured in docs/CAPTURES/01-connect-only.socat.log.

The test (tests/test_pdu_match.py) immediately caught a real bug:
the capability section was misread during Phase 0 byte-decoding.
Earlier text claimed Cap_1=1, Cap_2=0x3c000000, Cap_3=0 — actually:

  Cap_1 = 0x0000013c   (= (capability_class << 8) | protocol_version
                          where protocol_version = 0x3c = PF_PROT_SQLI_0600)
  Cap_2 = 0
  Cap_3 = 0

The misalignment was: the 0x3c byte I attributed to Cap_2's high
byte was actually Cap_1's low byte. The dev-image server is
permissive enough to accept arbitrary capability values, so the
connection succeeded even with the wrong bytes — but the PDU wasn't
structurally identical to JDBC's reference. SERVER-ACCEPTS ≠
STRUCTURALLY-CORRECT. This is exactly why the byte-for-byte diff
was the right polish item; "it connects" was a false ceiling.
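The corrected packing can be checked with a one-liner; the capability_class value of 1 is inferred from the observed Cap_1 and should be treated as an assumption:

```python
PF_PROT_SQLI_0600 = 0x3C  # protocol-version byte observed in the capture
capability_class = 1      # assumption: inferred from Cap_1 == 0x013c

cap_1 = (capability_class << 8) | PF_PROT_SQLI_0600
assert cap_1 == 0x0000013C
# The Phase 0 misread -- Cap_1=1, Cap_2=0x3c000000 -- is the same byte
# sequence decoded one byte out of alignment.
```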

After fix:
- 6 PDU-match tests assert byte-for-byte equality at offsets 2..280
  (the structural prefix: SLheader sans length, all login markers,
  capability ints, username, password, protocol IDs, env vars).
- Bytes 280+ legitimately differ per process (PID, TID, hostname,
  cwd, AppName) — those are NOT asserted.
- Length field (offsets 0..1) also legitimately differs because our
  PDU has shorter env list and AppName.
- Test uses monkey-patched IfxSocket so no network is needed.
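The assertion at the heart of that test can be sketched roughly like this, using the offset bounds described above:

```python
def assert_structural_match(ours: bytes, reference: bytes) -> None:
    """Byte-compare only the structural prefix, offsets 2..280.

    Offsets 0..1 (length) and 280+ (PID/TID/hostname/cwd/AppName)
    legitimately differ per process and are left unchecked.
    """
    for off in range(2, 280):
        assert ours[off] == reference[off], (
            f"login PDU diverges at offset {off}: "
            f"ours=0x{ours[off]:02x} ref=0x{reference[off]:02x}"
        )
```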

Polish item #2: Makefile per global CLAUDE.md convention. Targets:
install, lint, format, test, test-integration, test-all, test-pdu,
ifx-up/down/logs/shell/status, capture (re-run JDBC scenarios under
socat), clean. `make` (no target) prints help.

Doc updates:
- PROTOCOL_NOTES.md §12: corrected capability section with the
  actual values and an explanation of the methodology lesson
- DECISION_LOG.md: new entry recording the correction with a
  pointer to the regression test and the takeaway

Side artifacts:
- docs/CAPTURES/03-py-connect-only.socat.log
- docs/CAPTURES/04-py-no-database.socat.log
- docs/CAPTURES/05-py-fixed-caps.socat.log

Test counts: 40 unit + 6 integration = 46 total, all green, ruff clean.
2026-05-02 20:18:03 -06:00
1a149074d4 Phase 0: populate PROTOCOL_NOTES and JDBC_NOTES from clean-room JDBC reading
Decompiled ifxjdbc.jar (4.50.JC10, build 146, 2023-03-07) with CFR 0.152
into build/jdbc-src/. The decompiled tree is gitignored — it's a
clean-room understanding reference, not shipped code.

Findings landed in two artifacts:

JDBC_NOTES.md — the reverse-lookup index:
- JAR identity (SHA256, manifest, line counts)
- Package layout (com.informix.{asf,jdbc,lang} are the load-bearing
  packages; org.bson and the JDBC API surface get ignored)
- Class index mapping each wire-protocol concern to the responsible
  Java class. Highlights:
  - com.informix.asf.Connection (the wire transport / login PDU)
  - com.informix.asf.IfxData{Input,Output}Stream (framing primitives)
  - com.informix.jdbc.IfxMessageTypes (140+ message-tag constants)
  - com.informix.lang.JavaToIfxType / IfxToJavaType (codecs)
  - com.informix.jdbc.IfxSqli / IfxSqliConnect (the SQLI state machine)
- Auth landscape: plain-password is inline in the binary login PDU;
  PAM is a server-initiated post-login challenge/response; CSM is
  removed from this driver (literally throws an error if you try)

PROTOCOL_NOTES.md — the byte-level wire-format reference:
- Endianness: big-endian, network byte order (confirmed from
  JavaToIfxInt source)
- Width table: SmallInt 2B, Int 4B, BigInt 8B, plus the legacy 10-byte
  LongInt that we skip for MVP
- 16-bit alignment requirement for variable-length payloads — every
  string/decimal/datetime is 0-padded if odd-length; missing this
  desynchronizes the parser
- Login PDU structure decoded byte-by-byte from encodeAscBinary():
  SLheader (6 bytes) + PFheader with markers 100/101/104/106/107/
  108/116/127, capability bitfield, env vars, process info, app name
- Disconnection: bare [short SQ_EXIT=56] both directions, no header
- Post-login messages have NO header — protocol is stream-oriented:
  [short tag][payload][short tag][payload]...
- Message-type tag table categorized by purpose
- Open questions list and cross-check matrix tracking what's
  JDBC-derived vs PCAP-confirmed
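The framing rules above (big-endian shorts, 16-bit alignment, bare [short tag][payload] stream, SQ_EXIT with no payload) can be sketched together; every tag value except SQ_EXIT and the per-tag payload-length table are hypothetical placeholders:

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

SQ_EXIT = 56  # from the notes: bare short, both directions, no header

# Hypothetical per-tag payload lengths, for illustration only; real
# payload sizes are tag-specific and often self-describing.
PAYLOAD_LEN = {7: 4}

def write_padded(payload: bytes) -> bytes:
    """0-pad a variable-length payload to 16-bit alignment."""
    return payload + (b"\x00" if len(payload) % 2 else b"")

def iter_messages(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Post-login framing: [short tag][payload][short tag][payload]..."""
    while True:
        hdr = stream.read(2)
        if len(hdr) < 2:
            return
        (tag,) = struct.unpack(">h", hdr)   # big-endian, network order
        if tag == SQ_EXIT:
            return                          # disconnection: no payload
        yield tag, stream.read(PAYLOAD_LEN[tag])
```

For example, a stream holding one hypothetical tag-7 message followed by SQ_EXIT yields a single (7, payload) pair and then stops.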

DECISION_LOG.md additions:
- ifxjdbc.jar 4.50.JC10 selected as JDBC reference; CFR 0.152 as decompiler
- CSM is officially dead — never plan for it
- Plain-password auth is single-round-trip (no challenge/response)
- Wire-framing primitives locked in for _protocol.py
- Container credentials: user=informix, password=in4mix, on port 9088,
  TLS off

Phase 0 exit gate: criteria #1 (login layout), #2 (message-type tags),
#3 (SELECT 1 hypothesis) are derived from JDBC. PCAP capture (task #7)
and cross-reference (task #2) remaining to corroborate.
2026-05-02 16:00:30 -06:00
f202dbce0c Initialize Phase 0 spike scaffold
Project goal: pure-Python implementation of the Informix SQLI wire
protocol. No CSDK, no JVM, no native deps. Targets
icr.io/informix/informix-developer-database (port 9088) as the
dev/test instance.

Phase 0 is a documentation-only spike that gates all implementation
work. The four scaffolds:

- README.md: project status and Phase 0 deliverable index
- docs/PROTOCOL_NOTES.md: byte-level wire-format reference (TBD)
- docs/JDBC_NOTES.md: reverse-lookup index into the decompiled IBM
  JDBC driver (4.50.4.1), populated from build/jdbc-src/ once the
  decompile lands
- docs/DECISION_LOG.md: running rationale, with the Phase-1
  paramstyle/Python-floor/autocommit decisions pre-locked so they
  don't churn later

CLAUDE.md is gitignored — operator-private context, public-PyPI repo.
2026-05-02 13:22:28 -06:00