informix-db/tests/test_pdu_match.py
Ryan Malloy a1bd52788d Phase 2: SELECT works end-to-end — pure-Python Informix fully reads data
  cursor.execute("SELECT 1 FROM systables WHERE tabid = 1")
  cursor.fetchone() == (1,)

To my knowledge, this is the first time a pure-Python implementation
has read data from Informix without wrapping IBM's CSDK or JDBC.

Three breakthroughs in this commit:

1. Login PDU's database field is BROKEN. Passing a database name there
   makes the server reject subsequent SQ_DBOPEN with sqlcode -759
   ("database not available"). JDBC always sends NULL in the login
   PDU's database slot — we now do the same. The user-supplied database
   opens via SQ_DBOPEN in _init_session.

2. Post-login session init dance: SQ_PROTOCOLS (8-byte feature mask
   replayed verbatim from JDBC) → SQ_INFO with INFO_ENV + env vars
   (48-byte PDU replayed verbatim — DBTEMP=/tmp, SUBQCACHESZ=10) →
   SQ_DBOPEN. Without all three steps in this exact order, the server
   silently ignores SELECTs.

3. SQ_DESCRIBE per-column block has 10 fields per column (not the
   simple "name + type" my best-effort parser assumed): fieldIndex,
   columnStartPos, columnType, columnExtendedId, ownerName,
   extendedName, reference, alignment, sourceType, encodedLength.
   The string table at the end is offset-indexed (fieldIndex points
   into it), which is how JDBC handles disambiguation.
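
A minimal sketch of the offset-indexed lookup described in item 3, not the
parser in _resultset.py. The helper name, the NUL-terminated layout of the
string table, and the ASCII decoding are all assumptions made for illustration;
only the ten field names come from this commit.

```python
# The ten per-column fields named above, in the order listed.
COLUMN_FIELDS = (
    "fieldIndex", "columnStartPos", "columnType", "columnExtendedId",
    "ownerName", "extendedName", "reference", "alignment",
    "sourceType", "encodedLength",
)


def name_at(string_table: bytes, field_index: int) -> str:
    """Return the name that starts at byte offset ``field_index``.

    Assumption: names in the table are NUL-terminated ASCII strings.
    """
    end = string_table.find(b"\x00", field_index)
    if end == -1:  # last name without a terminator
        end = len(string_table)
    return string_table[field_index:end].decode("ascii")


# Hypothetical two-column string table: offsets 0 and 6.
table = b"tabid\x00tabname\x00"
print(name_at(table, 0))  # tabid
print(name_at(table, 6))  # tabname
```

Because each column carries an offset rather than a position, two columns can
share or repeat names without the parser having to count entries.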

Cursor lifecycle implementation in cursors.py mirrors JDBC exactly:
  PREPARE+NDESCRIBE+WANTDONE → DESCRIBE+DONE+COST+EOT
  CURNAME+NFETCH(4096) → TUPLE*+DONE+COST+EOT
  NFETCH(4096) → DONE+COST+EOT (drain)
  CLOSE → EOT
  RELEASE → EOT

Five round trips per SELECT — same as JDBC.
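
The lifecycle above can be written down as plain data, which makes the
round-trip count and the "every response ends in EOT" invariant easy to check.
This is an illustrative sketch only: the ROUND_TRIPS name and the string form of
the tags are hypothetical, not structures from cursors.py.

```python
# (request tags, expected response tags) per round trip, as listed above.
ROUND_TRIPS = [
    ("PREPARE+NDESCRIBE+WANTDONE", "DESCRIBE+DONE+COST+EOT"),
    ("CURNAME+NFETCH(4096)", "TUPLE*+DONE+COST+EOT"),
    ("NFETCH(4096)", "DONE+COST+EOT"),  # drain
    ("CLOSE", "EOT"),
    ("RELEASE", "EOT"),
]

for number, (request, response) in enumerate(ROUND_TRIPS, start=1):
    # Every server response in the lifecycle is terminated by EOT.
    assert response.split("+")[-1] == "EOT"
    print(f"{number}. {request} -> {response}")

print(f"{len(ROUND_TRIPS)} round trips per SELECT")
```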

Module changes:
  src/informix_db/connections.py — added _init_session(), _send_protocols(),
    _send_dbopen(), _drain_to_eot(), _raise_sq_err(); login PDU now
    forces database=None always; SQ_INFO PDU replayed verbatim from
    JDBC capture (offset-indexed env-var format too gnarly to derive
    in MVP).
  src/informix_db/cursors.py — full rewrite: real PDU builders for
    PREPARE/CURNAME+NFETCH/NFETCH/CLOSE/RELEASE; tag-dispatched
    response readers; cursor-name generator matching JDBC's "_ifxc"
    convention.
  src/informix_db/_resultset.py — proper SQ_DESCRIBE parser per
    JDBC's receiveDescribe (USVER mode); offset-indexed string table
    with name lookup by fieldIndex; ColumnInfo dataclass with raw
    type-code preserved for null-flag extraction.
  src/informix_db/_messages.py — added SQ_NDESCRIBE=22, SQ_WANTDONE=49.

Test coverage: 40 unit + 15 integration tests (7 smoke + 8 new SELECT)
= 55 total, all green, ruff clean. New tests cover:
  - SELECT 1 returns (1,)
  - cursor.description shape per PEP 249
  - Multi-row INT SELECT
  - Multi-column mixed types (INT + FLOAT)
  - Iterator protocol (for row in cursor)
  - fetchmany(n)
  - Re-executing on same cursor resets state
  - Two cursors on one connection (sequential)

Known gap: VARCHAR row decoding doesn't yet handle the variable-width
on-wire encoding correctly. Phase 2.x will address this; for now, columns
whose decoding is not implemented surface as raw bytes in the row tuple.
2026-05-03 15:37:10 -06:00


"""Regression test: our generated login PDU is byte-identical to JDBC's.

Phase 1 polish artifact. We monkeypatch ``IfxSocket`` with a fake that
captures the bytes we send, then compare those bytes to the captured
JDBC reference PDU in ``docs/CAPTURES/01-connect-only.socat.log``.

Bytes 2..280 of the PDU are the *structural* prefix — SLheader (sans
length field), all login markers, the three capability ints, username,
password, protocol identifiers, and environment variables. These MUST
be byte-identical to JDBC's PDU; any divergence is a real bug (we
caught one this way already — the misaligned capability ints).

Bytes 280+ contain process-specific fields (PID, thread ID, hostname,
cwd, AppName) that legitimately differ per Python process. The test
asserts only the structural prefix.
"""

from __future__ import annotations

import re
from pathlib import Path

import pytest

import informix_db
from informix_db import connections


def _extract_first_client_pdu(log_path: Path) -> bytes:
    """Pull the first '>' (client→server) hex dump out of a socat -x log."""
    text = log_path.read_text()
    match = re.search(r"^> .*?length=\d+.*?\n (.*?)\n", text, re.MULTILINE | re.DOTALL)
    assert match, f"no client→server message found in {log_path}"
    return bytes.fromhex(match.group(1).strip().replace(" ", ""))


@pytest.fixture
def jdbc_reference_pdu() -> bytes:
    """The IBM JDBC reference login PDU, captured under socat in Phase 0."""
    return _extract_first_client_pdu(
        Path(__file__).parent.parent / "docs/CAPTURES/01-connect-only.socat.log"
    )


@pytest.fixture
def python_login_pdu(monkeypatch: pytest.MonkeyPatch) -> bytes:
    """Capture the bytes our pure-Python client emits without touching the network."""
    captured = bytearray()

    class _CapturingSocket:
        """Fake socket: captures writes, then raises to stop the connect flow."""

        def __init__(self, *_args: object, **_kwargs: object) -> None:
            self._closed = False

        @property
        def closed(self) -> bool:
            return self._closed

        def write_all(self, data: bytes) -> None:
            captured.extend(data)
            # Stop the connect flow before it tries to read a server response.
            raise informix_db.OperationalError("stub: stop after login PDU")

        def read_exact(self, _n: int) -> bytes:
            raise informix_db.OperationalError("stub: never reached")

        def close(self) -> None:
            self._closed = True

    monkeypatch.setattr(connections, "IfxSocket", _CapturingSocket)
    with pytest.raises(informix_db.OperationalError, match="stub"):
        informix_db.connect(
            host="dont.care",
            port=9088,
            user="informix",
            password="in4mix",
            database=None,
            server="informix",
        )
    return bytes(captured)


# ---------------------------------------------------------------------------
# Structural-prefix tests
# ---------------------------------------------------------------------------

# Offset where process-specific fields begin (PID/TID/hostname/cwd/AppName).
# Empirically determined by running the diff after fixing the caps ints
# (see DECISION_LOG.md). Anything before this MUST match byte-for-byte.
STRUCTURAL_PREFIX_END = 280


def test_slheader_protocol_version_matches(
    python_login_pdu: bytes, jdbc_reference_pdu: bytes
) -> None:
    """The SLheader's protocol-version byte (offset 3) must be 60 (PF_PROT_SQLI_0600)."""
    assert python_login_pdu[3] == jdbc_reference_pdu[3] == 0x3C


def test_slheader_type_byte_matches(python_login_pdu: bytes, jdbc_reference_pdu: bytes) -> None:
    """The SLheader's slType byte (offset 2) must be 1 (SLTYPE_CONREQ)."""
    assert python_login_pdu[2] == jdbc_reference_pdu[2] == 0x01


def test_capability_ints_match_reference(
    python_login_pdu: bytes, jdbc_reference_pdu: bytes
) -> None:
    """Cap_1 / Cap_2 / Cap_3 (offsets 65..76) must be byte-identical to JDBC.

    This is the test that would have caught the original capability-int bug
    (where we sent caps_1=1, caps_2=0x3c000000 instead of caps_1=0x13c, caps_2=0).
    """
    assert python_login_pdu[65:77] == jdbc_reference_pdu[65:77]


def test_structural_prefix_matches(python_login_pdu: bytes, jdbc_reference_pdu: bytes) -> None:
    """Everything from byte 2 to ``STRUCTURAL_PREFIX_END`` must match exactly.

    Skips:
      * Bytes 0..1 (SLheader length): differs because Python sends fewer
        env vars / shorter AppName, so total length differs.
      * Bytes ``STRUCTURAL_PREFIX_END``..end: process-specific fields
        (PID, TID, hostname, cwd, AppName).
    """
    py_prefix = python_login_pdu[2:STRUCTURAL_PREFIX_END]
    ja_prefix = jdbc_reference_pdu[2:STRUCTURAL_PREFIX_END]
    if py_prefix != ja_prefix:
        # Find first divergence and report it with context.
        for i, (a, b) in enumerate(zip(py_prefix, ja_prefix, strict=False)):
            if a != b:
                off = i + 2
                # Clamp the context window: a divergence at offset 2 or 3 would
                # otherwise give a negative slice start and an empty hex dump.
                lo = max(off - 4, 0)
                pytest.fail(
                    f"structural-prefix mismatch at offset {off}: "
                    f"Python={a:#04x} JDBC={b:#04x}\n"
                    f" Python[{lo}..{off + 4}]: "
                    f"{python_login_pdu[lo : off + 5].hex(' ')}\n"
                    f" JDBC [{lo}..{off + 4}]: "
                    f"{jdbc_reference_pdu[lo : off + 5].hex(' ')}"
                )
    # Also catches the case where one prefix is merely shorter than the other.
    assert py_prefix == ja_prefix


def test_pdu_is_correctly_length_prefixed(python_login_pdu: bytes) -> None:
    """The SLheader's first 2 bytes must equal the total PDU length."""
    declared_length = int.from_bytes(python_login_pdu[0:2], "big", signed=False)
    assert declared_length == len(python_login_pdu)


def test_pdu_ends_with_sq_asceot(python_login_pdu: bytes) -> None:
    """Every login PDU must end with [short SQ_ASCEOT=127] (= 0x00 0x7f)."""
    assert python_login_pdu[-2:] == b"\x00\x7f"
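
A standalone sketch of why the capability-int bug mentioned in
``test_capability_ints_match_reference`` is described as a misalignment. It
assumes each capability is a big-endian 32-bit integer (consistent with three
ints occupying the 12 bytes at offsets 65..76 and with the big-endian length
field); caps_3 is left out because its value is not stated in this file.

```python
import struct

# Assumption: capabilities are packed as big-endian unsigned 32-bit ints.
correct = struct.pack(">II", 0x13C, 0)        # caps_1=0x13c, caps_2=0
buggy = struct.pack(">II", 1, 0x3C000000)     # caps_1=1, caps_2=0x3c000000

print(correct.hex(" "))  # 00 00 01 3c 00 00 00 00
print(buggy.hex(" "))    # 00 00 00 01 3c 00 00 00

# The buggy bytes are the correct bytes shifted right by one position.
assert buggy[1:] == correct[:-1]

# Index of the first differing byte, relative to the start of caps_1.
first_diff = next(i for i, (a, b) in enumerate(zip(correct, buggy)) if a != b)
print(first_diff)  # 2
```

The values differ by a single byte of alignment, which is the kind of error a
byte-for-byte comparison against a reference capture surfaces immediately.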