Phase 6.f: BYTE/TEXT/BLOB/CLOB protocol research (deferred to Phase 8+)

Empirical and source-level investigation of the LOB type families.

Findings:

* BYTE/TEXT (type 11/12) cannot be inserted via SQL literals. Even dbaccess with `INSERT INTO t VALUES (1, "0x...")` returns -617 "A blob data type must be supplied within this context". The server requires the binary BBIND wire path; this is a hard restriction.
* BYTE/TEXT wire protocol: SQ_BIND sends a 56-byte descriptor as the inline placeholder, then a separate SQ_BBIND (41) PDU declares the blob count, then chunked SQ_BLOB (39) tags stream the actual bytes (max 1024 bytes/chunk per JDBC's sendStreamBlob).
* BLOB/CLOB (type 102/101) are even more involved: smart-LOBs use an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against the sbspace, with locators carried inline in SQ_TUPLE.
* Server-side setup confirmed working: blobspace1 + sbspace1 + a logged database (testdb) are now available in the dev container for future Phase 8/9 implementation.

Both LOB families require materially more state-machine work than the single-PDU codec types (DECIMAL/DATETIME/INTERVAL). Splitting into Phase 8 (BYTE/TEXT) and Phase 9 (BLOB/CLOB) lets each get focused attention rather than half-implementing both.

The SQ_BBIND, SQ_BLOB, SQ_FETCHBLOB, SQ_SBBIND, SQ_FILE_READ, SQ_FILE_WRITE constants are already declared in _messages.py from Phase 1 scaffolding, so the protocol layer is ready when implementation lands.

For users who need binary data <32K today: LVARCHAR via str encoded with iso-8859-1 is a viable interim path.
2026-05-04 12:37:46 -06:00
commit f546f951c8
parent 888b8079d3
1 changed files with 60 additions and 0 deletions
--- a/docs/DECISION_LOG.md
+++ b/docs/DECISION_LOG.md
@@ -469,6 +469,66 @@ INTERVAL parameter binding (encoder) is deferred to Phase 6.e or later — same

 ---

+## 2026-05-04 — Phase 6.f research: BYTE / TEXT / BLOB / CLOB protocol scope
+
+**Status**: research complete; implementation deferred
+**Decision**: Decouple the LOB types into their own phases. The four "LOB" types split into two protocol families with materially different wire-level costs:
+
+### Protocol family A: BYTE (type=11) and TEXT (type=12) — legacy in-row-pointed blobs
+
+**Server-side requirements** (verified empirically against the IBM dev container 15.0.1.0.3DE):
+- A blobspace must exist (`onspaces -c -b blobspace1 -p ... -o 0 -s 50000`)
+- The database must be logged (`CREATE DATABASE testdb WITH LOG`)
+- The column declaration must place data in the blobspace: `data BYTE IN blobspace1`
+
+**Even with all that, BYTE/TEXT cannot be inserted via SQL literals.** I verified by running `dbaccess - test_byte.sql` with `INSERT INTO t VALUES (1, "0x68656c6c6f")` and getting:
+
+```
+617: A blob data type must be supplied within this context.
+```
+
+This is a hard server-side restriction: blob data **must** arrive via the binary BBIND wire path. There is no string-literal escape hatch.
+
+**Wire protocol** (per `IfxSqli.sendBind` line 844, `sendBlob` line 3328, `sendStreamBlob` line 3482):
+
+1. **SQ_BIND** (tag 5): per-param block declares the BYTE/TEXT slot but the inline data is a **56-byte blob descriptor** (per `IfxBlob.toIfx` line 162) — mostly zeros, with the size at offset [16:20] as a 4-byte big-endian int. Byte 39 is the null indicator (1 = null).
+2. **SQ_BBIND** (tag 41): `[short tag=41][short blob_count]` — the count of BYTE/TEXT params being streamed.
+3. **For each BYTE/TEXT param**: stream of `SQ_BLOB` (tag 39) chunks: `[short tag=39][short length][padded data]`. Chunks max out at 1024 bytes per `sendStreamBlob`.
+4. **End-of-blob marker**: a final `SQ_BLOB` with `[short tag=39][short length=0]`.
+5. Then SQ_EXECUTE proceeds normally.
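
The write-path framing in steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration built only from the layout described above, not the driver's implementation: the helper names (`build_blob_descriptor`, `build_bbind`, `iter_blob_chunks`) are hypothetical, and padding each chunk to an even byte count is an assumption about what "padded data" means.

```python
import struct

# Tag values and sizes as recorded in this entry.
SQ_BBIND = 41
SQ_BLOB = 39
DESCRIPTOR_LEN = 56
MAX_CHUNK = 1024


def build_blob_descriptor(data):
    """Return the 56-byte inline placeholder for one BYTE/TEXT parameter."""
    desc = bytearray(DESCRIPTOR_LEN)  # mostly zeros
    if data is None:
        desc[39] = 1  # null indicator: 1 = null
    else:
        # Size lives at bytes [16:20] as a 4-byte big-endian int.
        struct.pack_into(">i", desc, 16, len(data))
    return bytes(desc)


def build_bbind(blob_count):
    """Return the SQ_BBIND PDU: [short tag=41][short blob_count]."""
    return struct.pack(">hh", SQ_BBIND, blob_count)


def iter_blob_chunks(data, max_chunk=MAX_CHUNK):
    """Yield SQ_BLOB frames for one blob, then the zero-length end marker."""
    for start in range(0, len(data), max_chunk):
        chunk = data[start:start + max_chunk]
        # ASSUMPTION: "padded data" means padding to an even byte count.
        pad = b"\x00" * (len(chunk) % 2)
        yield struct.pack(">hh", SQ_BLOB, len(chunk)) + chunk + pad
    yield struct.pack(">hh", SQ_BLOB, 0)  # end-of-blob marker
```

For a 2500-byte payload this yields chunks of 1024, 1024, and 452 bytes followed by the zero-length marker. A real encoder would also need to emit the surrounding SQ_BIND and SQ_EXECUTE PDUs, which are out of scope here.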
+
+**Decoder side**: rows containing BYTE/TEXT have a 56-byte descriptor in the SQ_TUPLE payload (per `IfxRowColumn.loadColumnData` switch case for type 11/12 reading 56 bytes). Then a separate stream of SQ_BLOB tags arrives **between** SQ_TUPLE messages, carrying the actual bytes.
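
A rough sketch of the read path, under several assumptions that this research has not confirmed: that the descriptor returned in SQ_TUPLE uses the same layout as the bind-side descriptor (size at bytes [16:20], null flag at byte 39), that inbound SQ_BLOB frames mirror the outbound framing including the zero-length end marker, and that odd-length chunks are padded to an even byte count. `parse_descriptor` and `read_blob` are hypothetical names.

```python
import struct

SQ_BLOB = 39
DESCRIPTOR_LEN = 56


def parse_descriptor(desc):
    """Return (size, is_null) from a 56-byte BYTE/TEXT descriptor.

    ASSUMPTION: the read-side layout matches the bind-side layout.
    """
    if len(desc) != DESCRIPTOR_LEN:
        raise ValueError("expected a 56-byte descriptor")
    (size,) = struct.unpack_from(">i", desc, 16)
    return size, desc[39] == 1


def read_blob(buf, offset=0):
    """Reassemble one blob from consecutive SQ_BLOB frames in buf.

    Returns (data, next_offset). Stops at the zero-length end marker.
    """
    parts = []
    while True:
        tag, length = struct.unpack_from(">hh", buf, offset)
        if tag != SQ_BLOB:
            raise ValueError(f"expected SQ_BLOB (39), got tag {tag}")
        offset += 4
        if length == 0:
            return b"".join(parts), offset
        parts.append(buf[offset:offset + length])
        # ASSUMPTION: odd-length chunks carry one pad byte.
        offset += length + (length % 2)
```

A real decoder would call something like `read_blob` once per BYTE/TEXT column after each SQ_TUPLE and attach the result to the matching column, using the descriptor's size as a consistency check.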
+
+**Estimated implementation cost**: substantial. The cursor state machine needs to:
+- Detect `bytes` params and `str` params intended as TEXT, and route them through SQ_BBIND after SQ_BIND
+- Send the 56-byte descriptor as the inline placeholder
+- Stream chunks ≤1024 bytes each
+- On the read path, parse SQ_BLOB tags between SQ_TUPLE messages and reassemble per-column
+
+This is a multi-day effort and warrants its own phase, **Phase 8+**.
+
+### Protocol family B: BLOB (type=102) and CLOB (type=101) — smart-LOBs with locators
+
+**Server-side requirements**: an sbspace (smart-LOB space), more complex than blobspace. (Verified: `onspaces -c -S sbspace1 ...`).
+
+**Wire protocol**: even more involved than BYTE/TEXT. Per `IfxLobInputStream` and `IfxSmartBlob`, smart-LOB access uses an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against the sbspace, with handles called *locators* that travel inline in the SQ_TUPLE while the actual bytes go over a separate channel. JDBC's `IfxLocator` is a 56-byte descriptor (same shape as the BYTE descriptor!) but carries semantic meaning: storage type, sbspace ID, partition number, etc.
+
+**Estimated implementation cost**: substantial++ — significantly larger than BYTE/TEXT, because we'd need to implement the LO_* RPC sub-protocol entirely.
+
+### Decision
+
+**Phase 6.f is closed as research-complete** with this entry as the deliverable. The findings replace assumptions (e.g., "BLOB/CLOB will be similar to INTERVAL") with actual protocol facts. Implementation is split into:
+- **Phase 8** (future): BYTE/TEXT bind+read with the SQ_BBIND/SQ_BLOB wire machinery
+- **Phase 9** (future): smart-LOB BLOB/CLOB with the LO_OPEN/LO_READ session protocol
+
+In the meantime, **users who need to insert binary data** can use the existing `LVARCHAR` path via `str`: convert the bytes to `str` with `iso-8859-1`, which maps every byte value to exactly one code point, for payloads up to ~32K (the LVARCHAR on-wire limit). This is not a substitute for true BYTE/TEXT but covers many practical cases.
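
The interim workaround amounts to a lossless bytes-to-`str` round trip. The sketch below shows only that conversion; the helper names and the exact `limit` value are hypothetical (the entry gives the limit only as "~32K"), and it assumes the `str` reaches the server encoded as `iso-8859-1` without further transcoding.

```python
def bytes_to_lvarchar_str(payload, limit=32 * 1024):
    """Map raw bytes to a str suitable for binding as LVARCHAR.

    ASSUMPTION: `limit` approximates the ~32K LVARCHAR on-wire limit;
    the exact figure is not established here.
    """
    if len(payload) >= limit:
        raise ValueError("payload too large for the LVARCHAR workaround")
    # iso-8859-1 maps bytes 0x00-0xFF to U+0000-U+00FF one-to-one.
    return payload.decode("iso-8859-1")


def lvarchar_str_to_bytes(text):
    """Recover the original bytes from a str fetched back from LVARCHAR."""
    return text.encode("iso-8859-1")
```

Because the mapping is one-to-one, `lvarchar_str_to_bytes(bytes_to_lvarchar_str(b))` returns `b` for any byte string under the limit.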
+
+The constants `SQ_BBIND=41`, `SQ_BLOB=39`, `SQ_FETCHBLOB=38`, `SQ_SBBIND=52`, `SQ_FILE_READ=106`, `SQ_FILE_WRITE=107` are already declared in `_messages.py` from earlier scaffolding — the protocol layer is ready when implementation lands.
+
+**Honest scope-discovery moment**: I went into Phase 6.f assuming it'd be similar effort to INTERVAL. Reading the wire protocol revealed a different shape entirely — multi-PDU sequences require state-machine surgery, not just new codecs. Pivoting now (instead of half-implementing) is the right call.
+
+---
+
 ## (template — copy below this line for new entries)

 ```