Phase 6.f: BYTE/TEXT/BLOB/CLOB protocol research (deferred to Phase 8+)

Empirical and source-level investigation of the LOB type families.
Findings:

* BYTE/TEXT (type 11/12) cannot be inserted via SQL literals — even
  dbaccess with `INSERT INTO t VALUES (1, "0x...")` returns -617
  "A blob data type must be supplied within this context". The server
  requires a binary BBIND wire path. Hard restriction.

* BYTE/TEXT wire protocol: SQ_BIND sends a 56-byte descriptor as the
  inline placeholder, then a separate SQ_BBIND (41) PDU declares blob
  count, then chunked SQ_BLOB (39) tags stream the actual bytes (max
  1024 bytes/chunk per JDBC's sendStreamBlob).

* BLOB/CLOB (type 102/101) are even more involved: smart-LOBs use an
  LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against sbspace,
  with locators carried inline in SQ_TUPLE.

* Server-side setup confirmed working: blobspace1 + sbspace1 + logged
  database (testdb) are now available in the dev container for future
  Phase 8/9 implementation.

Both LOB families require materially more state-machine work than the
single-PDU codec types (DECIMAL/DATETIME/INTERVAL). Splitting into
Phase 8 (BYTE/TEXT) and Phase 9 (BLOB/CLOB) lets each get focused
attention rather than half-implementing both.

The SQ_BBIND, SQ_BLOB, SQ_FETCHBLOB, SQ_SBBIND, SQ_FILE_READ,
SQ_FILE_WRITE constants are already declared in _messages.py from
Phase 1 scaffolding — protocol layer is ready when implementation
lands.

For users who need binary data <32K today: LVARCHAR via str encoded
with iso-8859-1 is a viable interim path.
Ryan Malloy 2026-05-04 12:37:46 -06:00
parent 888b8079d3
commit f546f951c8

@ -469,6 +469,66 @@ INTERVAL parameter binding (encoder) is deferred to Phase 6.e or later — same
---
## 2026-05-04 — Phase 6.f research: BYTE / TEXT / BLOB / CLOB protocol scope
**Status**: research complete; implementation deferred
**Decision**: decouple the LOB types into their own phases. The four "LOB" types split into two protocol families with materially different wire-level costs:
### Protocol family A: BYTE (type=11) and TEXT (type=12) — legacy in-row-pointed blobs
**Server-side requirements** (verified empirically against the IBM dev container 15.0.1.0.3DE):
- A blobspace must exist (`onspaces -c -b blobspace1 -p ... -o 0 -s 50000`)
- The database must be logged (`CREATE DATABASE testdb WITH LOG`)
- The column declaration must place data in the blobspace: `data BYTE IN blobspace1`
**Even with all that, BYTE/TEXT cannot be inserted via SQL literals.** I verified by running `dbaccess - test_byte.sql` with `INSERT INTO t VALUES (1, "0x68656c6c6f")` and getting:
```
617: A blob data type must be supplied within this context.
```
This is a hard server-side restriction: blob data **must** arrive via the binary BBIND wire path. There is no string-literal escape hatch.
**Wire protocol** (per `IfxSqli.sendBind` line 844, `sendBlob` line 3328, `sendStreamBlob` line 3482):
1. **SQ_BIND** (tag 5): per-param block declares the BYTE/TEXT slot but the inline data is a **56-byte blob descriptor** (per `IfxBlob.toIfx` line 162) — mostly zeros, with the size at offset [16:20] as a 4-byte big-endian int. Byte 39 is the null indicator (1 = null).
2. **SQ_BBIND** (tag 41): `[short tag=41][short blob_count]` — the count of BYTE/TEXT params being streamed.
3. **For each BYTE/TEXT param**: stream of `SQ_BLOB` (tag 39) chunks: `[short tag=39][short length][padded data]`. Chunks max out at 1024 bytes per `sendStreamBlob`.
4. **End-of-blob marker**: a final `SQ_BLOB` with `[short tag=39][short length=0]`.
5. Then SQ_EXECUTE proceeds normally.
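
The framing in steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the driver's implementation: `frame_blob_params` is a hypothetical helper, the shorts are assumed to be big-endian (matching the descriptor's size field), and "padded data" is assumed to mean padding to an even byte boundary, which the notes above do not spell out.

```python
import struct

SQ_BBIND = 41
SQ_BLOB = 39
MAX_CHUNK = 1024  # chunk ceiling reported for JDBC's sendStreamBlob


def frame_blob_params(blobs: list[bytes]) -> bytes:
    """Frame BYTE/TEXT payloads as SQ_BBIND followed by SQ_BLOB chunks.

    Hypothetical helper. Byte order and padding are assumptions.
    """
    # [short tag=41][short blob_count]
    out = bytearray(struct.pack(">hh", SQ_BBIND, len(blobs)))
    for data in blobs:
        for start in range(0, len(data), MAX_CHUNK):
            chunk = data[start:start + MAX_CHUNK]
            # [short tag=39][short length][padded data]
            out += struct.pack(">hh", SQ_BLOB, len(chunk))
            out += chunk
            if len(chunk) % 2:
                out += b"\x00"  # ASSUMPTION: pad odd-length chunks to even
        # End-of-blob marker: SQ_BLOB with length 0
        out += struct.pack(">hh", SQ_BLOB, 0)
    return bytes(out)
```

Emitting the zero-length marker once per parameter keeps the blobs separable on the wire; whether the server expects exactly that placement should be confirmed against a packet capture before relying on it.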
**Decoder side**: rows containing BYTE/TEXT have a 56-byte descriptor in the SQ_TUPLE payload (per `IfxRowColumn.loadColumnData` switch case for type 11/12 reading 56 bytes). Then a separate stream of SQ_BLOB tags arrives **between** SQ_TUPLE messages, carrying the actual bytes.
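
A sketch of the 56-byte descriptor layout described above, covering only the two fields these notes identify (size at `[16:20]`, null indicator at byte 39). The function names are hypothetical, and treating the decoder-side descriptor as having the same layout as the bind-side one is an assumption; all other bytes are left as zeros.

```python
import struct
from typing import Optional, Tuple

DESCRIPTOR_LEN = 56
SIZE_OFFSET = 16   # 4-byte big-endian int at [16:20]
NULL_OFFSET = 39   # 1 = null


def build_blob_descriptor(size: Optional[int]) -> bytes:
    """Build the inline placeholder; size=None marks a NULL value."""
    buf = bytearray(DESCRIPTOR_LEN)  # mostly zeros, per the notes
    if size is None:
        buf[NULL_OFFSET] = 1
    else:
        struct.pack_into(">i", buf, SIZE_OFFSET, size)
    return bytes(buf)


def parse_blob_descriptor(buf: bytes) -> Tuple[int, bool]:
    """Return (size, is_null). ASSUMPTION: same layout on the read path."""
    if len(buf) != DESCRIPTOR_LEN:
        raise ValueError(f"expected {DESCRIPTOR_LEN} bytes, got {len(buf)}")
    (size,) = struct.unpack_from(">i", buf, SIZE_OFFSET)
    return size, buf[NULL_OFFSET] == 1
```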
**Estimated implementation cost**: substantial. Cursor state machine needs to:
- Detect `bytes` params (and `str` params intended as TEXT) and route them through SQ_BBIND after SQ_BIND
- Send the 56-byte descriptor as the inline placeholder
- Stream chunks ≤1024 bytes each
- On the read path, parse SQ_BLOB tags between SQ_TUPLE messages and reassemble per-column
This is a multi-day effort and warrants its own phase, **Phase 8**.
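
The read-path reassembly might look roughly like the sketch below. It assumes the server-to-client SQ_BLOB chunks use the same framing as the send path (big-endian shorts, even-byte padding, zero-length terminator); none of that has been verified for the read direction, and `read_blob` is a hypothetical name.

```python
import io
import struct
from typing import BinaryIO

SQ_BLOB = 39


def read_blob(stream: BinaryIO) -> bytes:
    """Reassemble one blob from consecutive SQ_BLOB chunks.

    ASSUMPTION: read-path framing mirrors the send path.
    """
    parts = []
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("stream ended inside an SQ_BLOB header")
        tag, length = struct.unpack(">hh", header)
        if tag != SQ_BLOB:
            raise ValueError(f"expected SQ_BLOB (39), got tag {tag}")
        if length == 0:  # end-of-blob marker
            return b"".join(parts)
        padded = length + (length % 2)  # ASSUMPTION: even-byte padding
        chunk = stream.read(padded)
        if len(chunk) != padded:
            raise EOFError("stream ended inside an SQ_BLOB chunk")
        parts.append(chunk[:length])


# Usage: two chunks followed by the terminator.
wire = (
    struct.pack(">hh", SQ_BLOB, 5) + b"hello\x00"
    + struct.pack(">hh", SQ_BLOB, 2) + b"!!"
    + struct.pack(">hh", SQ_BLOB, 0)
)
payload = read_blob(io.BytesIO(wire))
```

In the real cursor this loop would have to run between SQ_TUPLE messages and attribute each reassembled blob to the right column, which is the state-machine work the list above refers to.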
### Protocol family B: BLOB (type=102) and CLOB (type=101) — smart-LOBs with locators
**Server-side requirements**: an sbspace (smart-LOB space), which is more complex than a blobspace. Verified with `onspaces -c -S sbspace1 ...`.
**Wire protocol**: even more involved than BYTE/TEXT. Per `IfxLobInputStream` and `IfxSmartBlob`, smart-LOB access uses an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against the sbspace, with handles called *locators* that travel inline in the SQ_TUPLE while the actual bytes go over a separate channel. JDBC's `IfxLocator` is a 56-byte descriptor (same shape as the BYTE descriptor!) but carries semantic meaning: storage type, sbspace ID, partition number, etc.
**Estimated implementation cost**: substantial++ — significantly larger than BYTE/TEXT, because we'd need to implement the LO_* RPC sub-protocol entirely.
### Decision
**Phase 6.f is closed as research-complete** with this entry as the deliverable. The findings replace assumptions (e.g., "BLOB/CLOB will be similar to INTERVAL") with actual protocol facts. Implementation is split into:
- **Phase 8** (future): BYTE/TEXT bind+read with the SQ_BBIND/SQ_BLOB wire machinery
- **Phase 9** (future): smart-LOB BLOB/CLOB with the LO_OPEN/LO_READ session protocol
In the meantime, **users who need to insert binary data** can use the existing `LVARCHAR` path via `str`: decoding the bytes as `iso-8859-1` maps each byte to exactly one character, which works up to ~32K, the LVARCHAR on-wire limit. This is not a substitute for true BYTE/TEXT, but it covers many practical cases.
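
A minimal sketch of that interim path. The helper names are hypothetical, and `ASSUMED_MAX_LEN` is only a placeholder for the approximate "~32K" figure, since the exact limit is not stated here. The round trip also assumes the connection and database locale pass all 256 code points through unchanged.

```python
ASSUMED_MAX_LEN = 32 * 1024  # placeholder for "~32K"; exact limit not stated


def bytes_to_lvarchar_str(data: bytes) -> str:
    """Map each byte to the code point of the same value (0..255)."""
    if len(data) >= ASSUMED_MAX_LEN:
        raise ValueError("payload too large for the LVARCHAR interim path")
    return data.decode("iso-8859-1")


def lvarchar_str_to_bytes(text: str) -> bytes:
    """Inverse mapping; raises UnicodeEncodeError above U+00FF."""
    return text.encode("iso-8859-1")


# Usage: every byte value survives the round trip in Python itself.
original = bytes(range(256))
restored = lvarchar_str_to_bytes(bytes_to_lvarchar_str(original))
```

`iso-8859-1` is used because it is a total one-to-one mapping over all byte values, unlike UTF-8, which rejects many byte sequences.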
The constants `SQ_BBIND=41`, `SQ_BLOB=39`, `SQ_FETCHBLOB=38`, `SQ_SBBIND=52`, `SQ_FILE_READ=106`, `SQ_FILE_WRITE=107` are already declared in `_messages.py` from earlier scaffolding — the protocol layer is ready when implementation lands.
**Honest scope-discovery moment**: I went into Phase 6.f assuming it'd be similar effort to INTERVAL. Reading the wire protocol revealed a different shape entirely — multi-PDU sequences require state-machine surgery, not just new codecs. Pivoting now (instead of half-implementing) is the right call.
---
## (template — copy below this line for new entries)
```