Phase 6.f: BYTE/TEXT/BLOB/CLOB protocol research (deferred to Phase 8+)
Empirical and source-level investigation of the LOB type families.

Findings:

* BYTE/TEXT (type 11/12) cannot be inserted via SQL literals. Even dbaccess with `INSERT INTO t VALUES (1, "0x...")` returns -617 "A blob data type must be supplied within this context". The server requires the binary BBIND wire path. Hard restriction.
* BYTE/TEXT wire protocol: SQ_BIND sends a 56-byte descriptor as the inline placeholder, then a separate SQ_BBIND (41) PDU declares the blob count, then chunked SQ_BLOB (39) tags stream the actual bytes (max 1024 bytes/chunk per JDBC's sendStreamBlob).
* BLOB/CLOB (type 102/101) are even more involved: smart-LOBs use an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against the sbspace, with locators carried inline in SQ_TUPLE.
* Server-side setup confirmed working: blobspace1 + sbspace1 + logged database (testdb) are now available in the dev container for future Phase 8/9 implementation.

Both LOB families require materially more state-machine work than the single-PDU codec types (DECIMAL/DATETIME/INTERVAL). Splitting into Phase 8 (BYTE/TEXT) and Phase 9 (BLOB/CLOB) lets each get focused attention rather than half-implementing both.

The SQ_BBIND, SQ_BLOB, SQ_FETCHBLOB, SQ_SBBIND, SQ_FILE_READ, SQ_FILE_WRITE constants are already declared in _messages.py from Phase 1 scaffolding, so the protocol layer is ready when implementation lands.

For users who need binary data <32K today: LVARCHAR via str encoded with iso-8859-1 is a viable interim path.
parent 888b8079d3
commit f546f951c8
@@ -469,6 +469,66 @@ INTERVAL parameter binding (encoder) is deferred to Phase 6.e or later — same

---

## 2026-05-04 — Phase 6.f research: BYTE / TEXT / BLOB / CLOB protocol scope

**Status**: research complete; implementation deferred
**Decision**: Decouple the LOB types into their own phases. The four "LOB" types split into two protocol families with materially different wire-level cost:

### Protocol family A: BYTE (type=11) and TEXT (type=12) — legacy blobs referenced by an in-row descriptor

**Server-side requirements** (verified empirically against the IBM Informix dev container 15.0.1.0.3DE):
- A blobspace must exist (`onspaces -c -b blobspace1 -p ... -o 0 -s 50000`)
- The database must be logged (`CREATE DATABASE testdb WITH LOG`)
- The column declaration must place data in the blobspace: `data BYTE IN blobspace1`

**Even with all that, BYTE/TEXT cannot be inserted via SQL literals.** I verified this by running `dbaccess - test_byte.sql` with `INSERT INTO t VALUES (1, "0x68656c6c6f")`; the server rejected it with error -617:

```
617: A blob data type must be supplied within this context.
```

This is a hard server-side restriction: blob data **must** arrive via the binary BBIND wire path. There is no string-literal escape hatch.

**Wire protocol** (per `IfxSqli.sendBind` line 844, `sendBlob` line 3328, `sendStreamBlob` line 3482):

1. **SQ_BIND** (tag 5): the per-param block declares the BYTE/TEXT slot, but the inline data is a **56-byte blob descriptor** (per `IfxBlob.toIfx` line 162). It is mostly zeros, with the blob size stored in bytes [16:20] as a 4-byte big-endian int. Byte 39 is the null indicator (1 = null).
2. **SQ_BBIND** (tag 41): `[short tag=41][short blob_count]`, where `blob_count` is the number of BYTE/TEXT params being streamed.
3. **For each BYTE/TEXT param**: a stream of `SQ_BLOB` (tag 39) chunks: `[short tag=39][short length][padded data]`. Chunks max out at 1024 bytes per `sendStreamBlob`.
4. **End-of-blob marker**: a final `SQ_BLOB` with `[short tag=39][short length=0]`.
5. Then SQ_EXECUTE proceeds normally.

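The write-path steps above can be sketched in a few lines of Python. This is a minimal illustration, not the driver's implementation: the helper names (`blob_descriptor`, `bbind_header`, `blob_chunks`) are hypothetical, and the padding rule (pad each chunk to an even byte count with a NUL) is my assumption about what "padded data" means. Only the tag values, the 56-byte size, the [16:20] size field, the byte-39 null flag, and the 1024-byte ceiling come from the findings above.

```python
import struct

# Tag values quoted in this entry.
SQ_BLOB = 39
SQ_BBIND = 41

DESCRIPTOR_LEN = 56  # size of the inline placeholder sent in SQ_BIND
MAX_CHUNK = 1024     # chunk ceiling observed in sendStreamBlob


def blob_descriptor(data):
    """Build the 56-byte inline descriptor (hypothetical helper).

    All bytes are zero except the size at [16:20] (big-endian int32)
    or, for a NULL value, the null indicator at byte 39.
    """
    desc = bytearray(DESCRIPTOR_LEN)
    if data is None:
        desc[39] = 1
    else:
        struct.pack_into(">i", desc, 16, len(data))
    return bytes(desc)


def bbind_header(blob_count):
    """[short tag=41][short blob_count]"""
    return struct.pack(">hh", SQ_BBIND, blob_count)


def blob_chunks(data, chunk_size=MAX_CHUNK):
    """Yield the SQ_BLOB PDUs for one param, ending with the length=0 marker."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # ASSUMPTION: "padded" means padded to an even length with one NUL byte.
        pad = b"\x00" * (len(chunk) % 2)
        yield struct.pack(">hh", SQ_BLOB, len(chunk)) + chunk + pad
    yield struct.pack(">hh", SQ_BLOB, 0)
```

Under these assumptions, a 2500-byte value produces three data chunks (1024, 1024, and 452 bytes) followed by the zero-length terminator. Keeping the chunker a generator means a future cursor could stream large values without building the whole PDU sequence in memory.
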
**Decoder side**: rows containing BYTE/TEXT have a 56-byte descriptor in the SQ_TUPLE payload (per `IfxRowColumn.loadColumnData` switch case for type 11/12 reading 56 bytes). Then a separate stream of SQ_BLOB tags arrives **between** SQ_TUPLE messages, carrying the actual bytes.

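A rough sketch of the reassembly loop on the read path, for orientation only. It assumes the server frames incoming chunks the same way the client sends them (`[short tag=39][short length][padded data]`, terminated by a zero-length chunk, with odd-length chunks padded by one byte); that symmetry has not been verified against a capture. `read_exact` and `read_blob` are hypothetical names.

```python
import io
import struct

SQ_BLOB = 39  # tag value quoted in this entry


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError (hypothetical helper)."""
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError(f"wanted {n} bytes, got {len(buf)}")
    return buf


def read_blob(stream):
    """Reassemble one blob from consecutive SQ_BLOB chunks.

    ASSUMPTION: read-side framing mirrors the write side, including the
    zero-length terminator and even-length padding.
    """
    parts = []
    while True:
        tag, length = struct.unpack(">hh", read_exact(stream, 4))
        if tag != SQ_BLOB:
            raise ValueError(f"expected SQ_BLOB ({SQ_BLOB}), got tag {tag}")
        if length == 0:
            return b"".join(parts)
        parts.append(read_exact(stream, length))
        if length % 2:
            read_exact(stream, 1)  # skip the assumed pad byte
```

For example, feeding it `io.BytesIO` over a 3-byte chunk, a 2-byte chunk, and a terminator returns the five payload bytes joined together. A real cursor would also need to map each reassembled blob back to the column whose descriptor announced it, which this sketch leaves out.
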
**Estimated implementation cost**: substantial. The cursor state machine needs to:
- Detect `bytes` params (BYTE) and `str` params intended as TEXT, and route them through SQ_BBIND after SQ_BIND
- Send the 56-byte descriptor as the inline placeholder
- Stream chunks ≤1024 bytes each
- On the read path, parse SQ_BLOB tags between SQ_TUPLE messages and reassemble per-column

This is a multi-day effort and warrants its own phase, **Phase 8** (see the Decision below).

### Protocol family B: BLOB (type=102) and CLOB (type=101) — smart-LOBs with locators

**Server-side requirements**: an sbspace (smart-LOB space), which is more complex to set up than a blobspace. Verified with `onspaces -c -S sbspace1 ...`.

**Wire protocol**: even more involved than BYTE/TEXT. Per `IfxLobInputStream` and `IfxSmartBlob`, smart-LOB access uses an LO_OPEN/LO_READ/LO_WRITE/LO_CLOSE session protocol against the sbspace. Handles called *locators* travel inline in the SQ_TUPLE while the actual bytes go over a separate channel. JDBC's `IfxLocator` is a 56-byte descriptor (same shape as the BYTE descriptor!) but carries semantic meaning: storage type, sbspace ID, partition number, etc.

**Estimated implementation cost**: substantial++, significantly larger than BYTE/TEXT, because we'd need to implement the entire LO_* RPC sub-protocol.

### Decision

**Phase 6.f is closed as research-complete** with this entry as the deliverable. The findings replace assumptions (e.g., "BLOB/CLOB will be similar to INTERVAL") with actual protocol facts. Implementation is split into:
- **Phase 8** (future): BYTE/TEXT bind+read with the SQ_BBIND/SQ_BLOB wire machinery
- **Phase 9** (future): smart-LOB BLOB/CLOB with the LO_OPEN/LO_READ session protocol

In the meantime, **users who need to insert binary data** can use the existing `LVARCHAR` path: pass the bytes as a `str` that round-trips through `iso-8859-1`, which maps every byte value to exactly one code point. This works up to ~32K, the LVARCHAR on-wire limit. It is not a substitute for true BYTE/TEXT but covers many practical cases.

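A minimal sketch of that interim path, showing only the client-side conversion. The function names and the 32,000-byte cap are hypothetical (the cap is a conservative stand-in for the "~32K" limit, not a measured value), and the round trip only holds end to end if the connection itself encodes `str` parameters as `iso-8859-1` and the server performs no code-set conversion, which is assumed here rather than shown.

```python
def bytes_to_lvarchar_text(data, limit=32_000):
    """Turn raw bytes into a str that iso-8859-1 encodes back to the same bytes.

    ASSUMPTION: `limit` is an illustrative cap below the ~32K LVARCHAR
    on-wire limit; the exact ceiling is not established in this entry.
    """
    if len(data) > limit:
        raise ValueError(f"{len(data)} bytes exceeds the {limit}-byte LVARCHAR cap")
    # iso-8859-1 maps byte value N to code point N, so decoding never fails.
    return data.decode("iso-8859-1")


def lvarchar_text_to_bytes(text):
    """Inverse of bytes_to_lvarchar_text for values read back as str."""
    return text.encode("iso-8859-1")
```

Usage would look like binding `bytes_to_lvarchar_text(payload)` as the LVARCHAR parameter and calling `lvarchar_text_to_bytes` on the fetched value. Whether every byte value (NUL in particular) survives the server side is not verified here.
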
The constants `SQ_BBIND=41`, `SQ_BLOB=39`, `SQ_FETCHBLOB=38`, `SQ_SBBIND=52`, `SQ_FILE_READ=106`, `SQ_FILE_WRITE=107` are already declared in `_messages.py` from earlier scaffolding, so the protocol layer is ready when implementation lands.

**Honest scope-discovery moment**: I went into Phase 6.f assuming it'd be similar effort to INTERVAL. Reading the wire protocol revealed a different shape entirely: multi-PDU sequences require state-machine surgery, not just new codecs. Pivoting now (instead of half-implementing) is the right call.

---

## (template — copy below this line for new entries)

```