Initialize Phase 0 spike scaffold

Project goal: pure-Python implementation of the Informix SQLI wire
protocol. No CSDK, no JVM, no native deps. Targets
icr.io/informix/informix-developer-database (port 9088) as the dev/test
instance.

Phase 0 is a documentation-only spike that gates all implementation
work. The four scaffolds:

- README.md: project status and Phase 0 deliverable index
- docs/PROTOCOL_NOTES.md: byte-level wire-format reference (TBD)
- docs/JDBC_NOTES.md: reverse-lookup index into the decompiled IBM
  JDBC driver (4.50.4.1), populated from build/jdbc-src/ once the
  decompile lands
- docs/DECISION_LOG.md: running rationale, with the Phase-1
  paramstyle/Python-floor/autocommit decisions pre-locked so they don't
  churn later

CLAUDE.md is gitignored — operator-private context, public-PyPI repo.
Ryan Malloy 2026-05-02 13:22:28 -06:00
commit f202dbce0c
5 changed files with 419 additions and 0 deletions

.gitignore vendored Normal file

@ -0,0 +1,56 @@
# Project-private context (per global CLAUDE.md rule: only add to private repos)
CLAUDE.md
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
.eggs/
.installed.cfg
*.egg
# Virtual environments
.venv/
venv/
env/
.env
.env.*
!.env.example
# uv
.python-version
# IDE / editor
.vscode/
.idea/
*.swp
*.swo
*~
# Test / lint caches
.pytest_cache/
.ruff_cache/
.mypy_cache/
.coverage
.coverage.*
htmlcov/
# OS
.DS_Store
Thumbs.db
# Phase 0 build artifacts (decompiled JDBC, downloaded JARs)
# build/ already excluded by Python pattern above; we keep the directory for spike work
build/jdbc-src/
build/*.jar
# Wireshark captures live IN the repo intentionally — they're spike deliverables.
# Do NOT add docs/CAPTURES/ here.
# Java reference client build outputs
*.class

README.md Normal file

@ -0,0 +1,26 @@
# informix-db
Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.**
## Status
🚧 **Phase 0 — Spike.** Characterizing the SQLI wire protocol. No library code yet.
To our knowledge, IBM has never published the protocol byte-for-byte, and every existing Informix driver we are aware of wraps either IBM's CSDK or the JDBC JAR. This project closes that gap.
## Phase 0 deliverables
- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) — byte-level wire-format reference, derived from packet captures + JDBC decompilation
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) — index into the decompiled IBM JDBC driver's wire-protocol classes
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) — running rationale for protocol/auth/type decisions
- `docs/CAPTURES/*.pcap` — annotated packet captures of reference exchanges
If Phase 0's exit criteria are met, library implementation begins in Phase 1.
## Test target
`icr.io/informix/informix-developer-database` (port 9088, native SQLI). See [`tests/docker-compose.yml`](tests/docker-compose.yml) once Phase 1 lands.
## License
MIT.

docs/DECISION_LOG.md Normal file

@ -0,0 +1,108 @@
# Decision Log
Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions append; old ones are *amended* (with date) rather than overwritten.
Format: every decision has a date, a status (`active` / `superseded` / `revisited`), the chosen path, the discarded alternatives, and the *why*.
---
## 2026-05-02 — Project goal & off-ramp
**Status**: active
**Decision**: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.
**Off-ramp** (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python — e.g., mandatory undocumented crypto in the handshake — narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do **not** fall back to JPype/JDBC; that defeats the project's purpose.
**Why**: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.
---
## 2026-05-02 — Package name
**Status**: active
**Decision**: `informix-db`
**Discarded**: `informixdb-pure` (longer), `ifxsqli` (less discoverable), `pyifx` (obscure)
**PyPI availability**: confirmed available 2026-05-02 (HTTP 404 on `/pypi/informix-db/json`). The legacy `informixdb` is taken (HTTP 200); `informix` is also free (404) but too generic.
**Why**: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.
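The availability check above can be reproduced with a short script. This is a minimal sketch that assumes the public PyPI JSON endpoint at `https://pypi.org/pypi/<name>/json`; the helper names `pypi_status` and `name_is_free` are illustrative only. A 404 means no project is currently published under that name, which is not a guarantee that PyPI will accept the registration.

```python
import urllib.error
import urllib.request


def pypi_status(name: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of PyPI's JSON endpoint for `name`."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is what we want.
        return exc.code


def name_is_free(status: int) -> bool:
    """Interpret the status the way this log entry does: 404 free, 200 taken."""
    if status == 404:
        return True
    if status == 200:
        return False
    raise ValueError(f"inconclusive HTTP status: {status}")


if __name__ == "__main__":
    for candidate in ("informix-db", "informixdb", "informix"):
        print(candidate, name_is_free(pypi_status(candidate)))
```

Results depend on the state of PyPI at the time the script runs, so treat the dated entry above as the record and the script as a way to re-check it.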
---
## 2026-05-02 — License
**Status**: active
**Decision**: MIT
**Discarded**: Apache-2.0 (more defensive but less common in Python ecosystem), BSD-3-Clause
**Why**: Simplest, most permissive, ecosystem-standard for Python libraries.
---
## 2026-05-02 — Sync first; async deferred
**Status**: active
**Decision**: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate `informix_db.aio` subpackage following asyncpg's I/O-agnostic-protocol pattern.
**Why**: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.
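A minimal sketch of what "swap the I/O layer" could look like in practice. It assumes a hypothetical 2-byte big-endian length prefix purely for illustration; the real SQLI framing is still TBD in `PROTOCOL_NOTES.md`, and `FrameBuffer` is not an existing class in this project. The framing logic takes bytes in and hands complete frames out without touching a socket, so a blocking transport or an asyncio transport could drive it unchanged.

```python
import struct
from typing import Optional


class FrameBuffer:
    """I/O-free framing: the caller feeds bytes and pulls complete frames."""

    # ASSUMPTION: 2-byte big-endian payload length. Not the confirmed SQLI header.
    HEADER = struct.Struct(">H")

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> None:
        """Append bytes received from any transport (socket, asyncio, test)."""
        self._buf += data

    def next_frame(self) -> Optional[bytes]:
        """Return one complete payload, or None if more bytes are needed."""
        if len(self._buf) < self.HEADER.size:
            return None
        (length,) = self.HEADER.unpack_from(self._buf)
        end = self.HEADER.size + length
        if len(self._buf) < end:
            return None
        frame = bytes(self._buf[self.HEADER.size:end])
        del self._buf[:end]
        return frame
```

A sync driver would call `feed(sock.recv(4096))` in a loop; a later async layer would call `feed()` from its own read callback. The protocol tests need neither.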
---
## 2026-05-02 — Test target
**Status**: active
**Decision**: `icr.io/informix/informix-developer-database` (the IBM Informix Developer Edition image), port 9088 (native SQLI).
**Why**: Free, official, no license click-through, supports plain-password auth out of the box. Pinning the digest (not `:latest`) is a Phase 1 requirement.
---
## 2026-05-02 — Phase 0 is a gate, not a step
**Status**: active
**Decision**: No library code is written until `PROTOCOL_NOTES.md` meets all four exit criteria:
1. Login byte layout documented end-to-end
2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
3. `SELECT 1` round-trip fully labeled
4. JDBC source and packet capture corroborate on login + execute paths
If exit criteria can't be met within bounded effort, invoke the off-ramp.
**Why**: Most greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (`PROTOCOL_NOTES.md`) instead of a half-built driver.
---
## 2026-05-02 — Phase 1 architecture decisions (locked at start of Phase 1)
> These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.
- **`paramstyle = "numeric"`** (`:1`, `:2`, …). Matches Informix ESQL/C convention.
- **Python ≥ 3.10**. Gives us `match` and modern type hints (`X | Y` unions). `tomllib` only joined the standard library in 3.11, so it is not available at this floor.
- **`autocommit` defaults to off**. PEP 249 implicit semantics; opt-in via `connect(autocommit=True)`.
- **Author**: Ryan Malloy `<ryan@supported.systems>` (per global pyproject.toml convention).
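- **Versioning**: CalVer `YYYY.MM.DD` (`2026.05.02` initial); same-day fixes append a fourth release segment (`2026.05.02.1`, `.2`, etc.). Under PEP 440 this is an extra release segment rather than a post-release (which would be spelled `.postN`), and normalization drops leading zeros, so the published version reads `2026.5.2`.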
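The versioning scheme can be sketched as follows. The helper names `calver` and `pep440_normalized` are illustrative and not part of any packaging tool. The normalizer handles only plain dotted numeric versions and applies PEP 440's integer normalization, which strips leading zeros.

```python
from datetime import date


def calver(day: date, same_day_fix: int = 0) -> str:
    """Build the YYYY.MM.DD string, with an optional same-day fix counter."""
    base = f"{day.year}.{day.month:02d}.{day.day:02d}"
    return f"{base}.{same_day_fix}" if same_day_fix else base


def pep440_normalized(version: str) -> str:
    """Normalize a purely numeric dotted version (leading zeros dropped)."""
    return ".".join(str(int(part)) for part in version.split("."))


if __name__ == "__main__":
    v = calver(date(2026, 5, 2), same_day_fix=1)
    print(v, "->", pep440_normalized(v))
```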
---
## 2026-05-02 — DATE pulled forward to MVP
**Status**: active
**Decision**: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.
**Discarded**: leaving DATE in the "medium" / Phase 6 bucket.
**Why**: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.
DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+ — their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.
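To illustrate why DATE is cheap, here is a minimal codec sketch for the day-count encoding described above. The epoch (1899-12-31) and the 4-byte width come from this entry. The signedness and the big-endian default are assumptions that Phase 0 still has to confirm, and `decode_date` / `encode_date` are hypothetical names.

```python
import struct
from datetime import date, timedelta

INFORMIX_EPOCH = date(1899, 12, 31)  # day 0 of the Informix DATE count


def decode_date(raw: bytes, byteorder: str = ">") -> date:
    """Decode a 4-byte day count. ASSUMPTION: signed, big-endian by default."""
    (days,) = struct.unpack(f"{byteorder}i", raw)
    return INFORMIX_EPOCH + timedelta(days=days)


def encode_date(value: date, byteorder: str = ">") -> bytes:
    """Encode a date as a 4-byte day count from the Informix epoch."""
    return struct.pack(f"{byteorder}i", (value - INFORMIX_EPOCH).days)


if __name__ == "__main__":
    raw = encode_date(date(2026, 5, 2))
    print(raw.hex(), decode_date(raw))
```

Under these assumptions a day count of 1 decodes to 1900-01-01. NULL handling is left out because the NULL sentinel has not been observed yet.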
---
## 2026-05-02 — `CLAUDE.md` excluded from git and sdist
**Status**: active
**Decision**: `.gitignore` excludes `CLAUDE.md`. Once `pyproject.toml` exists, `[tool.hatch.build.targets.sdist].exclude` will also list `CLAUDE.md`.
**Why**: `CLAUDE.md` contains the user's email and operator-private context. Per global convention, only commit `CLAUDE.md` to private repos. This project is destined for PyPI / public Git.
---
## (template — copy below this line for new entries)
```
## YYYY-MM-DD — <one-line decision title>
**Status**: active | superseded | revisited
**Decision**: <chosen path>
**Discarded**: <alternatives, briefly>
**Why**: <rationale>
```

docs/JDBC_NOTES.md Normal file

@ -0,0 +1,66 @@
# IBM JDBC Driver — Wire Protocol Class Index
> **Phase 0 spike artifact.** Reverse-lookup index into the decompiled `com.ibm.informix:jdbc:4.50.4.1` JAR. This document tells us which Java class to read when we want to understand how the JDBC driver implements a given wire-protocol concern.
>
> **Legal note**: the decompiled source lives in `build/jdbc-src/` and is **not committed to this repository**. It is consulted as a clean-room understanding reference only. The Python implementation in `src/informix_db/` is written from `PROTOCOL_NOTES.md` (which cites observed packet bytes), not from the Java source.
## Decompilation
```bash
# Get the JAR
curl -O https://repo1.maven.org/maven2/com/ibm/informix/jdbc/4.50.4.1/jdbc-4.50.4.1.jar
# Decompile (CFR — https://www.benf.org/other/cfr/)
java -jar cfr.jar jdbc-4.50.4.1.jar --outputdir build/jdbc-src/
```
Driver version: `4.50.4.1` (latest as of 2026-05-02 on Maven Central).
## Top-level package layout
TBD — populate after decompilation.
Expected (from research):
- `com.informix.jdbc` — driver entry, connection, statement, result-set
- Likely subpackages for protocol I/O, type system, error mapping
## Class index (responsibility → class)
| Concern | Class | File path under `build/jdbc-src/` | Notes |
|---------|-------|------------------------------------|-------|
| Driver entry point | `com.informix.jdbc.IfxDriver` | TBD | implements `java.sql.Driver` |
| Connection | `com.informix.jdbc.IfxConnection` | TBD | extends `java.sql.Connection` |
| Wire socket I/O | TBD | TBD | look for `DataOutputStream` / `DataInputStream` users |
| Message framing | TBD | TBD | length-prefix + type-tag handlers |
| Login handshake | TBD | TBD | username/password/database selection |
| Auth method dispatch | TBD | TBD | plain / obfuscated / GSSAPI |
| Statement execute | TBD | TBD | EXECUTE / EXECUTE IMMEDIATE entry points |
| Prepared statement | TBD | TBD | parameter descriptors |
| Result-set parsing | TBD | TBD | column descriptors + row decoding |
| Type codecs (encoders) | TBD | TBD | `IfxTypeId` likely; per-type encoder methods |
| Type codecs (decoders) | TBD | TBD | per-type decoder methods |
| Error decoding (SQLSTATE) | TBD | TBD | error-message → SQLException mapping |
| Disconnection | TBD | TBD | logout / socket close |
| Protocol trace | `com.informix.jdbc.*.getProtoTrace` | TBD | Useful debug hook; understand what it logs |
## Method-level pointers
> As we identify specific methods that map to specific wire bytes, record them here. Format: `Class#method() → wire effect`.
- _(none yet)_
## Things to grep for
```bash
# Wire I/O entry points (-l lists matching files; -E gives portable alternation)
grep -rlE "DataOutputStream|DataInputStream" build/jdbc-src/
# Type code constants
grep -rlE "TYPEID|IfxTypeId|TypeId" build/jdbc-src/
# Auth method strings
grep -rlE "OBFUSCATE|PWDOBFUSCATION|GSS|KERBEROS" build/jdbc-src/
# SQLSTATE / error mapping
grep -rlE "SQLSTATE|SQLException" build/jdbc-src/
```

docs/PROTOCOL_NOTES.md Normal file

@ -0,0 +1,163 @@
# SQLI Wire Protocol Notes
> **Phase 0 spike artifact.** This is the byte-level reference document for the Informix SQLI wire protocol, derived from a combination of packet captures against the IBM Informix Developer Edition Docker image and clean-room study of the decompiled IBM JDBC driver (`com.ibm.informix:jdbc:4.50.4.1`). It is the canonical reference that all subsequent implementation phases depend on.
>
> **Current state**: scaffold only. Sections fill in as the spike proceeds.
---
## Source attribution conventions
Each documented byte sequence cites its evidence with one or both of these markers:
- 🔵 **PCAP**: observed in `docs/CAPTURES/<file>.pcap` at offset `<n>`
- 🟡 **JDBC**: cross-referenced against `<class>.<method>()` in the decompiled tree (see `JDBC_NOTES.md`)
A finding is considered *confirmed* only when 🔵 and 🟡 corroborate. Single-source observations are flagged 🟠 *unverified*.
---
## 1. Connection establishment
### TCP setup
- Port: 9088 (SQLI native, default)
- Protocol: TCP, no TLS in plain mode
- Who speaks first: TBD
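One way to settle the "who speaks first" question is to connect and send nothing. The sketch below is a hedged probe, not part of the driver: `first_bytes` and `server_speaks_first` are hypothetical helper names, the host default is an assumption, and the 2-second wait is arbitrary. If nothing arrives within the wait, the server is probably waiting for the client to open.

```python
import socket


def first_bytes(sock: socket.socket, wait: float = 2.0) -> bytes:
    """Return whatever the peer sends unprompted within `wait` seconds.

    Returns b"" if the wait times out or the peer closes without sending.
    """
    sock.settimeout(wait)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return b""


def server_speaks_first(host: str = "localhost", port: int = 9088,
                        wait: float = 2.0) -> bool:
    """Connect, stay silent, and report whether the server sent anything."""
    with socket.create_connection((host, port), timeout=wait) as sock:
        return bool(first_bytes(sock, wait))


if __name__ == "__main__":
    print("server speaks first:", server_speaks_first())
```

A `False` result is only suggestive, since a slow server could exceed the wait. The packet capture remains the authoritative evidence.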
### Initial banner / capability exchange
TBD
---
## 2. Login sequence
### Message ordering
TBD
### Login packet structure
| Offset | Width | Field | Notes |
|--------|-------|-------|-------|
| TBD | TBD | TBD | TBD |
### Username encoding
TBD
### Password encoding (plain auth, no obfuscation)
TBD
### Database selection
TBD (during login, or separate USE-DATABASE message?)
### Server response on success
TBD
### Server response on auth failure
TBD
---
## 3. Message framing
### Header layout
TBD — fields: type tag, length, flags?
### Length field
- Width: TBD bytes
- Endianness: TBD
- Value semantics: payload-only or whole-message?
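The three open questions about the length field can be narrowed mechanically once a single complete message has been extracted from a capture. The sketch below is a hypothetical helper (`length_field_candidates` is not an existing tool). It assumes the length is an unsigned 2- or 4-byte integer at a known offset, and it reads "payload-only" as "bytes following the length field". It returns every interpretation consistent with the message's actual size.

```python
import struct
from typing import List, Tuple


def length_field_candidates(message: bytes,
                            offset: int = 0) -> List[Tuple[int, str, str]]:
    """List (width, byteorder, semantics) readings that match len(message)."""
    results = []
    for width, code in ((2, "H"), (4, "I")):
        if offset + width > len(message):
            continue  # message too short to hold a field of this width
        for order_name, order in (("big", ">"), ("little", "<")):
            (value,) = struct.unpack_from(order + code, message, offset)
            if value == len(message):
                results.append((width, order_name, "whole-message"))
            elif value == len(message) - (offset + width):
                results.append((width, order_name, "payload-only"))
    return results


if __name__ == "__main__":
    sample = struct.pack(">H", 3) + b"abc"  # synthetic, not a real SQLI message
    print(length_field_candidates(sample))
```

A single message can leave several candidates standing, for example a small little-endian value fits both widths. Running the helper across messages of different sizes and intersecting the results is what narrows it down.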
### Endianness (overall)
TBD
### Message type tags
| Tag (hex) | Direction | Name | Purpose |
|-----------|-----------|------|---------|
| TBD | TBD | TBD | TBD |
---
## 4. Statement execution: `SELECT 1`
### Request
TBD
### Response
TBD
### Type code observed for the literal `1`
TBD
---
## 5. Result-set framing
### Column descriptor block
TBD — fields per column: name, type code, precision/scale, nullability flag, …
### Row encoding
TBD — fixed-position fields? null bitmap? per-field length prefix?
### End-of-result marker
TBD
---
## 6. Error responses
### Error packet format
TBD — fields: SQLSTATE, native error code, message text
### Encoding
TBD
---
## 7. Disconnection
### Client→server logout message
TBD
### Server-side close behavior
TBD
---
## 8. Type codecs
### IDS type codes observed in column descriptors
| Code (decimal/hex) | IDS Type | Wire format | Notes |
|--------------------|----------|-------------|-------|
| TBD | SMALLINT | TBD | |
| TBD | INTEGER | TBD | |
| TBD | BIGINT | TBD | |
| TBD | FLOAT | TBD | |
| TBD | CHAR | TBD | |
| TBD | VARCHAR | TBD | |
| TBD | BOOLEAN | TBD | |
| TBD | DATE | TBD | 4-byte day count from 1899-12-31 (Informix epoch); confirm |
(DATETIME, INTERVAL, DECIMAL, BLOBs etc. are out of scope for Phase 0; see `DECISION_LOG.md`.)
---
## 9. Open questions
> List things observed in JDBC source or packet captures that we don't yet understand. Each entry is either resolved-and-removed or escalated to `DECISION_LOG.md` as a deferred item.
- _(none yet)_
---
## 10. Cross-checks
### JDBC ↔ PCAP corroboration matrix
| Phase 0 milestone | JDBC source confirms | PCAP confirms | Status |
|-------------------|----------------------|---------------|--------|
| Login byte layout | ⬜ | ⬜ | pending |
| `SELECT 1` round-trip | ⬜ | ⬜ | pending |
| Error response structure | ⬜ | ⬜ | pending |
| Disconnection | ⬜ | ⬜ | pending |
Phase 0 exit requires all four rows = ✅✅ confirmed.