commit f202dbce0c1209101afbf146e2e916fb709afe07
Author: Ryan Malloy
Date:   Sat May 2 13:22:28 2026 -0600

    Initialize Phase 0 spike scaffold

    Project goal: pure-Python implementation of the Informix SQLI wire
    protocol. No CSDK, no JVM, no native deps. Targets
    icr.io/informix/informix-developer-database (port 9088) as the
    dev/test instance.

    Phase 0 is a documentation-only spike that gates all implementation
    work. The four scaffolds:

    - README.md: project status and Phase 0 deliverable index
    - docs/PROTOCOL_NOTES.md: byte-level wire-format reference (TBD)
    - docs/JDBC_NOTES.md: reverse-lookup index into the decompiled IBM
      JDBC driver (4.50.4.1), populated from build/jdbc-src/ once the
      decompile lands
    - docs/DECISION_LOG.md: running rationale, with the Phase 1
      paramstyle/Python-floor/autocommit decisions pre-locked so they
      don't churn later

    CLAUDE.md is gitignored: operator-private context, public-PyPI repo.

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..da144f1
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,56 @@
+# Project-private context (per global CLAUDE.md rule: only add to private repos)
+CLAUDE.md
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+dist/
+*.egg-info/
+.eggs/
+.installed.cfg
+*.egg
+
+# Virtual environments
+.venv/
+venv/
+env/
+.env
+.env.*
+!.env.example
+
+# uv
+.python-version
+
+# IDE / editor
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# Test / lint caches
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.coverage
+.coverage.*
+htmlcov/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Phase 0 build artifacts (decompiled JDBC, downloaded JARs)
+# build/ already excluded by Python pattern above; we keep the directory for spike work
+build/jdbc-src/
+build/*.jar
+
+# Wireshark captures live IN the repo intentionally; they're spike deliverables.
+# Do NOT add docs/CAPTURES/ here.
+
+# Java reference client build outputs
+*.class
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..714d7ad
--- /dev/null
+++ b/README.md
@@ -0,0 +1,26 @@
+# informix-db
+
+Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.**
+
+## Status
+
+🚧 **Phase 0: Spike.** Characterizing the SQLI wire protocol. No library code yet.
+
+The protocol has never been published byte-for-byte by IBM. To our knowledge, every existing Informix driver in every language wraps either IBM's CSDK or the JDBC JAR. This project closes that gap.
+
+## Phase 0 deliverables
+
+- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md): byte-level wire-format reference, derived from packet captures and JDBC decompilation
+- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md): index into the decompiled IBM JDBC driver's wire-protocol classes
+- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md): running rationale for protocol/auth/type decisions
+- `docs/CAPTURES/*.pcap`: annotated packet captures of reference exchanges
+
+If Phase 0's exit criteria are met, library implementation begins in Phase 1.
+
+## Test target
+
+`icr.io/informix/informix-developer-database` (port 9088, native SQLI). See [`tests/docker-compose.yml`](tests/docker-compose.yml) once Phase 1 lands.
+
+## License
+
+MIT.
diff --git a/docs/DECISION_LOG.md b/docs/DECISION_LOG.md
new file mode 100644
index 0000000..05d4e9a
--- /dev/null
+++ b/docs/DECISION_LOG.md
@@ -0,0 +1,108 @@
+# Decision Log
+
+Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions are appended; old ones are *amended* (with date) rather than overwritten.
+
+Format: every decision has a date, a status (`active` / `superseded` / `revisited`), the chosen path, the discarded alternatives, and the *why*.
+
+---
+
+## 2026-05-02: Project goal & off-ramp
+
+**Status**: active
+**Decision**: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.
+**Off-ramp** (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python (e.g., mandatory undocumented crypto in the handshake), narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do **not** fall back to JPype/JDBC; that defeats the project's purpose.
+**Why**: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.
+
+---
+
+## 2026-05-02: Package name
+
+**Status**: active
+**Decision**: `informix-db`
+**Discarded**: `informixdb-pure` (longer), `ifxsqli` (less discoverable), `pyifx` (obscure)
+**PyPI availability**: confirmed available 2026-05-02 (HTTP 404 on `/pypi/informix-db/json`). The legacy `informixdb` is taken (HTTP 200); `informix` is also free (404) but too generic.
+**Why**: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.
+
+---
+
+## 2026-05-02: License
+
+**Status**: active
+**Decision**: MIT
+**Discarded**: Apache-2.0 (adds an explicit patent grant, but less common in the Python ecosystem), BSD-3-Clause
+**Why**: Simplest, most permissive, ecosystem-standard for Python libraries.
+
+---
+
+## 2026-05-02: Sync first; async deferred
+
+**Status**: active
+**Decision**: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate `informix_db.aio` subpackage following asyncpg's I/O-agnostic-protocol pattern.
+**Why**: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.
+
+---
+
+## 2026-05-02: Test target
+
+**Status**: active
+**Decision**: `icr.io/informix/informix-developer-database` (the IBM Informix Developer Edition image), port 9088 (native SQLI).
+**Why**: Free, official, no license click-through, supports plain-password auth out of the box. Pinning the digest (not `:latest`) is a Phase 1 requirement.
+
+---
+
+## 2026-05-02: Phase 0 is a gate, not a step
+
+**Status**: active
+**Decision**: No library code is written until `PROTOCOL_NOTES.md` meets all four exit criteria:
+1. Login byte layout documented end-to-end
+2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
+3. `SELECT 1` round-trip fully labeled
+4. JDBC source and packet capture corroborate on login + execute paths
+
+If exit criteria can't be met within bounded effort, invoke the off-ramp.
+
+**Why**: Many greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (`PROTOCOL_NOTES.md`) instead of a half-built driver.
+
+---
+
+## 2026-05-02: Phase 1 architecture decisions (pre-locked for Phase 1)
+
+> These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.
+
+- **`paramstyle = "numeric"`** (`:1`, `:2`, …). Matches Informix ESQL/C convention.
+- **Python ≥ 3.10**. Gives us `match` and modern type hints (`X | Y` unions). `tomllib` only arrived in 3.11, so reading TOML on 3.10 needs the `tomli` backport.
+- **`autocommit` defaults to off**. PEP 249 requires auto-commit to be initially off; opt-in via `connect(autocommit=True)`.
+- **Author**: Ryan Malloy `` (per global pyproject.toml convention).
+- **Versioning**: CalVer `YYYY.MM.DD` (`2026.05.02` initial); same-day fixes append a fourth release component (`2026.05.02.1`, `.2`, etc.). Under PEP 440 these are release-segment bumps, not `.postN` post-releases, and leading zeros are normalized (`2026.05.02` is published as `2026.5.2`).
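Below is a minimal sketch of what the paramstyle decision above implies for the future package, assuming a hypothetical `informix_db/__init__.py`. PEP 249 requires the `apilevel`, `threadsafety`, and `paramstyle` module globals, and a numeric-style driver has to map `:N` markers onto the parameter sequence. Several pieces are illustrative assumptions, not decisions recorded here: the `threadsafety = 1` value and the helper names `placeholder_indices` and `check_params`. The scanner is also deliberately naive.

```python
import re

# PEP 249 module globals for a hypothetical informix_db/__init__.py.
apilevel = "2.0"
threadsafety = 1  # assumption: threads may share the module, not connections
paramstyle = "numeric"  # :1, :2, ... as decided above

# Either a single-quoted SQL literal ('' escapes a quote) or a :N marker.
_TOKEN = re.compile(r"'(?:[^']|'')*'|:(\d+)")


def placeholder_indices(sql: str) -> list[int]:
    """Return the distinct :N indices in sql, skipping single-quoted literals.

    Naive: does not understand comments, double-quoted strings, or unquoted
    Informix DATETIME literals such as DATETIME (12:30) HOUR TO MINUTE.
    """
    return sorted({int(m.group(1)) for m in _TOKEN.finditer(sql) if m.group(1)})


def check_params(sql: str, params: tuple) -> None:
    """Raise ValueError unless the markers cover exactly :1 .. :len(params)."""
    indices = placeholder_indices(sql)
    expected = list(range(1, len(params) + 1))
    if indices != expected:
        raise ValueError(f"markers {indices} do not match {len(params)} parameter(s)")
```

Numeric style lets a statement reuse `:1` several times. That is why the check compares distinct indices rather than counting markers. A real implementation would need a proper tokenizer, because the unquoted DATETIME literal named in the docstring would be misread as a `:30` marker.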
+
+---
+
+## 2026-05-02: DATE pulled forward to MVP
+
+**Status**: active
+**Decision**: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.
+**Discarded**: leaving DATE in the "medium" / Phase 6 bucket.
+**Why**: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.
+
+DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+: their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.
+
+---
+
+## 2026-05-02: `CLAUDE.md` excluded from git and sdist
+
+**Status**: active
+**Decision**: `.gitignore` excludes `CLAUDE.md`. Once `pyproject.toml` exists, `[tool.hatch.build.targets.sdist].exclude` will also list `CLAUDE.md`.
+**Why**: `CLAUDE.md` contains the user's email and operator-private context. Per global convention, only commit `CLAUDE.md` to private repos. This project is destined for PyPI / public Git.
+
+---
+
+## (template: copy below this line for new entries)
+
+```
+## YYYY-MM-DD: 
+
+**Status**: active | superseded | revisited
+**Decision**:
+**Discarded**:
+**Why**:
+```
diff --git a/docs/JDBC_NOTES.md b/docs/JDBC_NOTES.md
new file mode 100644
index 0000000..682978e
--- /dev/null
+++ b/docs/JDBC_NOTES.md
@@ -0,0 +1,66 @@
+# IBM JDBC Driver: Wire Protocol Class Index
+
+> **Phase 0 spike artifact.** Reverse-lookup index into the decompiled `com.ibm.informix:jdbc:4.50.4.1` JAR. This document tells us which Java class to read when we want to understand how the JDBC driver implements a given wire-protocol concern.
+>
+> **Legal note**: the decompiled source lives in `build/jdbc-src/` and is **not committed to this repository**. It is consulted as a clean-room understanding reference only. The Python implementation in `src/informix_db/` is written from `PROTOCOL_NOTES.md` (which cites observed packet bytes), not from the Java source.
+
+## Decompilation
+
+```bash
+# Get the JAR
+curl -O https://repo1.maven.org/maven2/com/ibm/informix/jdbc/4.50.4.1/jdbc-4.50.4.1.jar
+
+# Decompile (CFR: https://www.benf.org/other/cfr/)
+java -jar cfr.jar jdbc-4.50.4.1.jar --outputdir build/jdbc-src/
+```
+
+Driver version: `4.50.4.1` (latest as of 2026-05-02 on Maven Central).
+
+## Top-level package layout
+
+TBD: populate after decompilation.
+
+Expected (from research):
+- `com.informix.jdbc`: driver entry, connection, statement, result-set
+- Likely subpackages for protocol I/O, type system, error mapping
+
+## Class index (responsibility → class)
+
+| Concern | Class | File path under `build/jdbc-src/` | Notes |
+|---------|-------|------------------------------------|-------|
+| Driver entry point | `com.informix.jdbc.IfxDriver` | TBD | implements `java.sql.Driver` |
+| Connection | `com.informix.jdbc.IfxConnection` | TBD | implements `java.sql.Connection` (an interface) |
+| Wire socket I/O | TBD | TBD | look for `DataOutputStream` / `DataInputStream` users |
+| Message framing | TBD | TBD | length-prefix + type-tag handlers |
+| Login handshake | TBD | TBD | username/password/database selection |
+| Auth method dispatch | TBD | TBD | plain / obfuscated / GSSAPI |
+| Statement execute | TBD | TBD | EXECUTE / EXECUTE IMMEDIATE entry points |
+| Prepared statement | TBD | TBD | parameter descriptors |
+| Result-set parsing | TBD | TBD | column descriptors + row decoding |
+| Type codecs (encoders) | TBD | TBD | `IfxTypeId` likely; per-type encoder methods |
+| Type codecs (decoders) | TBD | TBD | per-type decoder methods |
+| Error decoding (SQLSTATE) | TBD | TBD | error-message → SQLException mapping |
+| Disconnection | TBD | TBD | logout / socket close |
+| Protocol trace | `com.informix.jdbc.*.getProtoTrace` | TBD | Useful debug hook; understand what it logs |
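The "File path" column can be filled in mechanically once the decompile lands. The sketch below assumes that CFR mirrors the Java package hierarchy under `--outputdir`, so `com.informix.jdbc.IfxDriver` would land at `com/informix/jdbc/IfxDriver.java`. It also assumes CFR writes nested classes into their outer class's file. `source_path` and `files_mentioning` are hypothetical helper names. The second helper is a pure-Python stand-in for the `grep -rl` searches listed under "Things to grep for".

```python
from pathlib import Path

# CFR --outputdir used in the Decompilation step.
SRC_ROOT = Path("build/jdbc-src")


def source_path(fqcn: str, root: Path = SRC_ROOT) -> Path:
    """Map a fully qualified class name to the .java file CFR should emit.

    Assumes nested classes (Outer$Inner) are written into the outer class's
    file, so everything from the first '$' on is dropped.
    """
    outer = fqcn.split("$", 1)[0]
    *packages, cls = outer.split(".")
    return root.joinpath(*packages, f"{cls}.java")


def files_mentioning(needle: str, root: Path = SRC_ROOT) -> list[Path]:
    """Return .java files under root whose text contains needle (like grep -rl)."""
    return sorted(
        path
        for path in root.rglob("*.java")
        if needle in path.read_text(encoding="utf-8", errors="replace")
    )
```

For example, `source_path("com.informix.jdbc.IfxConnection")` gives `build/jdbc-src/com/informix/jdbc/IfxConnection.java`, and `files_mentioning("DataOutputStream")` reproduces the first of the grep searches.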
+
+## Method-level pointers
+
+> As we identify specific methods that map to specific wire bytes, record them here. Format: `Class#method() → wire effect`.
+
+- _(none yet)_
+
+## Things to grep for
+
+```bash
+# Wire I/O entry points
+grep -rl "DataOutputStream\|DataInputStream" build/jdbc-src/
+
+# Type code constants
+grep -rl "TYPEID\|IfxTypeId\|TypeId" build/jdbc-src/
+
+# Auth method strings
+grep -rl "OBFUSCATE\|PWDOBFUSCATION\|GSS\|KERBEROS" build/jdbc-src/
+
+# SQLSTATE / error mapping
+grep -rl "SQLSTATE\|SQLException" build/jdbc-src/
+```
diff --git a/docs/PROTOCOL_NOTES.md b/docs/PROTOCOL_NOTES.md
new file mode 100644
index 0000000..9e4f36b
--- /dev/null
+++ b/docs/PROTOCOL_NOTES.md
@@ -0,0 +1,163 @@
+# SQLI Wire Protocol Notes
+
+> **Phase 0 spike artifact.** This is the byte-level reference document for the Informix SQLI wire protocol, derived from a combination of packet captures against the IBM Informix Developer Edition Docker image and clean-room study of the decompiled IBM JDBC driver (`com.ibm.informix:jdbc:4.50.4.1`). It is the canonical reference that all subsequent implementation phases depend on.
+>
+> **Current state**: scaffold only. Sections fill in as the spike proceeds.
+
+---
+
+## Source attribution conventions
+
+Each documented byte sequence cites its evidence, ideally from both sources:
+
+- 🔵 **PCAP**: observed in `docs/CAPTURES/.pcap` at offset ``
+- 🟡 **JDBC**: cross-referenced against `.()` in the decompiled tree (see `JDBC_NOTES.md`)
+
+A finding is considered *confirmed* only when 🔵 and 🟡 corroborate. Single-source observations are flagged 🟠 *unverified*.
+
+---
+
+## 1. Connection establishment
+
+### TCP setup
+- Port: 9088 (SQLI native, default)
+- Protocol: TCP, no TLS in plain mode
+- Who speaks first: TBD
+
+### Initial banner / capability exchange
+TBD
+
+---
+
+## 2. Login sequence
+
+### Message ordering
+TBD
+
+### Login packet structure
+| Offset | Width | Field | Notes |
+|--------|-------|-------|-------|
+| TBD | TBD | TBD | TBD |
+
+### Username encoding
+TBD
+
+### Password encoding (plain auth, no obfuscation)
+TBD
+
+### Database selection
+TBD (during login, or separate USE-DATABASE message?)
+
+### Server response on success
+TBD
+
+### Server response on auth failure
+TBD
+
+---
+
+## 3. Message framing
+
+### Header layout
+TBD. Candidate fields: type tag, length, flags?
+
+### Length field
+- Width: TBD bytes
+- Endianness: TBD
+- Value semantics: payload-only or whole-message?
+
+### Endianness (overall)
+TBD
+
+### Message type tags
+| Tag (hex) | Direction | Name | Purpose |
+|-----------|-----------|------|---------|
+| TBD | TBD | TBD | TBD |
+
+---
+
+## 4. Statement execution: `SELECT 1`
+
+### Request
+TBD
+
+### Response
+TBD
+
+### Type code observed for the literal `1`
+TBD
+
+---
+
+## 5. Result-set framing
+
+### Column descriptor block
+TBD. Candidate fields per column: name, type code, precision/scale, nullability flag, …
+
+### Row encoding
+TBD. Open options: fixed-position fields? null bitmap? per-field length prefix?
+
+### End-of-result marker
+TBD
+
+---
+
+## 6. Error responses
+
+### Error packet format
+TBD. Candidate fields: SQLSTATE, native error code, message text
+
+### Encoding
+TBD
+
+---
+
+## 7. Disconnection
+
+### Client→server logout message
+TBD
+
+### Server-side close behavior
+TBD
+
+---
+
+## 8. Type codecs
+
+### IDS type codes observed in column descriptors
+
+| Code (decimal/hex) | IDS Type | Wire format | Notes |
+|--------------------|----------|-------------|-------|
+| TBD | SMALLINT | TBD | |
+| TBD | INTEGER | TBD | |
+| TBD | BIGINT | TBD | |
+| TBD | FLOAT | TBD | |
+| TBD | CHAR | TBD | |
+| TBD | VARCHAR | TBD | |
+| TBD | BOOLEAN | TBD | |
+| TBD | DATE | TBD | 4-byte day count from 1899-12-31 (Informix epoch); confirm |
+
+(DATETIME, INTERVAL, DECIMAL, BLOBs etc. are out of scope for Phase 0; see `DECISION_LOG.md`.)
+
+---
+
+## 9. Open questions
+
+> List things observed in JDBC source or packet captures that we don't yet understand. Each entry is either resolved-and-removed or escalated to `DECISION_LOG.md` as a deferred item.
+
+- _(none yet)_
+
+---
+
+## 10. Cross-checks
+
+### JDBC ↔ PCAP corroboration matrix
+
+| Phase 0 milestone | JDBC source confirms | PCAP confirms | Status |
+|-------------------|----------------------|---------------|--------|
+| Login byte layout | ⬜ | ⬜ | pending |
+| `SELECT 1` round-trip | ⬜ | ⬜ | pending |
+| Error response structure | ⬜ | ⬜ | pending |
+| Disconnection | ⬜ | ⬜ | pending |
+
+Phase 0 exit requires all four rows to reach ✅✅ (confirmed).
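Section 1 leaves "who speaks first" open. One low-tech way to probe it before reading any JDBC code is to open a TCP connection to the dev container, send nothing, and see whether the server volunteers bytes. The sketch below assumes the container is reachable on `localhost:9088`, and `server_speaks_first` is a hypothetical helper. Silence is only a hint, because a slow server looks the same, so the answer still needs 🔵/🟡 corroboration.

```python
import socket


def server_speaks_first(host: str = "localhost", port: int = 9088,
                        wait: float = 2.0) -> bytes:
    """Connect, send nothing, and return whatever the server sends on its own.

    Returns b"" if nothing arrives within `wait` seconds, or if the server
    closes the connection without sending anything.
    """
    # create_connection applies `wait` to the connect and to later recv() calls.
    with socket.create_connection((host, port), timeout=wait) as sock:
        try:
            return sock.recv(4096)
        except socket.timeout:
            return b""
```

Against the container, `server_speaks_first().hex(" ")` gives one of two results. It returns the opening bytes if the server talks first; those belong in section 1 with a 🔵 citation. It returns an empty string if the server waits for the client.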
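The framing questions in section 3 (tag width, length width, endianness, whether the length counts the header) can stay as parameters until the captures answer them. Below is a minimal sans-I/O sketch in the spirit of the sync-first entry in `DECISION_LOG.md`. `FrameParser` only ever sees bytes, and `read_frames` is the thin blocking-socket shell that an async variant would replace. Both names are hypothetical. The `(tag, length)` header and its `>HH` default are placeholders, not an observed SQLI layout.

```python
import socket
import struct
from dataclasses import dataclass, field


@dataclass
class FrameParser:
    """Split a byte stream into (tag, payload) frames; never touches a socket.

    Hypothetical layout: a (tag, length) header packed with header_fmt,
    followed by the payload. Both knobs stand in for TBD answers.
    """

    header_fmt: str = ">HH"  # placeholder: 2-byte tag, 2-byte length, big-endian
    length_includes_header: bool = False
    _buf: bytearray = field(default_factory=bytearray, init=False, repr=False)

    def feed(self, data: bytes) -> list[tuple[int, bytes]]:
        """Buffer data and return every frame that is now complete."""
        self._buf += data
        header = struct.Struct(self.header_fmt)
        frames = []
        while len(self._buf) >= header.size:
            tag, length = header.unpack_from(self._buf)
            payload_len = length - header.size if self.length_includes_header else length
            if payload_len < 0:
                raise ValueError(f"length field {length} is shorter than the header")
            end = header.size + payload_len
            if len(self._buf) < end:
                break  # partial frame: wait for more bytes
            frames.append((tag, bytes(self._buf[header.size:end])))
            del self._buf[:end]
        return frames


def read_frames(sock: socket.socket, parser: FrameParser) -> list[tuple[int, bytes]]:
    """Blocking I/O shell: recv until the parser yields at least one frame."""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before a complete frame arrived")
        frames = parser.feed(chunk)
        if frames:
            return frames
```

Once section 3 is filled in, only the constructor arguments should need to change. An `aio` variant could reuse `FrameParser` unchanged and swap `sock.recv` for an awaitable read.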
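The DATE row in section 8 and the "DATE pulled forward" entry in `DECISION_LOG.md` both describe the value as a 4-byte day count from 1899-12-31. The codec sketch below rests on two assumptions the captures still have to confirm: the integer is signed, so dates before the epoch go negative, and it is big-endian. `encode_date` and `decode_date` are hypothetical names. NULL handling is left out because its wire representation is not yet known.

```python
import datetime
import struct

# Day 0 per the DATE row: 1899-12-31, so 1900-01-01 is day 1.
INFORMIX_EPOCH = datetime.date(1899, 12, 31)

# Assumption: signed 32-bit, big-endian. Byte order is still TBD in section 3.
_DAY_COUNT = struct.Struct(">i")


def encode_date(value: datetime.date) -> bytes:
    """Encode a date as a 4-byte day offset from the Informix epoch."""
    return _DAY_COUNT.pack((value - INFORMIX_EPOCH).days)


def decode_date(raw: bytes) -> datetime.date:
    """Decode a 4-byte day offset back into a datetime.date."""
    (days,) = _DAY_COUNT.unpack(raw)
    return INFORMIX_EPOCH + datetime.timedelta(days=days)
```

Selecting a known date such as 1900-01-01 (day 1) from the dev container would settle the byte-order assumption. The capture should show either `00 00 00 01` or `01 00 00 00`.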