Initialize Phase 0 spike scaffold
Project goal: pure-Python implementation of the Informix SQLI wire protocol. No CSDK, no JVM, no native deps. Targets icr.io/informix/informix-developer-database (port 9088) as the dev/test instance.

Phase 0 is a documentation-only spike that gates all implementation work. The four scaffolds:

- README.md: project status and Phase 0 deliverable index
- docs/PROTOCOL_NOTES.md: byte-level wire-format reference (TBD)
- docs/JDBC_NOTES.md: reverse-lookup index into the decompiled IBM JDBC driver (4.50.4.1), populated from build/jdbc-src/ once the decompile lands
- docs/DECISION_LOG.md: running rationale, with the Phase-1 paramstyle/Python-floor/autocommit decisions pre-locked so they don't churn later

CLAUDE.md is gitignored — operator-private context, public-PyPI repo.
commit
f202dbce0c
56
.gitignore
vendored
Normal file
@ -0,0 +1,56 @@
# Project-private context (per global CLAUDE.md rule: only add to private repos)
CLAUDE.md

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
.eggs/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
env/
.env
.env.*
!.env.example

# uv
.python-version

# IDE / editor
.vscode/
.idea/
*.swp
*.swo
*~

# Test / lint caches
.pytest_cache/
.ruff_cache/
.mypy_cache/
.coverage
.coverage.*
htmlcov/

# OS
.DS_Store
Thumbs.db

# Phase 0 build artifacts (decompiled JDBC, downloaded JARs)
# build/ already excluded by Python pattern above; we keep the directory for spike work
build/jdbc-src/
build/*.jar

# Wireshark captures live IN the repo intentionally — they're spike deliverables.
# Do NOT add docs/CAPTURES/ here.

# Java reference client build outputs
*.class
26
README.md
Normal file
@ -0,0 +1,26 @@
# informix-db

Pure-Python driver for IBM Informix IDS, speaking the SQLI wire protocol over raw sockets. **No IBM Client SDK. No JVM. No native libraries.**

## Status

🚧 **Phase 0 — Spike.** Characterizing the SQLI wire protocol. No library code yet.

The protocol has never been published byte-for-byte by IBM. Every existing Informix driver in every language wraps either IBM's CSDK or the JDBC JAR. This project closes that gap.

## Phase 0 deliverables

- [`docs/PROTOCOL_NOTES.md`](docs/PROTOCOL_NOTES.md) — byte-level wire-format reference, derived from packet captures + JDBC decompilation
- [`docs/JDBC_NOTES.md`](docs/JDBC_NOTES.md) — index into the decompiled IBM JDBC driver's wire-protocol classes
- [`docs/DECISION_LOG.md`](docs/DECISION_LOG.md) — running rationale for protocol/auth/type decisions
- `docs/CAPTURES/*.pcap` — annotated packet captures of reference exchanges

If Phase 0's exit criteria are met, library implementation begins in Phase 1.

## Test target

`icr.io/informix/informix-developer-database` (port 9088, native SQLI). See [`tests/docker-compose.yml`](tests/docker-compose.yml) once Phase 1 lands.

## License

MIT.
108
docs/DECISION_LOG.md
Normal file
@ -0,0 +1,108 @@
# Decision Log

Running rationale for protocol, auth, type, and architecture decisions made during the project. New decisions append; old ones are *amended* (with date) rather than overwritten.

Format: every decision has a date, a status (`active` / `superseded` / `revisited`), the chosen path, the discarded alternatives, and the *why*.

---

## 2026-05-02 — Project goal & off-ramp

**Status**: active
**Decision**: Build a pure-Python implementation of the SQLI wire protocol. No IBM Client SDK. No JVM. No native libraries.
**Off-ramp** (chosen by user during planning): if Phase 0 reveals the protocol is intractable in pure Python — e.g., mandatory undocumented crypto in the handshake — narrow scope (lock to one server version, drop async, drop prepared statements if needed) and stay pure-Python. Do **not** fall back to JPype/JDBC; that defeats the project's purpose.
**Why**: The "no SDK / no JVM" goal is what makes this driver valuable. A JPype fallback would ship something that works but solves nothing the existing JDBC-via-JPype solution doesn't already solve.

---

## 2026-05-02 — Package name

**Status**: active
**Decision**: `informix-db`
**Discarded**: `informixdb-pure` (longer), `ifxsqli` (less discoverable), `pyifx` (obscure)
**PyPI availability**: confirmed available 2026-05-02 (HTTP 404 on `/pypi/informix-db/json`). The legacy `informixdb` is taken (HTTP 200), `informix` is also free (404) but too generic.
**Why**: Discoverability balanced with brevity. Anyone searching PyPI for "informix" finds it; the hyphen distinguishes it from the legacy C-extension wrapper.
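
The availability check recorded in this entry can be reproduced with the standard library. The following is a minimal sketch, assuming the public JSON endpoint at `https://pypi.org/pypi/<name>/json`; the helper names are illustrative and not part of this project. A 404 only shows that no project is registered under that name; it does not by itself guarantee that PyPI will accept the name.

```python
import urllib.error
import urllib.request


def pypi_json_url(name: str) -> str:
    """Build the PyPI JSON API URL for a project name."""
    return f"https://pypi.org/pypi/{name}/json"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx; the status code lives on the exception.
        return exc.code


def name_is_unregistered(name: str, fetch_status=http_status) -> bool:
    """True when PyPI answers 404 for the project's JSON endpoint."""
    return fetch_status(pypi_json_url(name)) == 404
```

`fetch_status` is injectable so the logic can be exercised without network access; calling `name_is_unregistered("informix-db")` with the default performs the real request.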

---

## 2026-05-02 — License

**Status**: active
**Decision**: MIT
**Discarded**: Apache-2.0 (more defensive but less common in Python ecosystem), BSD-3-Clause
**Why**: Simplest, most permissive, ecosystem-standard for Python libraries.

---

## 2026-05-02 — Sync first; async deferred

**Status**: active
**Decision**: Build a sync, blocking-socket implementation. Async lands in Phase 6+ as a separate `informix_db.aio` subpackage following asyncpg's I/O-agnostic-protocol pattern.
**Why**: Wire protocols are hard enough; debugging protocol bugs through asyncio plumbing is two layers of indirection too many. Sync-first means we can test against blocking sockets, prove correctness, then mechanically swap the I/O layer.
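
To make the "swap the I/O layer" idea concrete, here is a small sketch of the separation it implies: a parser that only sees bytes, and a thin blocking-socket shell around it. The framing used here (a 2-byte big-endian payload length) is a placeholder assumption, since the real SQLI header layout is still TBD in `PROTOCOL_NOTES.md`, and the names `FrameBuffer` and `read_frame_blocking` are hypothetical.

```python
import socket
from typing import Optional


class FrameBuffer:
    """I/O-free parser: accepts raw bytes, hands back complete payloads.

    PLACEHOLDER framing: 2-byte big-endian payload length. Only the
    separation between parsing and socket I/O is the point here.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> None:
        """Append bytes received from any transport."""
        self._buf += data

    def next_frame(self) -> Optional[bytes]:
        """Return one complete payload, or None if more bytes are needed."""
        if len(self._buf) < 2:
            return None
        length = int.from_bytes(self._buf[:2], "big")
        if len(self._buf) < 2 + length:
            return None
        payload = bytes(self._buf[2:2 + length])
        del self._buf[:2 + length]
        return payload


def read_frame_blocking(sock: socket.socket, parser: FrameBuffer) -> bytes:
    """Sync shell: read from a blocking socket until one frame is complete."""
    while True:
        frame = parser.next_frame()
        if frame is not None:
            return frame
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        parser.feed(chunk)
```

An async variant would reuse `FrameBuffer` unchanged and replace only `read_frame_blocking`, which is the property the decision relies on.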

---

## 2026-05-02 — Test target

**Status**: active
**Decision**: `icr.io/informix/informix-developer-database` (the IBM Informix Developer Edition image), port 9088 (native SQLI).
**Why**: Free, official, no license click-through, supports plain-password auth out of the box. Pinning the digest (not `:latest`) is a Phase 1 requirement.

---

## 2026-05-02 — Phase 0 is a gate, not a step

**Status**: active
**Decision**: No library code is written until `PROTOCOL_NOTES.md` meets all four exit criteria:
1. Login byte layout documented end-to-end
2. Message-type tags identified for login/execute/row/end-of-result/error/disconnect
3. `SELECT 1` round-trip fully labeled
4. JDBC source and packet capture corroborate on login + execute paths

If exit criteria can't be met within bounded effort, invoke the off-ramp.

**Why**: Most greenfield projects fail by writing code before they understand the problem. This project has an undocumented wire protocol as its central unknown. Gating on Phase 0 means a failed spike still produces a publicly valuable artifact (`PROTOCOL_NOTES.md`) instead of a half-built driver.

---

## 2026-05-02 — Phase 1 architecture decisions (locked at start of Phase 1)

> These are pre-decided so paramstyle/Python-floor/autocommit don't churn later. Recorded here so Phase 1 doesn't relitigate them.

- **`paramstyle = "numeric"`** (`:1`, `:2`, …). Matches Informix ESQL/C convention.
- **Python ≥ 3.10**. Gives us `match` and modern type hints. Note that `tomllib` is in the standard library only from 3.11, so it is not available at this floor.
- **`autocommit` defaults to off**. PEP 249 implicit semantics; opt-in via `connect(autocommit=True)`.
- **Author**: Ryan Malloy `<ryan@supported.systems>` (per global pyproject.toml convention).
- **Versioning**: CalVer `YYYY.MM.DD` (`2026.05.02` initial); same-day fixes use PEP 440 post-release `2026.05.02.1`, `.2`, etc.
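
As an illustration of what the `numeric` paramstyle asks of the driver, the sketch below rewrites `:1`, `:2`, … into positional `?` markers and reorders the arguments to match. It assumes the server-side statement ultimately takes positional `?` placeholders, which Phase 0 has not yet confirmed on the wire, and the function name is hypothetical. Only single-quoted string literals are skipped; comments and delimited identifiers are not handled.

```python
import re

# PEP 249 module-level attributes, as decided in the list above.
apilevel = "2.0"
paramstyle = "numeric"

# Match either a single-quoted literal (left untouched) or a :N marker.
_TOKEN = re.compile(r"'(?:[^']|'')*'|:(\d+)")


def numeric_to_qmark(sql: str, params: tuple) -> tuple:
    """Rewrite :N markers to ? and return (sql, params in marker order)."""
    ordered = []

    def repl(match):
        if match.group(1) is None:
            return match.group(0)  # quoted literal: copy through unchanged
        index = int(match.group(1))
        if not 1 <= index <= len(params):
            raise ValueError(f"no parameter supplied for :{index}")
        ordered.append(params[index - 1])
        return "?"

    return _TOKEN.sub(repl, sql), tuple(ordered)
```

For example, `numeric_to_qmark("SELECT * FROM t WHERE a = :2 AND b = :1", ("x", 7))` returns the statement with two `?` markers and the parameters reordered to `(7, "x")`.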

---

## 2026-05-02 — DATE pulled forward to MVP

**Status**: active
**Decision**: DATE is included in the Phase 2 MVP type set, alongside SMALLINT/INTEGER/BIGINT/FLOAT/CHAR/VARCHAR/BOOLEAN.
**Discarded**: leaving DATE in the "medium" / Phase 6 bucket.
**Why**: Almost no real Informix database is DATE-free. The encoding is trivial once the type code is known (4-byte day count from the Informix epoch 1899-12-31). Cheap to include; expensive to leave out.

DATETIME / INTERVAL / DECIMAL / NUMERIC / MONEY remain in Phase 6+ — their encodings (qualifier-byte precision, BCD-style packed decimal) are non-trivial.
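
The DATE encoding described above is small enough to sketch in full. This assumes the 4-byte count is a signed, big-endian integer, which is a guess until a capture confirms it, and it does not model how NULL is represented. The function names are illustrative.

```python
import struct
from datetime import date, timedelta

# Day 0 of the Informix DATE type, per the decision entry above.
INFORMIX_EPOCH = date(1899, 12, 31)


def decode_date(raw: bytes) -> date:
    """Decode a 4-byte day count into a date.

    ASSUMPTION: signed 32-bit, big-endian. Not yet confirmed by PCAP.
    """
    (days,) = struct.unpack(">i", raw)
    return INFORMIX_EPOCH + timedelta(days=days)


def encode_date(value: date) -> bytes:
    """Encode a date as a 4-byte day count (same assumptions as decode_date)."""
    return struct.pack(">i", (value - INFORMIX_EPOCH).days)
```

Under these assumptions 1900-01-01 is day 1, so `encode_date(date(1900, 1, 1))` yields `b"\x00\x00\x00\x01"`.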

---

## 2026-05-02 — `CLAUDE.md` excluded from git and sdist

**Status**: active
**Decision**: `.gitignore` excludes `CLAUDE.md`. Once `pyproject.toml` exists, `[tool.hatch.build.targets.sdist].exclude` will also list `CLAUDE.md`.
**Why**: `CLAUDE.md` contains the user's email and operator-private context. Per global convention, only commit `CLAUDE.md` to private repos. This project is destined for PyPI / public Git.

---

## (template — copy below this line for new entries)

```
## YYYY-MM-DD — <one-line decision title>

**Status**: active | superseded | revisited
**Decision**: <chosen path>
**Discarded**: <alternatives, briefly>
**Why**: <rationale>
```
66
docs/JDBC_NOTES.md
Normal file
@ -0,0 +1,66 @@
# IBM JDBC Driver — Wire Protocol Class Index

> **Phase 0 spike artifact.** Reverse-lookup index into the decompiled `com.ibm.informix:jdbc:4.50.4.1` JAR. This document tells us which Java class to read when we want to understand how the JDBC driver implements a given wire-protocol concern.
>
> **Legal note**: the decompiled source lives in `build/jdbc-src/` and is **not committed to this repository**. It is consulted as a clean-room understanding reference only. The Python implementation in `src/informix_db/` is written from `PROTOCOL_NOTES.md` (which cites observed packet bytes), not from the Java source.

## Decompilation

```bash
# Get the JAR
curl -O https://repo1.maven.org/maven2/com/ibm/informix/jdbc/4.50.4.1/jdbc-4.50.4.1.jar

# Decompile (CFR — https://www.benf.org/other/cfr/)
java -jar cfr.jar jdbc-4.50.4.1.jar --outputdir build/jdbc-src/
```

Driver version: `4.50.4.1` (latest as of 2026-05-02 on Maven Central).

## Top-level package layout

TBD — populate after decompilation.

Expected (from research):
- `com.informix.jdbc` — driver entry, connection, statement, result-set
- Likely subpackages for protocol I/O, type system, error mapping

## Class index (responsibility → class)

| Concern | Class | File path under `build/jdbc-src/` | Notes |
|---------|-------|------------------------------------|-------|
| Driver entry point | `com.informix.jdbc.IfxDriver` | TBD | implements `java.sql.Driver` |
| Connection | `com.informix.jdbc.IfxConnection` | TBD | extends `java.sql.Connection` |
| Wire socket I/O | TBD | TBD | look for `DataOutputStream` / `DataInputStream` users |
| Message framing | TBD | TBD | length-prefix + type-tag handlers |
| Login handshake | TBD | TBD | username/password/database selection |
| Auth method dispatch | TBD | TBD | plain / obfuscated / GSSAPI |
| Statement execute | TBD | TBD | EXECUTE / EXECUTE IMMEDIATE entry points |
| Prepared statement | TBD | TBD | parameter descriptors |
| Result-set parsing | TBD | TBD | column descriptors + row decoding |
| Type codecs (encoders) | TBD | TBD | `IfxTypeId` likely; per-type encoder methods |
| Type codecs (decoders) | TBD | TBD | per-type decoder methods |
| Error decoding (SQLSTATE) | TBD | TBD | error-message → SQLException mapping |
| Disconnection | TBD | TBD | logout / socket close |
| Protocol trace | `com.informix.jdbc.*.getProtoTrace` | TBD | Useful debug hook; understand what it logs |

## Method-level pointers

> As we identify specific methods that map to specific wire bytes, record them here. Format: `Class#method() → wire effect`.

- _(none yet)_

## Things to grep for

```bash
# Wire I/O entry points
grep -rln "DataOutputStream\|DataInputStream" build/jdbc-src/

# Type code constants
grep -rln "TYPEID\|IfxTypeId\|TypeId" build/jdbc-src/

# Auth method strings
grep -rln "OBFUSCATE\|PWDOBFUSCATION\|GSS\|KERBEROS" build/jdbc-src/

# SQLSTATE / error mapping
grep -rln "SQLSTATE\|SQLException" build/jdbc-src/
```
163
docs/PROTOCOL_NOTES.md
Normal file
@ -0,0 +1,163 @@
# SQLI Wire Protocol Notes

> **Phase 0 spike artifact.** This is the byte-level reference document for the Informix SQLI wire protocol, derived from a combination of packet captures against the IBM Informix Developer Edition Docker image and clean-room study of the decompiled IBM JDBC driver (`com.ibm.informix:jdbc:4.50.4.1`). It is the canonical reference that all subsequent implementation phases depend on.
>
> **Current state**: scaffold only. Sections fill in as the spike proceeds.

---

## Source attribution conventions

Each documented byte sequence cites both sources of evidence:

- 🔵 **PCAP**: observed in `docs/CAPTURES/<file>.pcap` at offset `<n>`
- 🟡 **JDBC**: cross-referenced against `<class>.<method>()` in the decompiled tree (see `JDBC_NOTES.md`)

A finding is considered *confirmed* only when 🔵 and 🟡 corroborate. Single-source observations are flagged 🟠 *unverified*.
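
If findings are ever tracked in a script rather than by hand, the rule above maps onto a small record type. This is only a sketch: the `Finding` class, its field names, and the third "undocumented" state (no evidence at all) are assumptions added for illustration, not part of the convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One documented byte sequence and the evidence cited for it."""

    description: str
    pcap_ref: Optional[str] = None  # e.g. "<file>.pcap at offset <n>"
    jdbc_ref: Optional[str] = None  # e.g. "<class>.<method>()"

    @property
    def status(self) -> str:
        """Apply the rule: both sources confirm, one source is unverified."""
        if self.pcap_ref and self.jdbc_ref:
            return "confirmed"
        if self.pcap_ref or self.jdbc_ref:
            return "unverified"
        return "undocumented"  # hypothetical extra state, not in the convention
```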

---

## 1. Connection establishment

### TCP setup
- Port: 9088 (SQLI native, default)
- Protocol: TCP, no TLS in plain mode
- Who speaks first: TBD
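
One way to settle the "who speaks first" question before reaching for Wireshark is to connect, send nothing, and see whether the server volunteers any bytes. The sketch below does that with a plain socket; the function name and the 2-second default wait are arbitrary choices, and a silent server within the wait window suggests, but does not prove, that the client is expected to speak first.

```python
import socket
from typing import Optional


def probe_first_bytes(host: str, port: int = 9088, wait: float = 2.0) -> Optional[bytes]:
    """Connect, send nothing, and report what the server sends unprompted.

    Returns None if the server stays silent for `wait` seconds,
    b"" if it closes the connection, or the bytes it sent first.
    """
    with socket.create_connection((host, port), timeout=wait) as sock:
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None
```

Against the dev container this would be called as `probe_first_bytes("localhost")`, assuming port 9088 is published on the host.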

### Initial banner / capability exchange
TBD

---

## 2. Login sequence

### Message ordering
TBD

### Login packet structure
| Offset | Width | Field | Notes |
|--------|-------|-------|-------|
| TBD | TBD | TBD | TBD |

### Username encoding
TBD

### Password encoding (plain auth, no obfuscation)
TBD

### Database selection
TBD (during login, or separate USE-DATABASE message?)

### Server response on success
TBD

### Server response on auth failure
TBD

---

## 3. Message framing

### Header layout
TBD — fields: type tag, length, flags?

### Length field
- Width: TBD bytes
- Endianness: TBD
- Value semantics: payload-only or whole-message?
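
The three open questions above can be narrowed mechanically once a capture contains a message whose boundaries are known. The sketch below enumerates candidate interpretations (offset, width, byte order, and what the value counts) and keeps those consistent with the observed frame size. It asserts nothing about the real protocol: the candidate widths of 2 and 4 bytes, the offset range, and the function name are all assumptions, and it only works if `frame` really is exactly one message.

```python
from itertools import product


def plausible_length_fields(frame: bytes, max_offset: int = 4):
    """Yield (offset, width, byteorder, semantics) consistent with one frame.

    `frame` must be exactly one complete message taken from a capture.
    """
    for offset, width, order in product(range(max_offset + 1), (2, 4), ("big", "little")):
        field = frame[offset:offset + width]
        if len(field) < width:
            continue  # not enough bytes left for this candidate
        value = int.from_bytes(field, order)
        if value == len(frame):
            yield offset, width, order, "whole-message"
        elif value == len(frame) - (offset + width):
            yield offset, width, order, "bytes-after-length-field"
```

Running it over several captured messages of different sizes and intersecting the results should leave very few candidates, which can then be checked against the JDBC source.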

### Endianness (overall)
TBD

### Message type tags
| Tag (hex) | Direction | Name | Purpose |
|-----------|-----------|------|---------|
| TBD | TBD | TBD | TBD |

---

## 4. Statement execution: `SELECT 1`

### Request
TBD

### Response
TBD

### Type code observed for the literal `1`
TBD

---

## 5. Result-set framing

### Column descriptor block
TBD — fields per column: name, type code, precision/scale, nullability flag, …

### Row encoding
TBD — fixed-position fields? null bitmap? per-field length prefix?

### End-of-result marker
TBD

---

## 6. Error responses

### Error packet format
TBD — fields: SQLSTATE, native error code, message text

### Encoding
TBD

---

## 7. Disconnection

### Client→server logout message
TBD

### Server-side close behavior
TBD

---

## 8. Type codecs

### IDS type codes observed in column descriptors

| Code (decimal/hex) | IDS Type | Wire format | Notes |
|--------------------|----------|-------------|-------|
| TBD | SMALLINT | TBD | |
| TBD | INTEGER | TBD | |
| TBD | BIGINT | TBD | |
| TBD | FLOAT | TBD | |
| TBD | CHAR | TBD | |
| TBD | VARCHAR | TBD | |
| TBD | BOOLEAN | TBD | |
| TBD | DATE | TBD | 4-byte day count from 1899-12-31 (Informix epoch); confirm |

(DATETIME, INTERVAL, DECIMAL, BLOBs etc. are out of scope for Phase 0; see `DECISION_LOG.md`.)

---

## 9. Open questions

> List things observed in JDBC source or packet captures that we don't yet understand. Each entry is either resolved-and-removed or escalated to `DECISION_LOG.md` as a deferred item.

- _(none yet)_

---

## 10. Cross-checks

### JDBC ↔ PCAP corroboration matrix

| Phase 0 milestone | JDBC source confirms | PCAP confirms | Status |
|-------------------|----------------------|---------------|--------|
| Login byte layout | ⬜ | ⬜ | pending |
| `SELECT 1` round-trip | ⬜ | ⬜ | pending |
| Error response structure | ⬜ | ⬜ | pending |
| Disconnection | ⬜ | ⬜ | pending |

Phase 0 exit requires all four rows = ✅✅ confirmed.