pg_orrery/docs/ktrie/SP-GIST-ORBITAL-TRIE.md
Ryan Malloy bb235f51fa Add SP-GiST orbital trie design spec for satellite pass prediction index
Evolved from the original KTrie custom AM proposal (preserved as
KTRIE-SPEC-ORIGINAL.md). Key design decisions: 2-level trie (SMA +
inclination) instead of 5, SP-GiST framework instead of custom AM,
query-time RAAN filter instead of trie level, propagation-aware cost
estimation via traversalValue.
2026-02-17 19:53:42 -07:00

800 lines
34 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# SP-GiST Orbital Trie: Domain-Specific Index for Satellite Pass Prediction
## 1. Purpose & Lineage
This document specifies a domain-specific SP-GiST index for accelerating satellite pass prediction queries in PostgreSQL. The index decomposes TLE orbital element space into a 2-level hierarchical trie — semi-major axis and inclination — with a query-time RAAN filter and propagation-aware cost estimation.
**Primary use case:** 1-to-N pass prediction. "Which of 30,000 satellites are visible from this observer in the next 2 hours?" The index prunes the catalog analytically before any SGP4/SDP4 propagation, which is the dominant query cost.
**Lineage:** This design evolved from the original KTrie spec ([KTRIE-SPEC-ORIGINAL.md](KTRIE-SPEC-ORIGINAL.md)) through design review. The original proposed a fully custom PostgreSQL index access method with 5 Keplerian element levels. Analysis showed:
1. Only 2 of 5 levels (SMA, inclination) are temporally stable enough to index
2. PostgreSQL's SP-GiST framework provides all necessary infrastructure (WAL, VACUUM, parallel scan, prefix compression)
3. RAAN filtering is better as a query-time analytical filter that adapts to the query window
4. Constellation-level pruning emerges naturally from the trie structure without explicit classification
5. The real innovation — propagation-aware cost estimation — works within SP-GiST via `traversalValue`
The React visualization ([ktrie-layout.jsx](ktrie-layout.jsx)) reflects the original 5-level design and will be updated separately.
---
## 2. The Visibility Decision Tree
Pass prediction asks: "Can satellite S be seen from observer O during time window [t₁, t₂]?" There are exactly four questions, ordered by decreasing temporal stability:
### Q1: Can the orbit reach observable altitude?
Determined by perigee and apogee altitude, derived from SMA and eccentricity via Kepler's third law.
- **Stability:** Very stable. Changes only with atmospheric drag (slow decay) or powered maneuvers (rare, discrete events).
- **Discrimination:** Eliminates entire orbital regimes. An observer with a 10° minimum elevation can see satellites up to ~2,500 km altitude. This removes all MEO, GEO, and beyond — roughly 25% of the catalog.
- **Verdict: INDEXABLE.**
### Q2: Can the ground track reach my latitude?
A satellite with inclination `i` has a ground track bounded by latitudes `[-i, +i]`. An observer at latitude `φ` can only see satellites where `i ≥ |φ|` (simplified; the actual constraint includes off-track visibility angle, which depends on altitude).
- **Stability:** Very stable. Inclination barely changes without thrust.
- **Discrimination:** An observer at 43.7° (Eagle, Idaho) eliminates all equatorial and low-inclination objects — roughly 60% of LEO.
- **Verdict: INDEXABLE.**
### Q3: Is the orbital plane aligned with my location?
The Right Ascension of the Ascending Node (RAAN, Ω) determines where the orbit crosses the equator in inertial space. A satellite is potentially overhead when its RAAN is roughly aligned with the observer's current sidereal position.
- **Stability: UNSTABLE.** RAAN precesses at 0.57°/day due to J2 oblateness perturbation. The precession rate is:
```
Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i)
```
where `n` is mean motion, `J₂ = 0.001082616`, `Rₑ = 6378.135 km` (WGS-72), and `a` is semi-major axis.
- **Key insight:** Ω̇ is fully determined by SMA and inclination — the two elements already indexed at L0 and L1. The RAAN check requires only the TLE's stored RAAN₀ and epoch (available in the leaf) plus the elements already traversed. No additional index level is needed.
- **Verdict: QUERY-TIME FILTER.** Computed per surviving candidate after L0+L1 pruning.
### Q4: Is the satellite at the right orbital phase?
Mean anomaly changes at ~4°/second for LEO. The satellite's actual position along its orbit at any given moment is the irreducible question that requires SGP4/SDP4 numerical propagation.
- **Stability: COMPLETELY UNSTABLE.** Changes continuously.
- **Verdict: IRREDUCIBLE. Requires SGP4 propagation.**
### Conclusion
Only two dimensions are worth indexing. Everything else is either computable from those two dimensions (RAAN) or requires full propagation (mean anomaly). This drives the 2-level trie design.
---
## 3. Why SP-GiST
### SP-GiST vs custom AM vs enhanced GiST
| Requirement | SP-GiST | Custom AM | Enhanced GiST |
|-------------|---------|-----------|----------------|
| Page management + WAL | Free | Must implement | Free |
| VACUUM + dead tuple cleanup | Free | Must implement | Free |
| Crash recovery | Free | Must implement | Free |
| Parallel scan | Free | Must implement | Free |
| Prefix compression | Built-in (prefix/suffix) | Must implement | Not available |
| Non-balanced tree | Native | Must implement | Not natural |
| Fixed level decomposition | Via `level` counter | Must implement | Not available |
| Traversal state for cost est. | `traversalValue` | Must implement | Not available |
| Node labels for children | `nodeLabels` | Must implement | Not available |
| Support functions needed | 5 | 12+ | 7 (already exist) |
| Lines of domain-specific C | ~800 est. | ~3,000+ est. | ~200 (delta) |
**SP-GiST was literally designed for this kind of data structure.** Its space-partitioning model matches the KTrie concept directly: fixed decomposition rules, non-balanced trees, prefix compression. The `traversalValue` mechanism carries state down the tree during scans — exactly what population-aware cost estimation needs.
### The quad-tree precedent
PostgreSQL's built-in SP-GiST quad-tree (`spgquadtreeproc.c`) demonstrates a key pattern: the `restDatum` (remaining datum after prefix extraction) can be the **same type** at every level. The quad-tree passes the full point unchanged:
```c
out->result.matchNode.restDatum = in->datum;
```
The tree terminates by depth, not by value exhaustion. Our design does the same — the leaf stores the full `tle` type. No intermediate compressed struct needed. The leaf has everything for the RAAN query-time filter (RAAN₀, epoch, mean_motion, inclination, eccentricity).
### What SP-GiST's prefix compression gives us
The original KTrie spec implemented Patricia compression manually (compressed nodes, skip_bounds arrays, compression/decompression triggers). SP-GiST provides this natively: the `config` function declares a prefix type, and the `choose` function returns prefix/suffix decompositions. Single-child chains are compressed automatically by the framework. Same semantics, zero custom page management.
---
## 4. Architecture
### 2-level trie with query-time filters
```
SP-GiST L0: Semi-Major Axis (altitude regime)
├── Equal-population splits, not equal-range
├── Density-balanced: LEO (75% of catalog) gets finer bins
└── Prunes by perigee/apogee altitude reachability
SP-GiST L1: Inclination
├── Equal-population splits
├── Natural clustering at launch-site latitudes (28.5°, 51.6°, 97°)
└── Prunes by ground-track latitude coverage
Query-time RAAN filter (in leaf_consistent):
├── Projects RAAN to query midpoint via J2 precession rate
├── Checks alignment with observer's sidereal position
├── Adapts automatically to any query window length
└── Cost: ~10ns per candidate, ~45μs for entire post-L0/L1 batch
SGP4/SDP4 propagation (irreducible):
└── Full numerical propagation for surviving candidates
```
### Equal-population splits
The current GiST uses equal-range splits (median midpoint of the geometric spread). Equal-population splits keep the tree balanced in **object density**, not geometric space. The altitude band 300600 km (where Starlink, ISS, and most LEO debris live) gets finer subdivision than 6002,000 km. This matters because:
- Query cost is proportional to surviving population, not geometric volume
- Dense regions need finer discrimination; sparse regions don't benefit from it
- The cost estimator's population predictions are more accurate with balanced subtrees
### Constellation-aware pruning
Mega-constellations cluster tightly in L0+L1 space. Starlink shell 1: all ~2,000 satellites at 550 km / 53.0°. OneWeb: ~600 at 1,200 km / 87.9°. These clusters naturally fall into single L1 subtrees, giving the cost estimator a tight population count and a narrow J2 precession rate range. Constellation-level pruning emerges from the trie structure without explicit constellation classification.
---
## 5. SP-GiST Support Functions
Five functions implement the index. All use WGS-72 constants internally, consistent with pg_orrery's constant chain of custody.
### 5.1 `tle_trie_config()`
Declares the type system for the trie.
```c
Datum tle_trie_config(PG_FUNCTION_ARGS)
{
spgConfigOut *cfg = (spgConfigOut *) palloc0(sizeof(spgConfigOut));
cfg->prefixType = FLOAT8RANGEOID; /* orbital_bounds: SMA or inclination range */
cfg->labelType = FLOAT8OID; /* bin boundary value */
cfg->leafType = TLE_TYPE_OID; /* full TLE at leaves */
cfg->canReturnData = true; /* enable index-only scans */
cfg->longValuesOK = false; /* fixed-size types only */
PG_RETURN_VOID();
}
```
The prefix type stores the range covered by each inner node. The label type stores bin boundary values for child selection. The leaf type is the existing `tle` type — no new type needed at the leaf level.
### 5.2 `tle_trie_choose()`
Decides which child to descend into during insertion. Level-aware: L0 routes by SMA, L1 routes by inclination.
```c
Datum tle_trie_choose(PG_FUNCTION_ARGS)
{
spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);
tle_type *tle = DatumGetTleP(in->leafDatum);
int level = in->level;
double key;
/* Extract the element for this level */
if (level == 0)
key = tle_sma_km(tle); /* semi-major axis in km */
else
key = tle_inclination_rad(tle); /* inclination in radians */
if (in->allTheSame)
{
/* All children equivalent — pick first */
out->resultType = spgMatchNode;
out->result.matchNode.nodeN = 0;
out->result.matchNode.levelAdd = 1;
out->result.matchNode.restDatum = in->leafDatum; /* pass full TLE unchanged */
PG_RETURN_VOID();
}
/* Find the child whose bin contains our key */
for (int i = 0; i < in->nNodes; i++)
{
double boundary = DatumGetFloat8(in->nodeLabels[i]);
if (i == in->nNodes - 1 || key < DatumGetFloat8(in->nodeLabels[i + 1]))
{
out->resultType = spgMatchNode;
out->result.matchNode.nodeN = i;
out->result.matchNode.levelAdd = 1;
out->result.matchNode.restDatum = in->leafDatum;
PG_RETURN_VOID();
}
}
/* Fallback: add new node (should not happen with well-chosen splits) */
out->resultType = spgAddNode;
out->result.addNode.nodeLabel = Float8GetDatum(key);
out->result.addNode.nodeN = in->nNodes;
PG_RETURN_VOID();
}
```
**Key detail:** `restDatum = in->leafDatum` — the full TLE passes unchanged to every level, following the quad-tree precedent. The trie terminates at depth 2 (two levels), not by value exhaustion.
### 5.3 `tle_trie_picksplit()`
Splits a leaf page that has overflowed. Uses equal-population strategy: sort by the current level's element, split at the median entry (not the median value).
```c
Datum tle_trie_picksplit(PG_FUNCTION_ARGS)
{
spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0);
spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1);
int level = in->level;
int nTuples = in->nTuples;
/* Extract sort keys for this level */
double *keys = palloc(nTuples * sizeof(double));
int *order = palloc(nTuples * sizeof(int));
for (int i = 0; i < nTuples; i++)
{
tle_type *tle = DatumGetTleP(in->datums[i]);
keys[i] = (level == 0) ? tle_sma_km(tle) : tle_inclination_rad(tle);
order[i] = i;
}
/* Sort by key value */
qsort_with_keys(order, nTuples, keys); /* stable sort by keys[order[i]] */
/* Equal-population split: divide into N bins of ~equal count */
int nBins = choose_bin_count(nTuples); /* heuristic: sqrt(n), clamped [2, 32] */
out->nNodes = nBins;
out->nodeLabels = palloc(nBins * sizeof(Datum));
out->mapTuplesToNodes = palloc(nTuples * sizeof(int));
out->leafTupleDatums = palloc(nTuples * sizeof(Datum));
int per_bin = nTuples / nBins;
for (int bin = 0; bin < nBins; bin++)
{
int start = bin * per_bin;
out->nodeLabels[bin] = Float8GetDatum(keys[order[start]]);
int end = (bin == nBins - 1) ? nTuples : (bin + 1) * per_bin;
for (int j = start; j < end; j++)
{
int orig = order[j];
out->mapTuplesToNodes[orig] = bin;
out->leafTupleDatums[orig] = in->datums[orig]; /* TLE unchanged */
}
}
pfree(keys);
pfree(order);
PG_RETURN_VOID();
}
```
### 5.4 `tle_trie_inner_consistent()`
The pruning engine. At each inner node, determines which children could contain matching satellites. Carries accumulated bounds in `traversalValue` for cost estimation.
```c
/*
* Traversal state carried down the tree during index scans.
* Accumulates bounds for cost estimation and RAAN filter setup.
*/
typedef struct OrbitalTraversal {
double sma_low, sma_high; /* accumulated SMA range from L0 */
double inc_low, inc_high; /* accumulated inclination range from L1 */
int32 population; /* estimated objects in this subtree */
double j2_rate; /* J2 precession rate (computed after L1) */
} OrbitalTraversal;
Datum tle_trie_inner_consistent(PG_FUNCTION_ARGS)
{
spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1);
int level = in->level;
/* Reconstruct or initialize traversal state */
OrbitalTraversal *trav;
if (in->traversalValue)
trav = (OrbitalTraversal *) in->traversalValue;
else
{
trav = palloc0(sizeof(OrbitalTraversal));
trav->sma_low = 0;
trav->sma_high = INFINITY;
trav->inc_low = 0;
trav->inc_high = M_PI;
trav->population = -1; /* unknown until leaf scan */
}
/* Extract query parameters from scankey */
observer_window *qwin = extract_observer_window(in);
out->nodeNumbers = palloc(in->nNodes * sizeof(int));
out->traversalValues = palloc(in->nNodes * sizeof(void *));
out->nNodes = 0;
for (int i = 0; i < in->nNodes; i++)
{
double bin_low = DatumGetFloat8(in->nodeLabels[i]);
double bin_high = (i < in->nNodes - 1)
? DatumGetFloat8(in->nodeLabels[i + 1])
: INFINITY;
bool dominated = false;
if (level == 0)
{
/* L0: SMA pruning — can a satellite at this altitude be visible? */
double perigee_alt_km = bin_low - WGS72_AE;
if (perigee_alt_km > max_visible_altitude(qwin))
dominated = true; /* too high to be visible */
}
else if (level == 1)
{
/* L1: Inclination pruning — can this inclination reach observer latitude? */
double observer_lat = fabs(qwin->observer.lat_rad);
if (bin_high < observer_lat)
dominated = true; /* ground track never reaches observer */
}
if (!dominated)
{
/* Propagate traversal state to child */
OrbitalTraversal *child_trav = palloc(sizeof(OrbitalTraversal));
memcpy(child_trav, trav, sizeof(OrbitalTraversal));
if (level == 0) {
child_trav->sma_low = bin_low;
child_trav->sma_high = bin_high;
} else if (level == 1) {
child_trav->inc_low = bin_low;
child_trav->inc_high = bin_high;
/* After L1, we can compute J2 precession rate */
double a_mid = (child_trav->sma_low + child_trav->sma_high) / 2.0;
double i_mid = (bin_low + bin_high) / 2.0;
double n = sqrt(WGS72_MU / (a_mid * a_mid * a_mid));
child_trav->j2_rate = -1.5 * n * WGS72_J2
* (WGS72_AE / a_mid) * (WGS72_AE / a_mid)
* cos(i_mid);
}
int idx = out->nNodes++;
out->nodeNumbers[idx] = i;
out->traversalValues[idx] = child_trav;
}
}
PG_RETURN_VOID();
}
```
### 5.5 `tle_trie_leaf_consistent()`
Final check at the leaf level. Applies the RAAN query-time filter and eccentricity check. Always sets `recheck = true` because SGP4 propagation is still needed for the definitive answer.
```c
Datum tle_trie_leaf_consistent(PG_FUNCTION_ARGS)
{
spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
spgLeafConsistentOut *out = (spgLeafConsistentOut *) PG_GETARG_POINTER(1);
tle_type *tle = DatumGetTleP(in->leafDatum);
observer_window *qwin = extract_observer_window_from_scankey(in);
/* RAAN query-time filter */
double dt_days = (qwin->t_mid_jd - tle_epoch_jd(tle));
double raan_now = tle_raan_rad(tle) + tle_j2_raan_rate(tle) * dt_days;
raan_now = fmod(raan_now, 2.0 * M_PI);
if (raan_now < 0) raan_now += 2.0 * M_PI;
/* Observer sidereal position at query midpoint */
double lst = local_sidereal_time(qwin->observer.lon_rad, qwin->t_mid_jd);
/* RAAN visibility window: Earth rotation during query + ground footprint */
double earth_rotation = (qwin->t_end_jd - qwin->t_start_jd) * 360.0; /* degrees */
double footprint = ground_footprint_deg(tle_sma_km(tle), qwin->min_elevation);
double raan_window_half = (earth_rotation / 2.0 + footprint) * (M_PI / 180.0);
double raan_diff = fabs(raan_now - lst);
if (raan_diff > M_PI) raan_diff = 2.0 * M_PI - raan_diff;
if (raan_diff > raan_window_half)
{
out->leafValue = in->leafDatum;
out->recheck = true;
PG_RETURN_BOOL(false); /* RAAN not aligned — skip this candidate */
}
/* Eccentricity sanity check: highly eccentric orbits need wider altitude band */
double ecc = tle_eccentricity(tle);
if (ecc > 0.1)
{
double perigee = tle_sma_km(tle) * (1.0 - ecc) - WGS72_AE;
double apogee = tle_sma_km(tle) * (1.0 + ecc) - WGS72_AE;
if (perigee > max_visible_altitude(qwin))
{
out->leafValue = in->leafDatum;
out->recheck = true;
PG_RETURN_BOOL(false);
}
}
out->leafValue = in->leafDatum;
out->recheck = true; /* always recheck — SGP4 propagation is the ground truth */
PG_RETURN_BOOL(true);
}
```
---
## 6. The RAAN Query-Time Filter
### Why not a trie level
The effective RAAN visibility window depends entirely on the query time window. This table shows why static trie bins can't capture the physics:
| Query window | Earth rotation | Effective RAAN window | Candidates eliminated |
|-------------|---------------|----------------------|----------------------|
| 30 min | 7.5° | ~52° (14% of sky) | ~85% |
| 2 hours | 30° | ~74° (21%) | ~79% |
| 6 hours | 90° | ~134° (37%) | ~63% |
| 12 hours | 180° | ~224° (62%) | ~38% |
| 24 hours | 360° | 360° (100%) | 0% |
A static bin structure (4 bins of 90° each) could eliminate at most 75% (3 of 4 bins) for a 30-minute query, but eliminates nothing for queries longer than ~6 hours. The query-time filter automatically adapts to the actual window.
### The J2 precession rate
RAAN precession is the dominant secular perturbation for LEO orbits. The rate is:
```
Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i)
```
where:
- `n = √(μ/a³)` — mean motion (rad/s), with `μ = 398600.8 km³/s²` (WGS-72)
- `J₂ = 0.001082616` (WGS-72)
- `Rₑ = 6378.135 km` (WGS-72)
- `a` — semi-major axis (km)
- `i` — inclination (radians)
For a 400 km LEO satellite at 51.6° inclination (ISS-like):
```
a = 6778.135 km
n = 0.00114 rad/s
Ω̇ = -1.5 × 0.00114 × 0.001082616 × (6378.135/6778.135)² × cos(51.6°)
≈ -1.07 × 10⁻⁶ rad/s ≈ -5.3°/day
```
This rate is fully determined by SMA and inclination — both already indexed at L0 and L1.
### Per-candidate cost
Projecting RAAN to query time requires:
1. One subtraction (epoch difference)
2. One multiply (rate × time)
3. One addition (RAAN₀ + drift)
4. One modulo (wrap to [0, 2π))
5. One range check (within visibility window?)
**Cost: ~10 nanoseconds per candidate.**
After L0 and L1 pruning, a typical pass prediction query over 30,000 TLEs leaves ~4,500 candidates. The total RAAN filter cost:
```
4,500 × 10 ns = 45 μs
```
This is negligible compared to SGP4 propagation cost for the survivors (typically hundreds of milliseconds to seconds). The filter eliminates ~79% of those 4,500 candidates (for a 2-hour window), leaving ~945 for SGP4 propagation instead of 4,500.
---
## 7. Propagation-Aware Cost Estimation
### The core innovation
Standard PostgreSQL index cost estimators model I/O cost: pages read, tuples fetched. KTrie's original insight — preserved in this SP-GiST design — is to model **downstream computation cost**. The expensive operation in satellite queries isn't reading data; it's SGP4/SDP4 propagation.
The cost estimator tells the query planner:
```
estimated_cost = surviving_population × time_steps × sgp4_cost_per_step
```
This lets the planner compare "index scan + 945 SGP4 evaluations" vs "sequential scan + 30,000 SGP4 evaluations" — a decision no other PostgreSQL index type can make.
### Population tracking via `traversalValue`
SP-GiST's `traversalValue` mechanism carries the `OrbitalTraversal` struct down the tree during scans. At each inner node, the cost estimator accumulates:
```c
typedef struct OrbitalTraversal {
double sma_low, sma_high; /* from L0 */
double inc_low, inc_high; /* from L1 */
int32 population; /* objects in this subtree */
double j2_rate; /* derived from L0 + L1 midpoints */
} OrbitalTraversal;
```
The `population` field counts objects in each subtree. After L1, `j2_rate` is computed from the accumulated SMA and inclination bounds. This enables the RAAN filter to predict how many candidates will survive:
```
expected_visible = population × (RAAN_window / 360°)
```
### Constellation detection as emergent property
Mega-constellations cluster tightly in SMA × inclination space. Starlink shell 1: ~2,000 satellites at 550 km / 53.0°. All members share nearly the same J2 precession rate, and their RAANs are evenly distributed by constellation design (phased orbital planes).
The trie naturally groups these into a single L1 subtree. The cost estimator sees a tight population with a uniform RAAN distribution and can predict:
```
Starlink shell 1 example (2-hour query from Eagle, Idaho):
population = 2,000
RAAN_window = 74° → fraction = 20.6%
RAAN_survivors = 2,000 × 0.206 = ~412
time_steps = 120 (2 hours at 60-second intervals)
sgp4_per_step = 0.05 ms
estimated_cost = 412 × 120 × 0.05 ms = 2.5 seconds
```
No explicit constellation classification is needed. The structure emerges from orbital mechanics.
### Custom statistics function
The cost estimator can be registered as a custom statistics function for the SP-GiST operator class, or integrated into the `inner_consistent` function via `traversalValue`. The planner sees accurate per-subtree cost predictions without any special configuration:
```c
void spgist_tle_cost_estimate(PlannerInfo *root, IndexPath *path,
double loop_count, Cost *startup_cost,
Cost *total_cost, Selectivity *selectivity,
double *correlation, double *index_pages)
{
double surviving_pop = estimate_from_traversal(root, path);
double time_steps = extract_query_window_steps(path);
double sgp4_cost = 0.05; /* ms per propagation step, calibratable */
double raan_fraction = estimate_raan_survival(path);
double propagation_candidates = surviving_pop * raan_fraction;
*total_cost = (2 * PAGE_READ_COST) /* L0 + L1 traversal */
+ (surviving_pop * RAAN_FILTER_COST) /* query-time RAAN */
+ (propagation_candidates * time_steps * sgp4_cost) /* SGP4 */
+ (propagation_candidates * HEAP_FETCH_COST); /* fetch full TLE */
*selectivity = propagation_candidates / total_catalog_size;
}
```
---
## 8. Operators
### Types
```sql
-- Observation query parameters bundled as a composite type
CREATE TYPE observer_window AS (
obs observer, -- existing pg_orrery observer type (lat, lon, alt_m)
t_start timestamptz, -- query window start
t_end timestamptz, -- query window end
min_el float8 -- minimum elevation angle in degrees
);
```
### Operator definitions
Three operators, starting minimal. The flagship operator `&?` is the primary interface for pass prediction.
```sql
-- Strategy 1: Altitude regime containment
-- "Is this TLE's orbit within this altitude range?"
CREATE OPERATOR @> (
LEFTARG = float8range, -- altitude range in km (e.g., '[200,600]')
RIGHTARG = tle,
PROCEDURE = tle_regime_contains,
COMMUTATOR = <@
);
-- Strategy 2: Orbital envelope overlap (enhanced existing &&)
-- "Do these two TLEs share overlapping altitude + inclination space?"
CREATE OPERATOR && (
LEFTARG = tle,
RIGHTARG = tle,
PROCEDURE = tle_envelope_overlaps,
COMMUTATOR = &&
);
-- Strategy 3: Visibility cone check (flagship operator)
-- "Could this satellite be visible from this observer during this time window?"
-- Combines SMA pruning (L0) + inclination pruning (L1) + RAAN query-time filter
CREATE OPERATOR &? (
LEFTARG = observer_window,
RIGHTARG = tle,
PROCEDURE = tle_visibility_possible
);
```
### Example queries
```sql
-- Primary use case: pass prediction
-- "Which satellites might be visible from Eagle, Idaho in the next 2 hours?"
SELECT s.norad_id, s.name
FROM satellites s
WHERE ROW(
observer('43.7N 116.4W 760m'),
now(),
now() + interval '2 hours',
10.0
)::observer_window &? s.tle;
-- Altitude regime query
-- "Which satellites orbit between 400 and 600 km?"
SELECT s.norad_id, s.name
FROM satellites s
WHERE '[400,600]'::float8range @> s.tle;
-- Combined: visible LEO passes with full SGP4 propagation
SELECT s.norad_id, s.name,
predict_passes(s.tle, observer('43.7N 116.4W 760m'),
now(), now() + interval '2 hours', 10.0)
FROM satellites s
WHERE ROW(
observer('43.7N 116.4W 760m'),
now(),
now() + interval '2 hours',
10.0
)::observer_window &? s.tle;
```
The `&?` operator returns `true` for satellites that **might** be visible (a superset of the actual answer). The `predict_passes()` function then runs SGP4 propagation on only those candidates. The index's job is to minimize the candidate set, not to produce the final answer.
---
## 9. Data Structures
### SP-GiST node structure
SP-GiST manages its own page layout, inner tuples, and leaf tuples. We declare our types; the framework handles storage:
- **Prefix type:** `float8range` — the SMA or inclination range covered by an inner node
- **Label type:** `float8` — bin boundary values for child dispatch
- **Leaf type:** `tle` — the existing pg_orrery TLE type (112 bytes, `STORAGE = plain`)
No new page format. No custom WAL records. SP-GiST provides all of this.
### `observer_window` composite type
```c
/* Not a new C struct — uses SQL composite type mechanics */
/* Fields: observer (24B) + t_start (8B) + t_end (8B) + min_el (8B) = 48 bytes */
```
This is a standard SQL composite type, not a custom C type. PostgreSQL handles I/O, storage, and parameter passing. The operator functions extract fields via `GetAttributeByNum()`.
### WGS-72 constants
All internal computations use the same WGS-72 constants already defined in pg_orrery's `types.h`:
```c
#define WGS72_MU 398600.8 /* km³/s² */
#define WGS72_AE 6378.135 /* km */
#define WGS72_J2 0.001082616
```
The constant chain of custody is maintained: WGS-72 for orbital mechanics, WGS-84 for observer coordinate output. No new constants introduced.
---
## 10. What Changed from the Original KTrie
| Original KTrie | Evolved SP-GiST | Rationale |
|----------------|-----------------|-----------|
| Custom index AM (12+ functions) | SP-GiST (5 functions) | PostgreSQL handles WAL, VACUUM, parallel, recovery |
| 5-level trie (a, i, Ω, e, ω) | 2-level trie (a, i) | Only SMA + inclination are temporally stable |
| RAAN as L2 (4-8 static bins) | Query-time analytical filter | Time-dependent; adapts to window; no reindex |
| Eccentricity as L3 | Query-time check in `leaf_consistent` | Near-zero in LEO; unreliable for fine pruning |
| Arg. perigee as L4 | Dropped | Near-zero discrimination in LEO |
| Patricia compression (custom) | SP-GiST prefix compression (built-in) | Same semantics, zero custom page management |
| `uint16` population (max 65,535) | `int32` via `traversalValue` | No overflow risk; Starlink alone targets 12,000+ |
| 6 operators | 3 operators | Start minimal; `&?` is the flagship |
| 10+ C source files | ~3 C source files | SP-GiST handles infrastructure |
| Custom page layout (72B header) | SP-GiST page layout | No custom page management |
| Custom split/merge logic | SP-GiST `picksplit` | Framework handles page operations |
| Manual compression triggers | SP-GiST automatic prefix handling | Framework compresses single-child paths |
| `reindex_raan_drift` parameter | Eliminated | RAAN not in the trie; no reindex needed |
### What survived intact
Three ideas from the original spec are preserved without modification:
1. **Population-aware cost estimation** — the core innovation. Now delivered via `traversalValue` instead of `uint16` per-entry fields.
2. **Equal-population splits** — density-balanced tree, not geometry-balanced. Now in `picksplit` instead of custom split logic.
3. **WGS-72 constant chain of custody** — unchanged, inherited from pg_orrery.
---
## 11. Implementation Roadmap
### Phase 1: SP-GiST prototype
Create the core SP-GiST index with the 5 support functions and 3 operators.
**Files:**
```
src/spgist_tle.c — config, choose, picksplit, inner_consistent, leaf_consistent
src/visibility_ops.c — &? operator, RAAN filter, observer_window type support
```
**SQL:**
```
sql/pg_orrery--0.6.0--0.7.0.sql — types, operators, SP-GiST operator class
```
**Tests:**
```
test/sql/spgist_tle.sql — index creation, operator tests, pruning validation
test/expected/spgist_tle.out
```
**Goal:** Working index that correctly prunes by SMA + inclination + RAAN. Correctness over performance.
### Phase 2: Cost estimator
Add the propagation-aware cost estimation function.
**Files:**
```
src/spgist_cost.c — custom cost estimator with population tracking
```
**Goal:** Query planner correctly chooses index scan vs sequential scan based on predicted SGP4 cost.
### Phase 3: Benchmark vs current GiST
Load 30,000 TLEs from Space-Track. Compare:
| Metric | Current GiST | SP-GiST orbital trie |
|--------|-------------|---------------------|
| Pruning rate (2h window, 43.7° lat) | Measure | Measure |
| Pruning rate (24h window, equator) | Measure | Measure |
| Query time (pass prediction) | Measure | Measure |
| False positive rate | Measure | Measure |
| Index size | Measure | Measure |
| Build time | Measure | Measure |
### Phase 4: Evaluate
If the SP-GiST trie achieves >85% pruning on the benchmark suite, ship as pg_orrery v0.7.0. If not, analyze where candidates survive and determine whether deeper trie levels or alternative strategies are justified.
**Decision gate:** The GiST enhancement (adding eccentricity to the existing key) provides the baseline. The SP-GiST trie must demonstrably exceed it to justify the additional code.
---
## 12. Open Questions
1. **Should `observer_window` be a new custom type or a SQL composite?**
A custom C type (like `observer`) gives tighter control and avoids composite type overhead. A SQL composite is simpler to implement and extend. The composite is sufficient for the prototype; migrate to custom C type if profiling shows overhead.
2. **Can we store population metadata in SP-GiST's prefix datum?**
The prefix is `float8range` (16 bytes). Population could be packed into a custom prefix struct instead, but this couples the prefix to the cost estimator. Using `traversalValue` keeps them separate. Needs testing to determine if `traversalValue` introduces measurable overhead.
3. **What's the right default picksplit strategy?**
Options: median-of-element-value (equal-range) vs median-of-population (equal-count). Equal-count is theoretically better for balanced query cost, but equal-range is simpler and may perform similarly when the catalog distribution is stable. Implement equal-count, benchmark against equal-range.
4. **Should the cost estimator calibrate `sgp4_cost_per_step` at index creation time?**
Running a small SGP4 benchmark during `CREATE INDEX` would give an accurate per-machine calibration. But it couples index creation to runtime performance, and the cost varies with TLE characteristics (near-earth vs deep-space). A configurable GUC (`pg_orrery.sgp4_cost_us`, default 50) may be more practical.
5. **What about conjunction screening (N×N)?**
The original spec included `&=` (conjunction possible). This is a different query pattern: orbit-to-orbit, not observer-to-orbit. The SP-GiST trie can support it (L0+L1 pruning applies), but the RAAN filter doesn't (no observer). Defer to a future phase.
6. **Index-only scans: what data is needed?**
`canReturnData = true` enables index-only scans, avoiding heap fetches. The leaf stores the full `tle` type. If the query only needs NORAD ID and basic orbital parameters (common for pass prediction pre-screening), the index can serve the query without touching the heap. Verify this works correctly with the `STORAGE = plain` TLE type.