# SP-GiST Orbital Trie: Domain-Specific Index for Satellite Pass Prediction ## 1. Purpose & Lineage This document specifies a domain-specific SP-GiST index for accelerating satellite pass prediction queries in PostgreSQL. The index decomposes TLE orbital element space into a 2-level hierarchical trie — semi-major axis and inclination — with a query-time RAAN filter and propagation-aware cost estimation. **Primary use case:** 1-to-N pass prediction. "Which of 30,000 satellites are visible from this observer in the next 2 hours?" The index prunes the catalog analytically before any SGP4/SDP4 propagation, which is the dominant query cost. **Lineage:** This design evolved from the original KTrie spec ([KTRIE-SPEC-ORIGINAL.md](KTRIE-SPEC-ORIGINAL.md)) through design review. The original proposed a fully custom PostgreSQL index access method with 5 Keplerian element levels. Analysis showed: 1. Only 2 of 5 levels (SMA, inclination) are temporally stable enough to index 2. PostgreSQL's SP-GiST framework provides all necessary infrastructure (WAL, VACUUM, parallel scan, prefix compression) 3. RAAN filtering is better as a query-time analytical filter that adapts to the query window 4. Constellation-level pruning emerges naturally from the trie structure without explicit classification 5. The real innovation — propagation-aware cost estimation — works within SP-GiST via `traversalValue` The React visualization ([ktrie-layout.jsx](ktrie-layout.jsx)) reflects the original 5-level design and will be updated separately. --- ## 2. The Visibility Decision Tree Pass prediction asks: "Can satellite S be seen from observer O during time window [t₁, t₂]?" There are exactly four questions, ordered by decreasing temporal stability: ### Q1: Can the orbit reach observable altitude? Determined by perigee and apogee altitude, derived from SMA and eccentricity via Kepler's third law. - **Stability:** Very stable. Changes only with atmospheric drag (slow decay) or powered maneuvers (rare, discrete events). - **Discrimination:** Eliminates entire orbital regimes. An observer with a 10° minimum elevation can see satellites up to ~2,500 km altitude. This removes all MEO, GEO, and beyond — roughly 25% of the catalog. - **Verdict: INDEXABLE.** ### Q2: Can the ground track reach my latitude? A satellite with inclination `i` has a ground track bounded by latitudes `[-i, +i]`. An observer at latitude `φ` can only see satellites where `i ≥ |φ|` (simplified; the actual constraint includes off-track visibility angle, which depends on altitude). - **Stability:** Very stable. Inclination barely changes without thrust. - **Discrimination:** An observer at 43.7° (Eagle, Idaho) eliminates all equatorial and low-inclination objects — roughly 60% of LEO. - **Verdict: INDEXABLE.** ### Q3: Is the orbital plane aligned with my location? The Right Ascension of the Ascending Node (RAAN, Ω) determines where the orbit crosses the equator in inertial space. A satellite is potentially overhead when its RAAN is roughly aligned with the observer's current sidereal position. - **Stability: UNSTABLE.** RAAN precesses at 0.5–7°/day due to J2 oblateness perturbation. The precession rate is: ``` Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i) ``` where `n` is mean motion, `J₂ = 0.001082616`, `Rₑ = 6378.135 km` (WGS-72), and `a` is semi-major axis. - **Key insight:** Ω̇ is fully determined by SMA and inclination — the two elements already indexed at L0 and L1. The RAAN check requires only the TLE's stored RAAN₀ and epoch (available in the leaf) plus the elements already traversed. No additional index level is needed. - **Verdict: QUERY-TIME FILTER.** Computed per surviving candidate after L0+L1 pruning. ### Q4: Is the satellite at the right orbital phase? Mean anomaly changes at ~4°/second for LEO. The satellite's actual position along its orbit at any given moment is the irreducible question that requires SGP4/SDP4 numerical propagation. - **Stability: COMPLETELY UNSTABLE.** Changes continuously. - **Verdict: IRREDUCIBLE. Requires SGP4 propagation.** ### Conclusion Only two dimensions are worth indexing. Everything else is either computable from those two dimensions (RAAN) or requires full propagation (mean anomaly). This drives the 2-level trie design. --- ## 3. Why SP-GiST ### SP-GiST vs custom AM vs enhanced GiST | Requirement | SP-GiST | Custom AM | Enhanced GiST | |-------------|---------|-----------|----------------| | Page management + WAL | Free | Must implement | Free | | VACUUM + dead tuple cleanup | Free | Must implement | Free | | Crash recovery | Free | Must implement | Free | | Parallel scan | Free | Must implement | Free | | Prefix compression | Built-in (prefix/suffix) | Must implement | Not available | | Non-balanced tree | Native | Must implement | Not natural | | Fixed level decomposition | Via `level` counter | Must implement | Not available | | Traversal state for cost est. | `traversalValue` | Must implement | Not available | | Node labels for children | `nodeLabels` | Must implement | Not available | | Support functions needed | 5 | 12+ | 7 (already exist) | | Lines of domain-specific C | ~800 est. | ~3,000+ est. | ~200 (delta) | **SP-GiST was literally designed for this kind of data structure.** Its space-partitioning model matches the KTrie concept directly: fixed decomposition rules, non-balanced trees, prefix compression. The `traversalValue` mechanism carries state down the tree during scans — exactly what population-aware cost estimation needs. ### The quad-tree precedent PostgreSQL's built-in SP-GiST quad-tree (`spgquadtreeproc.c`) demonstrates a key pattern: the `restDatum` (remaining datum after prefix extraction) can be the **same type** at every level. The quad-tree passes the full point unchanged: ```c out->result.matchNode.restDatum = in->datum; ``` The tree terminates by depth, not by value exhaustion. Our design does the same — the leaf stores the full `tle` type. No intermediate compressed struct needed. The leaf has everything for the RAAN query-time filter (RAAN₀, epoch, mean_motion, inclination, eccentricity). ### What SP-GiST's prefix compression gives us The original KTrie spec implemented Patricia compression manually (compressed nodes, skip_bounds arrays, compression/decompression triggers). SP-GiST provides this natively: the `config` function declares a prefix type, and the `choose` function returns prefix/suffix decompositions. Single-child chains are compressed automatically by the framework. Same semantics, zero custom page management. --- ## 4. Architecture ### 2-level trie with query-time filters ``` SP-GiST L0: Semi-Major Axis (altitude regime) ├── Equal-population splits, not equal-range ├── Density-balanced: LEO (75% of catalog) gets finer bins └── Prunes by perigee/apogee altitude reachability SP-GiST L1: Inclination ├── Equal-population splits ├── Natural clustering at launch-site latitudes (28.5°, 51.6°, 97°) └── Prunes by ground-track latitude coverage Query-time RAAN filter (in leaf_consistent): ├── Projects RAAN to query midpoint via J2 precession rate ├── Checks alignment with observer's sidereal position ├── Adapts automatically to any query window length └── Cost: ~10ns per candidate, ~45μs for entire post-L0/L1 batch SGP4/SDP4 propagation (irreducible): └── Full numerical propagation for surviving candidates ``` ### Equal-population splits The current GiST uses equal-range splits (median midpoint of the geometric spread). Equal-population splits keep the tree balanced in **object density**, not geometric space. The altitude band 300–600 km (where Starlink, ISS, and most LEO debris live) gets finer subdivision than 600–2,000 km. This matters because: - Query cost is proportional to surviving population, not geometric volume - Dense regions need finer discrimination; sparse regions don't benefit from it - The cost estimator's population predictions are more accurate with balanced subtrees ### Constellation-aware pruning Mega-constellations cluster tightly in L0+L1 space. Starlink shell 1: all ~2,000 satellites at 550 km / 53.0°. OneWeb: ~600 at 1,200 km / 87.9°. These clusters naturally fall into single L1 subtrees, giving the cost estimator a tight population count and a narrow J2 precession rate range. Constellation-level pruning emerges from the trie structure without explicit constellation classification. --- ## 5. SP-GiST Support Functions Five functions implement the index. All use WGS-72 constants internally, consistent with pg_orrery's constant chain of custody. ### 5.1 `tle_trie_config()` Declares the type system for the trie. ```c Datum tle_trie_config(PG_FUNCTION_ARGS) { spgConfigOut *cfg = (spgConfigOut *) palloc0(sizeof(spgConfigOut)); cfg->prefixType = FLOAT8RANGEOID; /* orbital_bounds: SMA or inclination range */ cfg->labelType = FLOAT8OID; /* bin boundary value */ cfg->leafType = TLE_TYPE_OID; /* full TLE at leaves */ cfg->canReturnData = true; /* enable index-only scans */ cfg->longValuesOK = false; /* fixed-size types only */ PG_RETURN_VOID(); } ``` The prefix type stores the range covered by each inner node. The label type stores bin boundary values for child selection. The leaf type is the existing `tle` type — no new type needed at the leaf level. ### 5.2 `tle_trie_choose()` Decides which child to descend into during insertion. Level-aware: L0 routes by SMA, L1 routes by inclination. ```c Datum tle_trie_choose(PG_FUNCTION_ARGS) { spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0); spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1); tle_type *tle = DatumGetTleP(in->leafDatum); int level = in->level; double key; /* Extract the element for this level */ if (level == 0) key = tle_sma_km(tle); /* semi-major axis in km */ else key = tle_inclination_rad(tle); /* inclination in radians */ if (in->allTheSame) { /* All children equivalent — pick first */ out->resultType = spgMatchNode; out->result.matchNode.nodeN = 0; out->result.matchNode.levelAdd = 1; out->result.matchNode.restDatum = in->leafDatum; /* pass full TLE unchanged */ PG_RETURN_VOID(); } /* Find the child whose bin contains our key */ for (int i = 0; i < in->nNodes; i++) { double boundary = DatumGetFloat8(in->nodeLabels[i]); if (i == in->nNodes - 1 || key < DatumGetFloat8(in->nodeLabels[i + 1])) { out->resultType = spgMatchNode; out->result.matchNode.nodeN = i; out->result.matchNode.levelAdd = 1; out->result.matchNode.restDatum = in->leafDatum; PG_RETURN_VOID(); } } /* Fallback: add new node (should not happen with well-chosen splits) */ out->resultType = spgAddNode; out->result.addNode.nodeLabel = Float8GetDatum(key); out->result.addNode.nodeN = in->nNodes; PG_RETURN_VOID(); } ``` **Key detail:** `restDatum = in->leafDatum` — the full TLE passes unchanged to every level, following the quad-tree precedent. The trie terminates at depth 2 (two levels), not by value exhaustion. ### 5.3 `tle_trie_picksplit()` Splits a leaf page that has overflowed. Uses equal-population strategy: sort by the current level's element, split at the median entry (not the median value). ```c Datum tle_trie_picksplit(PG_FUNCTION_ARGS) { spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0); spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1); int level = in->level; int nTuples = in->nTuples; /* Extract sort keys for this level */ double *keys = palloc(nTuples * sizeof(double)); int *order = palloc(nTuples * sizeof(int)); for (int i = 0; i < nTuples; i++) { tle_type *tle = DatumGetTleP(in->datums[i]); keys[i] = (level == 0) ? tle_sma_km(tle) : tle_inclination_rad(tle); order[i] = i; } /* Sort by key value */ qsort_with_keys(order, nTuples, keys); /* stable sort by keys[order[i]] */ /* Equal-population split: divide into N bins of ~equal count */ int nBins = choose_bin_count(nTuples); /* heuristic: sqrt(n), clamped [2, 32] */ out->nNodes = nBins; out->nodeLabels = palloc(nBins * sizeof(Datum)); out->mapTuplesToNodes = palloc(nTuples * sizeof(int)); out->leafTupleDatums = palloc(nTuples * sizeof(Datum)); int per_bin = nTuples / nBins; for (int bin = 0; bin < nBins; bin++) { int start = bin * per_bin; out->nodeLabels[bin] = Float8GetDatum(keys[order[start]]); int end = (bin == nBins - 1) ? nTuples : (bin + 1) * per_bin; for (int j = start; j < end; j++) { int orig = order[j]; out->mapTuplesToNodes[orig] = bin; out->leafTupleDatums[orig] = in->datums[orig]; /* TLE unchanged */ } } pfree(keys); pfree(order); PG_RETURN_VOID(); } ``` ### 5.4 `tle_trie_inner_consistent()` The pruning engine. At each inner node, determines which children could contain matching satellites. Carries accumulated bounds in `traversalValue` for cost estimation. ```c /* * Traversal state carried down the tree during index scans. * Accumulates bounds for cost estimation and RAAN filter setup. */ typedef struct OrbitalTraversal { double sma_low, sma_high; /* accumulated SMA range from L0 */ double inc_low, inc_high; /* accumulated inclination range from L1 */ int32 population; /* estimated objects in this subtree */ double j2_rate; /* J2 precession rate (computed after L1) */ } OrbitalTraversal; Datum tle_trie_inner_consistent(PG_FUNCTION_ARGS) { spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0); spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1); int level = in->level; /* Reconstruct or initialize traversal state */ OrbitalTraversal *trav; if (in->traversalValue) trav = (OrbitalTraversal *) in->traversalValue; else { trav = palloc0(sizeof(OrbitalTraversal)); trav->sma_low = 0; trav->sma_high = INFINITY; trav->inc_low = 0; trav->inc_high = M_PI; trav->population = -1; /* unknown until leaf scan */ } /* Extract query parameters from scankey */ observer_window *qwin = extract_observer_window(in); out->nodeNumbers = palloc(in->nNodes * sizeof(int)); out->traversalValues = palloc(in->nNodes * sizeof(void *)); out->nNodes = 0; for (int i = 0; i < in->nNodes; i++) { double bin_low = DatumGetFloat8(in->nodeLabels[i]); double bin_high = (i < in->nNodes - 1) ? DatumGetFloat8(in->nodeLabels[i + 1]) : INFINITY; bool dominated = false; if (level == 0) { /* L0: SMA pruning — can a satellite at this altitude be visible? */ double perigee_alt_km = bin_low - WGS72_AE; if (perigee_alt_km > max_visible_altitude(qwin)) dominated = true; /* too high to be visible */ } else if (level == 1) { /* L1: Inclination pruning — can this inclination reach observer latitude? */ double observer_lat = fabs(qwin->observer.lat_rad); if (bin_high < observer_lat) dominated = true; /* ground track never reaches observer */ } if (!dominated) { /* Propagate traversal state to child */ OrbitalTraversal *child_trav = palloc(sizeof(OrbitalTraversal)); memcpy(child_trav, trav, sizeof(OrbitalTraversal)); if (level == 0) { child_trav->sma_low = bin_low; child_trav->sma_high = bin_high; } else if (level == 1) { child_trav->inc_low = bin_low; child_trav->inc_high = bin_high; /* After L1, we can compute J2 precession rate */ double a_mid = (child_trav->sma_low + child_trav->sma_high) / 2.0; double i_mid = (bin_low + bin_high) / 2.0; double n = sqrt(WGS72_MU / (a_mid * a_mid * a_mid)); child_trav->j2_rate = -1.5 * n * WGS72_J2 * (WGS72_AE / a_mid) * (WGS72_AE / a_mid) * cos(i_mid); } int idx = out->nNodes++; out->nodeNumbers[idx] = i; out->traversalValues[idx] = child_trav; } } PG_RETURN_VOID(); } ``` ### 5.5 `tle_trie_leaf_consistent()` Final check at the leaf level. Applies the RAAN query-time filter and eccentricity check. Always sets `recheck = true` because SGP4 propagation is still needed for the definitive answer. ```c Datum tle_trie_leaf_consistent(PG_FUNCTION_ARGS) { spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0); spgLeafConsistentOut *out = (spgLeafConsistentOut *) PG_GETARG_POINTER(1); tle_type *tle = DatumGetTleP(in->leafDatum); observer_window *qwin = extract_observer_window_from_scankey(in); /* RAAN query-time filter */ double dt_days = (qwin->t_mid_jd - tle_epoch_jd(tle)); double raan_now = tle_raan_rad(tle) + tle_j2_raan_rate(tle) * dt_days; raan_now = fmod(raan_now, 2.0 * M_PI); if (raan_now < 0) raan_now += 2.0 * M_PI; /* Observer sidereal position at query midpoint */ double lst = local_sidereal_time(qwin->observer.lon_rad, qwin->t_mid_jd); /* RAAN visibility window: Earth rotation during query + ground footprint */ double earth_rotation = (qwin->t_end_jd - qwin->t_start_jd) * 360.0; /* degrees */ double footprint = ground_footprint_deg(tle_sma_km(tle), qwin->min_elevation); double raan_window_half = (earth_rotation / 2.0 + footprint) * (M_PI / 180.0); double raan_diff = fabs(raan_now - lst); if (raan_diff > M_PI) raan_diff = 2.0 * M_PI - raan_diff; if (raan_diff > raan_window_half) { out->leafValue = in->leafDatum; out->recheck = true; PG_RETURN_BOOL(false); /* RAAN not aligned — skip this candidate */ } /* Eccentricity sanity check: highly eccentric orbits need wider altitude band */ double ecc = tle_eccentricity(tle); if (ecc > 0.1) { double perigee = tle_sma_km(tle) * (1.0 - ecc) - WGS72_AE; double apogee = tle_sma_km(tle) * (1.0 + ecc) - WGS72_AE; if (perigee > max_visible_altitude(qwin)) { out->leafValue = in->leafDatum; out->recheck = true; PG_RETURN_BOOL(false); } } out->leafValue = in->leafDatum; out->recheck = true; /* always recheck — SGP4 propagation is the ground truth */ PG_RETURN_BOOL(true); } ``` --- ## 6. The RAAN Query-Time Filter ### Why not a trie level The effective RAAN visibility window depends entirely on the query time window. This table shows why static trie bins can't capture the physics: | Query window | Earth rotation | Effective RAAN window | Candidates eliminated | |-------------|---------------|----------------------|----------------------| | 30 min | 7.5° | ~52° (14% of sky) | ~85% | | 2 hours | 30° | ~74° (21%) | ~79% | | 6 hours | 90° | ~134° (37%) | ~63% | | 12 hours | 180° | ~224° (62%) | ~38% | | 24 hours | 360° | 360° (100%) | 0% | A static bin structure (4 bins of 90° each) could eliminate at most 75% (3 of 4 bins) for a 30-minute query, but eliminates nothing for queries longer than ~6 hours. The query-time filter automatically adapts to the actual window. ### The J2 precession rate RAAN precession is the dominant secular perturbation for LEO orbits. The rate is: ``` Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i) ``` where: - `n = √(μ/a³)` — mean motion (rad/s), with `μ = 398600.8 km³/s²` (WGS-72) - `J₂ = 0.001082616` (WGS-72) - `Rₑ = 6378.135 km` (WGS-72) - `a` — semi-major axis (km) - `i` — inclination (radians) For a 400 km LEO satellite at 51.6° inclination (ISS-like): ``` a = 6778.135 km n = 0.00114 rad/s Ω̇ = -1.5 × 0.00114 × 0.001082616 × (6378.135/6778.135)² × cos(51.6°) ≈ -1.07 × 10⁻⁶ rad/s ≈ -5.3°/day ``` This rate is fully determined by SMA and inclination — both already indexed at L0 and L1. ### Per-candidate cost Projecting RAAN to query time requires: 1. One subtraction (epoch difference) 2. One multiply (rate × time) 3. One addition (RAAN₀ + drift) 4. One modulo (wrap to [0, 2π)) 5. One range check (within visibility window?) **Cost: ~10 nanoseconds per candidate.** After L0 and L1 pruning, a typical pass prediction query over 30,000 TLEs leaves ~4,500 candidates. The total RAAN filter cost: ``` 4,500 × 10 ns = 45 μs ``` This is negligible compared to SGP4 propagation cost for the survivors (typically hundreds of milliseconds to seconds). The filter eliminates ~79% of those 4,500 candidates (for a 2-hour window), leaving ~945 for SGP4 propagation instead of 4,500. --- ## 7. Propagation-Aware Cost Estimation ### The core innovation Standard PostgreSQL index cost estimators model I/O cost: pages read, tuples fetched. KTrie's original insight — preserved in this SP-GiST design — is to model **downstream computation cost**. The expensive operation in satellite queries isn't reading data; it's SGP4/SDP4 propagation. The cost estimator tells the query planner: ``` estimated_cost = surviving_population × time_steps × sgp4_cost_per_step ``` This lets the planner compare "index scan + 945 SGP4 evaluations" vs "sequential scan + 30,000 SGP4 evaluations" — a decision no other PostgreSQL index type can make. ### Population tracking via `traversalValue` SP-GiST's `traversalValue` mechanism carries the `OrbitalTraversal` struct down the tree during scans. At each inner node, the cost estimator accumulates: ```c typedef struct OrbitalTraversal { double sma_low, sma_high; /* from L0 */ double inc_low, inc_high; /* from L1 */ int32 population; /* objects in this subtree */ double j2_rate; /* derived from L0 + L1 midpoints */ } OrbitalTraversal; ``` The `population` field counts objects in each subtree. After L1, `j2_rate` is computed from the accumulated SMA and inclination bounds. This enables the RAAN filter to predict how many candidates will survive: ``` expected_visible = population × (RAAN_window / 360°) ``` ### Constellation detection as emergent property Mega-constellations cluster tightly in SMA × inclination space. Starlink shell 1: ~2,000 satellites at 550 km / 53.0°. All members share nearly the same J2 precession rate, and their RAANs are evenly distributed by constellation design (phased orbital planes). The trie naturally groups these into a single L1 subtree. The cost estimator sees a tight population with a uniform RAAN distribution and can predict: ``` Starlink shell 1 example (2-hour query from Eagle, Idaho): population = 2,000 RAAN_window = 74° → fraction = 20.6% RAAN_survivors = 2,000 × 0.206 = ~412 time_steps = 120 (2 hours at 60-second intervals) sgp4_per_step = 0.05 ms estimated_cost = 412 × 120 × 0.05 ms = 2.5 seconds ``` No explicit constellation classification is needed. The structure emerges from orbital mechanics. ### Custom statistics function The cost estimator can be registered as a custom statistics function for the SP-GiST operator class, or integrated into the `inner_consistent` function via `traversalValue`. The planner sees accurate per-subtree cost predictions without any special configuration: ```c void spgist_tle_cost_estimate(PlannerInfo *root, IndexPath *path, double loop_count, Cost *startup_cost, Cost *total_cost, Selectivity *selectivity, double *correlation, double *index_pages) { double surviving_pop = estimate_from_traversal(root, path); double time_steps = extract_query_window_steps(path); double sgp4_cost = 0.05; /* ms per propagation step, calibratable */ double raan_fraction = estimate_raan_survival(path); double propagation_candidates = surviving_pop * raan_fraction; *total_cost = (2 * PAGE_READ_COST) /* L0 + L1 traversal */ + (surviving_pop * RAAN_FILTER_COST) /* query-time RAAN */ + (propagation_candidates * time_steps * sgp4_cost) /* SGP4 */ + (propagation_candidates * HEAP_FETCH_COST); /* fetch full TLE */ *selectivity = propagation_candidates / total_catalog_size; } ``` --- ## 8. Operators ### Types ```sql -- Observation query parameters bundled as a composite type CREATE TYPE observer_window AS ( obs observer, -- existing pg_orrery observer type (lat, lon, alt_m) t_start timestamptz, -- query window start t_end timestamptz, -- query window end min_el float8 -- minimum elevation angle in degrees ); ``` ### Operator definitions Three operators, starting minimal. The flagship operator `&?` is the primary interface for pass prediction. ```sql -- Strategy 1: Altitude regime containment -- "Is this TLE's orbit within this altitude range?" CREATE OPERATOR @> ( LEFTARG = float8range, -- altitude range in km (e.g., '[200,600]') RIGHTARG = tle, PROCEDURE = tle_regime_contains, COMMUTATOR = <@ ); -- Strategy 2: Orbital envelope overlap (enhanced existing &&) -- "Do these two TLEs share overlapping altitude + inclination space?" CREATE OPERATOR && ( LEFTARG = tle, RIGHTARG = tle, PROCEDURE = tle_envelope_overlaps, COMMUTATOR = && ); -- Strategy 3: Visibility cone check (flagship operator) -- "Could this satellite be visible from this observer during this time window?" -- Combines SMA pruning (L0) + inclination pruning (L1) + RAAN query-time filter CREATE OPERATOR &? ( LEFTARG = observer_window, RIGHTARG = tle, PROCEDURE = tle_visibility_possible ); ``` ### Example queries ```sql -- Primary use case: pass prediction -- "Which satellites might be visible from Eagle, Idaho in the next 2 hours?" SELECT s.norad_id, s.name FROM satellites s WHERE ROW( observer('43.7N 116.4W 760m'), now(), now() + interval '2 hours', 10.0 )::observer_window &? s.tle; -- Altitude regime query -- "Which satellites orbit between 400 and 600 km?" SELECT s.norad_id, s.name FROM satellites s WHERE '[400,600]'::float8range @> s.tle; -- Combined: visible LEO passes with full SGP4 propagation SELECT s.norad_id, s.name, predict_passes(s.tle, observer('43.7N 116.4W 760m'), now(), now() + interval '2 hours', 10.0) FROM satellites s WHERE ROW( observer('43.7N 116.4W 760m'), now(), now() + interval '2 hours', 10.0 )::observer_window &? s.tle; ``` The `&?` operator returns `true` for satellites that **might** be visible (a superset of the actual answer). The `predict_passes()` function then runs SGP4 propagation on only those candidates. The index's job is to minimize the candidate set, not to produce the final answer. --- ## 9. Data Structures ### SP-GiST node structure SP-GiST manages its own page layout, inner tuples, and leaf tuples. We declare our types; the framework handles storage: - **Prefix type:** `float8range` — the SMA or inclination range covered by an inner node - **Label type:** `float8` — bin boundary values for child dispatch - **Leaf type:** `tle` — the existing pg_orrery TLE type (112 bytes, `STORAGE = plain`) No new page format. No custom WAL records. SP-GiST provides all of this. ### `observer_window` composite type ```c /* Not a new C struct — uses SQL composite type mechanics */ /* Fields: observer (24B) + t_start (8B) + t_end (8B) + min_el (8B) = 48 bytes */ ``` This is a standard SQL composite type, not a custom C type. PostgreSQL handles I/O, storage, and parameter passing. The operator functions extract fields via `GetAttributeByNum()`. ### WGS-72 constants All internal computations use the same WGS-72 constants already defined in pg_orrery's `types.h`: ```c #define WGS72_MU 398600.8 /* km³/s² */ #define WGS72_AE 6378.135 /* km */ #define WGS72_J2 0.001082616 ``` The constant chain of custody is maintained: WGS-72 for orbital mechanics, WGS-84 for observer coordinate output. No new constants introduced. --- ## 10. What Changed from the Original KTrie | Original KTrie | Evolved SP-GiST | Rationale | |----------------|-----------------|-----------| | Custom index AM (12+ functions) | SP-GiST (5 functions) | PostgreSQL handles WAL, VACUUM, parallel, recovery | | 5-level trie (a, i, Ω, e, ω) | 2-level trie (a, i) | Only SMA + inclination are temporally stable | | RAAN as L2 (4-8 static bins) | Query-time analytical filter | Time-dependent; adapts to window; no reindex | | Eccentricity as L3 | Query-time check in `leaf_consistent` | Near-zero in LEO; unreliable for fine pruning | | Arg. perigee as L4 | Dropped | Near-zero discrimination in LEO | | Patricia compression (custom) | SP-GiST prefix compression (built-in) | Same semantics, zero custom page management | | `uint16` population (max 65,535) | `int32` via `traversalValue` | No overflow risk; Starlink alone targets 12,000+ | | 6 operators | 3 operators | Start minimal; `&?` is the flagship | | 10+ C source files | ~3 C source files | SP-GiST handles infrastructure | | Custom page layout (72B header) | SP-GiST page layout | No custom page management | | Custom split/merge logic | SP-GiST `picksplit` | Framework handles page operations | | Manual compression triggers | SP-GiST automatic prefix handling | Framework compresses single-child paths | | `reindex_raan_drift` parameter | Eliminated | RAAN not in the trie; no reindex needed | ### What survived intact Three ideas from the original spec are preserved without modification: 1. **Population-aware cost estimation** — the core innovation. Now delivered via `traversalValue` instead of `uint16` per-entry fields. 2. **Equal-population splits** — density-balanced tree, not geometry-balanced. Now in `picksplit` instead of custom split logic. 3. **WGS-72 constant chain of custody** — unchanged, inherited from pg_orrery. --- ## 11. Implementation Roadmap ### Phase 1: SP-GiST prototype Create the core SP-GiST index with the 5 support functions and 3 operators. **Files:** ``` src/spgist_tle.c — config, choose, picksplit, inner_consistent, leaf_consistent src/visibility_ops.c — &? operator, RAAN filter, observer_window type support ``` **SQL:** ``` sql/pg_orrery--0.6.0--0.7.0.sql — types, operators, SP-GiST operator class ``` **Tests:** ``` test/sql/spgist_tle.sql — index creation, operator tests, pruning validation test/expected/spgist_tle.out ``` **Goal:** Working index that correctly prunes by SMA + inclination + RAAN. Correctness over performance. ### Phase 2: Cost estimator Add the propagation-aware cost estimation function. **Files:** ``` src/spgist_cost.c — custom cost estimator with population tracking ``` **Goal:** Query planner correctly chooses index scan vs sequential scan based on predicted SGP4 cost. ### Phase 3: Benchmark vs current GiST Load 30,000 TLEs from Space-Track. Compare: | Metric | Current GiST | SP-GiST orbital trie | |--------|-------------|---------------------| | Pruning rate (2h window, 43.7° lat) | Measure | Measure | | Pruning rate (24h window, equator) | Measure | Measure | | Query time (pass prediction) | Measure | Measure | | False positive rate | Measure | Measure | | Index size | Measure | Measure | | Build time | Measure | Measure | ### Phase 4: Evaluate If the SP-GiST trie achieves >85% pruning on the benchmark suite, ship as pg_orrery v0.7.0. If not, analyze where candidates survive and determine whether deeper trie levels or alternative strategies are justified. **Decision gate:** The GiST enhancement (adding eccentricity to the existing key) provides the baseline. The SP-GiST trie must demonstrably exceed it to justify the additional code. --- ## 12. Open Questions 1. **Should `observer_window` be a new custom type or a SQL composite?** A custom C type (like `observer`) gives tighter control and avoids composite type overhead. A SQL composite is simpler to implement and extend. The composite is sufficient for the prototype; migrate to custom C type if profiling shows overhead. 2. **Can we store population metadata in SP-GiST's prefix datum?** The prefix is `float8range` (16 bytes). Population could be packed into a custom prefix struct instead, but this couples the prefix to the cost estimator. Using `traversalValue` keeps them separate. Needs testing to determine if `traversalValue` introduces measurable overhead. 3. **What's the right default picksplit strategy?** Options: median-of-element-value (equal-range) vs median-of-population (equal-count). Equal-count is theoretically better for balanced query cost, but equal-range is simpler and may perform similarly when the catalog distribution is stable. Implement equal-count, benchmark against equal-range. 4. **Should the cost estimator calibrate `sgp4_cost_per_step` at index creation time?** Running a small SGP4 benchmark during `CREATE INDEX` would give an accurate per-machine calibration. But it couples index creation to runtime performance, and the cost varies with TLE characteristics (near-earth vs deep-space). A configurable GUC (`pg_orrery.sgp4_cost_us`, default 50) may be more practical. 5. **What about conjunction screening (N×N)?** The original spec included `&=` (conjunction possible). This is a different query pattern: orbit-to-orbit, not observer-to-orbit. The SP-GiST trie can support it (L0+L1 pruning applies), but the RAAN filter doesn't (no observer). Defer to a future phase. 6. **Index-only scans: what data is needed?** `canReturnData = true` enables index-only scans, avoiding heap fetches. The leaf stores the full `tle` type. If the query only needs NORAD ID and basic orbital parameters (common for pass prediction pre-screening), the index can serve the query without touching the heap. Verify this works correctly with the `STORAGE = plain` TLE type.