Evolved from the original KTrie custom AM proposal (preserved as KTRIE-SPEC-ORIGINAL.md). Key design decisions: 2-level trie (SMA + inclination) instead of 5, SP-GiST framework instead of custom AM, query-time RAAN filter instead of trie level, propagation-aware cost estimation via traversalValue.
800 lines
34 KiB
Markdown
800 lines
34 KiB
Markdown
# SP-GiST Orbital Trie: Domain-Specific Index for Satellite Pass Prediction
|
||
|
||
## 1. Purpose & Lineage
|
||
|
||
This document specifies a domain-specific SP-GiST index for accelerating satellite pass prediction queries in PostgreSQL. The index decomposes TLE orbital element space into a 2-level hierarchical trie — semi-major axis and inclination — with a query-time RAAN filter and propagation-aware cost estimation.
|
||
|
||
**Primary use case:** 1-to-N pass prediction. "Which of 30,000 satellites are visible from this observer in the next 2 hours?" The index prunes the catalog analytically before any SGP4/SDP4 propagation, which is the dominant query cost.
|
||
|
||
**Lineage:** This design evolved from the original KTrie spec ([KTRIE-SPEC-ORIGINAL.md](KTRIE-SPEC-ORIGINAL.md)) through design review. The original proposed a fully custom PostgreSQL index access method with 5 Keplerian element levels. Analysis showed:
|
||
|
||
1. Only 2 of 5 levels (SMA, inclination) are temporally stable enough to index
|
||
2. PostgreSQL's SP-GiST framework provides all necessary infrastructure (WAL, VACUUM, parallel scan, prefix compression)
|
||
3. RAAN filtering is better as a query-time analytical filter that adapts to the query window
|
||
4. Constellation-level pruning emerges naturally from the trie structure without explicit classification
|
||
5. The real innovation — propagation-aware cost estimation — works within SP-GiST via `traversalValue`
|
||
|
||
The React visualization ([ktrie-layout.jsx](ktrie-layout.jsx)) reflects the original 5-level design and will be updated separately.
|
||
|
||
---
|
||
|
||
## 2. The Visibility Decision Tree
|
||
|
||
Pass prediction asks: "Can satellite S be seen from observer O during time window [t₁, t₂]?" There are exactly four questions, ordered by decreasing temporal stability:
|
||
|
||
### Q1: Can the orbit reach observable altitude?
|
||
|
||
Determined by perigee and apogee altitude, derived from SMA and eccentricity via Kepler's third law.
|
||
|
||
- **Stability:** Very stable. Changes only with atmospheric drag (slow decay) or powered maneuvers (rare, discrete events).
|
||
- **Discrimination:** Eliminates entire orbital regimes. An observer with a 10° minimum elevation can see satellites up to ~2,500 km altitude. This removes all MEO, GEO, and beyond — roughly 25% of the catalog.
|
||
- **Verdict: INDEXABLE.**
|
||
|
||
### Q2: Can the ground track reach my latitude?
|
||
|
||
A satellite with inclination `i` has a ground track bounded by latitudes `[-i, +i]`. An observer at latitude `φ` can only see satellites where `i ≥ |φ|` (simplified; the actual constraint includes off-track visibility angle, which depends on altitude).
|
||
|
||
- **Stability:** Very stable. Inclination barely changes without thrust.
|
||
- **Discrimination:** An observer at 43.7° (Eagle, Idaho) eliminates all equatorial and low-inclination objects — roughly 60% of LEO.
|
||
- **Verdict: INDEXABLE.**
|
||
|
||
### Q3: Is the orbital plane aligned with my location?
|
||
|
||
The Right Ascension of the Ascending Node (RAAN, Ω) determines where the orbit crosses the equator in inertial space. A satellite is potentially overhead when its RAAN is roughly aligned with the observer's current sidereal position.
|
||
|
||
- **Stability: UNSTABLE.** RAAN precesses at 0.5–7°/day due to J2 oblateness perturbation. The precession rate is:
|
||
|
||
```
|
||
Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i)
|
||
```
|
||
|
||
where `n` is mean motion, `J₂ = 0.001082616`, `Rₑ = 6378.135 km` (WGS-72), and `a` is semi-major axis.
|
||
|
||
- **Key insight:** Ω̇ is fully determined by SMA and inclination — the two elements already indexed at L0 and L1. The RAAN check requires only the TLE's stored RAAN₀ and epoch (available in the leaf) plus the elements already traversed. No additional index level is needed.
|
||
- **Verdict: QUERY-TIME FILTER.** Computed per surviving candidate after L0+L1 pruning.
|
||
|
||
### Q4: Is the satellite at the right orbital phase?
|
||
|
||
Mean anomaly changes at ~4°/second for LEO. The satellite's actual position along its orbit at any given moment is the irreducible question that requires SGP4/SDP4 numerical propagation.
|
||
|
||
- **Stability: COMPLETELY UNSTABLE.** Changes continuously.
|
||
- **Verdict: IRREDUCIBLE. Requires SGP4 propagation.**
|
||
|
||
### Conclusion
|
||
|
||
Only two dimensions are worth indexing. Everything else is either computable from those two dimensions (RAAN) or requires full propagation (mean anomaly). This drives the 2-level trie design.
|
||
|
||
---
|
||
|
||
## 3. Why SP-GiST
|
||
|
||
### SP-GiST vs custom AM vs enhanced GiST
|
||
|
||
| Requirement | SP-GiST | Custom AM | Enhanced GiST |
|
||
|-------------|---------|-----------|----------------|
|
||
| Page management + WAL | Free | Must implement | Free |
|
||
| VACUUM + dead tuple cleanup | Free | Must implement | Free |
|
||
| Crash recovery | Free | Must implement | Free |
|
||
| Parallel scan | Free | Must implement | Free |
|
||
| Prefix compression | Built-in (prefix/suffix) | Must implement | Not available |
|
||
| Non-balanced tree | Native | Must implement | Not natural |
|
||
| Fixed level decomposition | Via `level` counter | Must implement | Not available |
|
||
| Traversal state for cost est. | `traversalValue` | Must implement | Not available |
|
||
| Node labels for children | `nodeLabels` | Must implement | Not available |
|
||
| Support functions needed | 5 | 12+ | 7 (already exist) |
|
||
| Lines of domain-specific C | ~800 est. | ~3,000+ est. | ~200 (delta) |
|
||
|
||
**SP-GiST was literally designed for this kind of data structure.** Its space-partitioning model matches the KTrie concept directly: fixed decomposition rules, non-balanced trees, prefix compression. The `traversalValue` mechanism carries state down the tree during scans — exactly what population-aware cost estimation needs.
|
||
|
||
### The quad-tree precedent
|
||
|
||
PostgreSQL's built-in SP-GiST quad-tree (`spgquadtreeproc.c`) demonstrates a key pattern: the `restDatum` (remaining datum after prefix extraction) can be the **same type** at every level. The quad-tree passes the full point unchanged:
|
||
|
||
```c
|
||
out->result.matchNode.restDatum = in->datum;
|
||
```
|
||
|
||
The tree terminates by depth, not by value exhaustion. Our design does the same — the leaf stores the full `tle` type. No intermediate compressed struct needed. The leaf has everything for the RAAN query-time filter (RAAN₀, epoch, mean_motion, inclination, eccentricity).
|
||
|
||
### What SP-GiST's prefix compression gives us
|
||
|
||
The original KTrie spec implemented Patricia compression manually (compressed nodes, skip_bounds arrays, compression/decompression triggers). SP-GiST provides this natively: the `config` function declares a prefix type, and the `choose` function returns prefix/suffix decompositions. Single-child chains are compressed automatically by the framework. Same semantics, zero custom page management.
|
||
|
||
---
|
||
|
||
## 4. Architecture
|
||
|
||
### 2-level trie with query-time filters
|
||
|
||
```
|
||
SP-GiST L0: Semi-Major Axis (altitude regime)
|
||
├── Equal-population splits, not equal-range
|
||
├── Density-balanced: LEO (75% of catalog) gets finer bins
|
||
└── Prunes by perigee/apogee altitude reachability
|
||
|
||
SP-GiST L1: Inclination
|
||
├── Equal-population splits
|
||
├── Natural clustering at launch-site latitudes (28.5°, 51.6°, 97°)
|
||
└── Prunes by ground-track latitude coverage
|
||
|
||
Query-time RAAN filter (in leaf_consistent):
|
||
├── Projects RAAN to query midpoint via J2 precession rate
|
||
├── Checks alignment with observer's sidereal position
|
||
├── Adapts automatically to any query window length
|
||
└── Cost: ~10ns per candidate, ~45μs for entire post-L0/L1 batch
|
||
|
||
SGP4/SDP4 propagation (irreducible):
|
||
└── Full numerical propagation for surviving candidates
|
||
```
|
||
|
||
### Equal-population splits
|
||
|
||
The current GiST uses equal-range splits (median midpoint of the geometric spread). Equal-population splits keep the tree balanced in **object density**, not geometric space. The altitude band 300–600 km (where Starlink, ISS, and most LEO debris live) gets finer subdivision than 600–2,000 km. This matters because:
|
||
|
||
- Query cost is proportional to surviving population, not geometric volume
|
||
- Dense regions need finer discrimination; sparse regions don't benefit from it
|
||
- The cost estimator's population predictions are more accurate with balanced subtrees
|
||
|
||
### Constellation-aware pruning
|
||
|
||
Mega-constellations cluster tightly in L0+L1 space. Starlink shell 1: all ~2,000 satellites at 550 km / 53.0°. OneWeb: ~600 at 1,200 km / 87.9°. These clusters naturally fall into single L1 subtrees, giving the cost estimator a tight population count and a narrow J2 precession rate range. Constellation-level pruning emerges from the trie structure without explicit constellation classification.
|
||
|
||
---
|
||
|
||
## 5. SP-GiST Support Functions
|
||
|
||
Five functions implement the index. All use WGS-72 constants internally, consistent with pg_orrery's constant chain of custody.
|
||
|
||
### 5.1 `tle_trie_config()`
|
||
|
||
Declares the type system for the trie.
|
||
|
||
```c
|
||
Datum tle_trie_config(PG_FUNCTION_ARGS)
|
||
{
|
||
spgConfigOut *cfg = (spgConfigOut *) palloc0(sizeof(spgConfigOut));
|
||
|
||
cfg->prefixType = FLOAT8RANGEOID; /* orbital_bounds: SMA or inclination range */
|
||
cfg->labelType = FLOAT8OID; /* bin boundary value */
|
||
cfg->leafType = TLE_TYPE_OID; /* full TLE at leaves */
|
||
cfg->canReturnData = true; /* enable index-only scans */
|
||
cfg->longValuesOK = false; /* fixed-size types only */
|
||
|
||
PG_RETURN_VOID();
|
||
}
|
||
```
|
||
|
||
The prefix type stores the range covered by each inner node. The label type stores bin boundary values for child selection. The leaf type is the existing `tle` type — no new type needed at the leaf level.
|
||
|
||
### 5.2 `tle_trie_choose()`
|
||
|
||
Decides which child to descend into during insertion. Level-aware: L0 routes by SMA, L1 routes by inclination.
|
||
|
||
```c
|
||
Datum tle_trie_choose(PG_FUNCTION_ARGS)
|
||
{
|
||
spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
|
||
spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);
|
||
|
||
tle_type *tle = DatumGetTleP(in->leafDatum);
|
||
int level = in->level;
|
||
double key;
|
||
|
||
/* Extract the element for this level */
|
||
if (level == 0)
|
||
key = tle_sma_km(tle); /* semi-major axis in km */
|
||
else
|
||
key = tle_inclination_rad(tle); /* inclination in radians */
|
||
|
||
if (in->allTheSame)
|
||
{
|
||
/* All children equivalent — pick first */
|
||
out->resultType = spgMatchNode;
|
||
out->result.matchNode.nodeN = 0;
|
||
out->result.matchNode.levelAdd = 1;
|
||
out->result.matchNode.restDatum = in->leafDatum; /* pass full TLE unchanged */
|
||
PG_RETURN_VOID();
|
||
}
|
||
|
||
/* Find the child whose bin contains our key */
|
||
for (int i = 0; i < in->nNodes; i++)
|
||
{
|
||
double boundary = DatumGetFloat8(in->nodeLabels[i]);
|
||
if (i == in->nNodes - 1 || key < DatumGetFloat8(in->nodeLabels[i + 1]))
|
||
{
|
||
out->resultType = spgMatchNode;
|
||
out->result.matchNode.nodeN = i;
|
||
out->result.matchNode.levelAdd = 1;
|
||
out->result.matchNode.restDatum = in->leafDatum;
|
||
PG_RETURN_VOID();
|
||
}
|
||
}
|
||
|
||
/* Fallback: add new node (should not happen with well-chosen splits) */
|
||
out->resultType = spgAddNode;
|
||
out->result.addNode.nodeLabel = Float8GetDatum(key);
|
||
out->result.addNode.nodeN = in->nNodes;
|
||
PG_RETURN_VOID();
|
||
}
|
||
```
|
||
|
||
**Key detail:** `restDatum = in->leafDatum` — the full TLE passes unchanged to every level, following the quad-tree precedent. The trie terminates at depth 2 (two levels), not by value exhaustion.
|
||
|
||
### 5.3 `tle_trie_picksplit()`
|
||
|
||
Splits a leaf page that has overflowed. Uses equal-population strategy: sort by the current level's element, split at the median entry (not the median value).
|
||
|
||
```c
|
||
Datum tle_trie_picksplit(PG_FUNCTION_ARGS)
|
||
{
|
||
spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0);
|
||
spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1);
|
||
|
||
int level = in->level;
|
||
int nTuples = in->nTuples;
|
||
|
||
/* Extract sort keys for this level */
|
||
double *keys = palloc(nTuples * sizeof(double));
|
||
int *order = palloc(nTuples * sizeof(int));
|
||
|
||
for (int i = 0; i < nTuples; i++)
|
||
{
|
||
tle_type *tle = DatumGetTleP(in->datums[i]);
|
||
keys[i] = (level == 0) ? tle_sma_km(tle) : tle_inclination_rad(tle);
|
||
order[i] = i;
|
||
}
|
||
|
||
/* Sort by key value */
|
||
qsort_with_keys(order, nTuples, keys); /* stable sort by keys[order[i]] */
|
||
|
||
/* Equal-population split: divide into N bins of ~equal count */
|
||
int nBins = choose_bin_count(nTuples); /* heuristic: sqrt(n), clamped [2, 32] */
|
||
|
||
out->nNodes = nBins;
|
||
out->nodeLabels = palloc(nBins * sizeof(Datum));
|
||
out->mapTuplesToNodes = palloc(nTuples * sizeof(int));
|
||
out->leafTupleDatums = palloc(nTuples * sizeof(Datum));
|
||
|
||
int per_bin = nTuples / nBins;
|
||
for (int bin = 0; bin < nBins; bin++)
|
||
{
|
||
int start = bin * per_bin;
|
||
out->nodeLabels[bin] = Float8GetDatum(keys[order[start]]);
|
||
|
||
int end = (bin == nBins - 1) ? nTuples : (bin + 1) * per_bin;
|
||
for (int j = start; j < end; j++)
|
||
{
|
||
int orig = order[j];
|
||
out->mapTuplesToNodes[orig] = bin;
|
||
out->leafTupleDatums[orig] = in->datums[orig]; /* TLE unchanged */
|
||
}
|
||
}
|
||
|
||
pfree(keys);
|
||
pfree(order);
|
||
PG_RETURN_VOID();
|
||
}
|
||
```
|
||
|
||
### 5.4 `tle_trie_inner_consistent()`
|
||
|
||
The pruning engine. At each inner node, determines which children could contain matching satellites. Carries accumulated bounds in `traversalValue` for cost estimation.
|
||
|
||
```c
|
||
/*
|
||
* Traversal state carried down the tree during index scans.
|
||
* Accumulates bounds for cost estimation and RAAN filter setup.
|
||
*/
|
||
typedef struct OrbitalTraversal {
|
||
double sma_low, sma_high; /* accumulated SMA range from L0 */
|
||
double inc_low, inc_high; /* accumulated inclination range from L1 */
|
||
int32 population; /* estimated objects in this subtree */
|
||
double j2_rate; /* J2 precession rate (computed after L1) */
|
||
} OrbitalTraversal;
|
||
|
||
Datum tle_trie_inner_consistent(PG_FUNCTION_ARGS)
|
||
{
|
||
spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
|
||
spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1);
|
||
|
||
int level = in->level;
|
||
|
||
/* Reconstruct or initialize traversal state */
|
||
OrbitalTraversal *trav;
|
||
if (in->traversalValue)
|
||
trav = (OrbitalTraversal *) in->traversalValue;
|
||
else
|
||
{
|
||
trav = palloc0(sizeof(OrbitalTraversal));
|
||
trav->sma_low = 0;
|
||
trav->sma_high = INFINITY;
|
||
trav->inc_low = 0;
|
||
trav->inc_high = M_PI;
|
||
trav->population = -1; /* unknown until leaf scan */
|
||
}
|
||
|
||
/* Extract query parameters from scankey */
|
||
observer_window *qwin = extract_observer_window(in);
|
||
|
||
out->nodeNumbers = palloc(in->nNodes * sizeof(int));
|
||
out->traversalValues = palloc(in->nNodes * sizeof(void *));
|
||
out->nNodes = 0;
|
||
|
||
for (int i = 0; i < in->nNodes; i++)
|
||
{
|
||
double bin_low = DatumGetFloat8(in->nodeLabels[i]);
|
||
double bin_high = (i < in->nNodes - 1)
|
||
? DatumGetFloat8(in->nodeLabels[i + 1])
|
||
: INFINITY;
|
||
bool dominated = false;
|
||
|
||
if (level == 0)
|
||
{
|
||
/* L0: SMA pruning — can a satellite at this altitude be visible? */
|
||
double perigee_alt_km = bin_low - WGS72_AE;
|
||
if (perigee_alt_km > max_visible_altitude(qwin))
|
||
dominated = true; /* too high to be visible */
|
||
}
|
||
else if (level == 1)
|
||
{
|
||
/* L1: Inclination pruning — can this inclination reach observer latitude? */
|
||
double observer_lat = fabs(qwin->observer.lat_rad);
|
||
if (bin_high < observer_lat)
|
||
dominated = true; /* ground track never reaches observer */
|
||
}
|
||
|
||
if (!dominated)
|
||
{
|
||
/* Propagate traversal state to child */
|
||
OrbitalTraversal *child_trav = palloc(sizeof(OrbitalTraversal));
|
||
memcpy(child_trav, trav, sizeof(OrbitalTraversal));
|
||
|
||
if (level == 0) {
|
||
child_trav->sma_low = bin_low;
|
||
child_trav->sma_high = bin_high;
|
||
} else if (level == 1) {
|
||
child_trav->inc_low = bin_low;
|
||
child_trav->inc_high = bin_high;
|
||
/* After L1, we can compute J2 precession rate */
|
||
double a_mid = (child_trav->sma_low + child_trav->sma_high) / 2.0;
|
||
double i_mid = (bin_low + bin_high) / 2.0;
|
||
double n = sqrt(WGS72_MU / (a_mid * a_mid * a_mid));
|
||
child_trav->j2_rate = -1.5 * n * WGS72_J2
|
||
* (WGS72_AE / a_mid) * (WGS72_AE / a_mid)
|
||
* cos(i_mid);
|
||
}
|
||
|
||
int idx = out->nNodes++;
|
||
out->nodeNumbers[idx] = i;
|
||
out->traversalValues[idx] = child_trav;
|
||
}
|
||
}
|
||
|
||
PG_RETURN_VOID();
|
||
}
|
||
```
|
||
|
||
### 5.5 `tle_trie_leaf_consistent()`
|
||
|
||
Final check at the leaf level. Applies the RAAN query-time filter and eccentricity check. Always sets `recheck = true` because SGP4 propagation is still needed for the definitive answer.
|
||
|
||
```c
|
||
Datum tle_trie_leaf_consistent(PG_FUNCTION_ARGS)
|
||
{
|
||
spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
|
||
spgLeafConsistentOut *out = (spgLeafConsistentOut *) PG_GETARG_POINTER(1);
|
||
|
||
tle_type *tle = DatumGetTleP(in->leafDatum);
|
||
observer_window *qwin = extract_observer_window_from_scankey(in);
|
||
|
||
/* RAAN query-time filter */
|
||
double dt_days = (qwin->t_mid_jd - tle_epoch_jd(tle));
|
||
double raan_now = tle_raan_rad(tle) + tle_j2_raan_rate(tle) * dt_days;
|
||
raan_now = fmod(raan_now, 2.0 * M_PI);
|
||
if (raan_now < 0) raan_now += 2.0 * M_PI;
|
||
|
||
/* Observer sidereal position at query midpoint */
|
||
double lst = local_sidereal_time(qwin->observer.lon_rad, qwin->t_mid_jd);
|
||
|
||
/* RAAN visibility window: Earth rotation during query + ground footprint */
|
||
double earth_rotation = (qwin->t_end_jd - qwin->t_start_jd) * 360.0; /* degrees */
|
||
double footprint = ground_footprint_deg(tle_sma_km(tle), qwin->min_elevation);
|
||
double raan_window_half = (earth_rotation / 2.0 + footprint) * (M_PI / 180.0);
|
||
|
||
double raan_diff = fabs(raan_now - lst);
|
||
if (raan_diff > M_PI) raan_diff = 2.0 * M_PI - raan_diff;
|
||
|
||
if (raan_diff > raan_window_half)
|
||
{
|
||
out->leafValue = in->leafDatum;
|
||
out->recheck = true;
|
||
PG_RETURN_BOOL(false); /* RAAN not aligned — skip this candidate */
|
||
}
|
||
|
||
/* Eccentricity sanity check: highly eccentric orbits need wider altitude band */
|
||
double ecc = tle_eccentricity(tle);
|
||
if (ecc > 0.1)
|
||
{
|
||
double perigee = tle_sma_km(tle) * (1.0 - ecc) - WGS72_AE;
|
||
double apogee = tle_sma_km(tle) * (1.0 + ecc) - WGS72_AE;
|
||
if (perigee > max_visible_altitude(qwin))
|
||
{
|
||
out->leafValue = in->leafDatum;
|
||
out->recheck = true;
|
||
PG_RETURN_BOOL(false);
|
||
}
|
||
}
|
||
|
||
out->leafValue = in->leafDatum;
|
||
out->recheck = true; /* always recheck — SGP4 propagation is the ground truth */
|
||
PG_RETURN_BOOL(true);
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 6. The RAAN Query-Time Filter
|
||
|
||
### Why not a trie level
|
||
|
||
The effective RAAN visibility window depends entirely on the query time window. This table shows why static trie bins can't capture the physics:
|
||
|
||
| Query window | Earth rotation | Effective RAAN window | Candidates eliminated |
|
||
|-------------|---------------|----------------------|----------------------|
|
||
| 30 min | 7.5° | ~52° (14% of sky) | ~85% |
|
||
| 2 hours | 30° | ~74° (21%) | ~79% |
|
||
| 6 hours | 90° | ~134° (37%) | ~63% |
|
||
| 12 hours | 180° | ~224° (62%) | ~38% |
|
||
| 24 hours | 360° | 360° (100%) | 0% |
|
||
|
||
A static bin structure (4 bins of 90° each) could eliminate at most 75% (3 of 4 bins) for a 30-minute query, but eliminates nothing for queries longer than ~6 hours. The query-time filter automatically adapts to the actual window.
|
||
|
||
### The J2 precession rate
|
||
|
||
RAAN precession is the dominant secular perturbation for LEO orbits. The rate is:
|
||
|
||
```
|
||
Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i)
|
||
```
|
||
|
||
where:
|
||
- `n = √(μ/a³)` — mean motion (rad/s), with `μ = 398600.8 km³/s²` (WGS-72)
|
||
- `J₂ = 0.001082616` (WGS-72)
|
||
- `Rₑ = 6378.135 km` (WGS-72)
|
||
- `a` — semi-major axis (km)
|
||
- `i` — inclination (radians)
|
||
|
||
For a 400 km LEO satellite at 51.6° inclination (ISS-like):
|
||
```
|
||
a = 6778.135 km
|
||
n = 0.00114 rad/s
|
||
Ω̇ = -1.5 × 0.00114 × 0.001082616 × (6378.135/6778.135)² × cos(51.6°)
|
||
≈ -1.07 × 10⁻⁶ rad/s ≈ -5.3°/day
|
||
```
|
||
|
||
This rate is fully determined by SMA and inclination — both already indexed at L0 and L1.
|
||
|
||
### Per-candidate cost
|
||
|
||
Projecting RAAN to query time requires:
|
||
1. One subtraction (epoch difference)
|
||
2. One multiply (rate × time)
|
||
3. One addition (RAAN₀ + drift)
|
||
4. One modulo (wrap to [0, 2π))
|
||
5. One range check (within visibility window?)
|
||
|
||
**Cost: ~10 nanoseconds per candidate.**
|
||
|
||
After L0 and L1 pruning, a typical pass prediction query over 30,000 TLEs leaves ~4,500 candidates. The total RAAN filter cost:
|
||
|
||
```
|
||
4,500 × 10 ns = 45 μs
|
||
```
|
||
|
||
This is negligible compared to SGP4 propagation cost for the survivors (typically hundreds of milliseconds to seconds). The filter eliminates ~79% of those 4,500 candidates (for a 2-hour window), leaving ~945 for SGP4 propagation instead of 4,500.
|
||
|
||
---
|
||
|
||
## 7. Propagation-Aware Cost Estimation
|
||
|
||
### The core innovation
|
||
|
||
Standard PostgreSQL index cost estimators model I/O cost: pages read, tuples fetched. KTrie's original insight — preserved in this SP-GiST design — is to model **downstream computation cost**. The expensive operation in satellite queries isn't reading data; it's SGP4/SDP4 propagation.
|
||
|
||
The cost estimator tells the query planner:
|
||
|
||
```
|
||
estimated_cost = surviving_population × time_steps × sgp4_cost_per_step
|
||
```
|
||
|
||
This lets the planner compare "index scan + 945 SGP4 evaluations" vs "sequential scan + 30,000 SGP4 evaluations" — a decision no other PostgreSQL index type can make.
|
||
|
||
### Population tracking via `traversalValue`
|
||
|
||
SP-GiST's `traversalValue` mechanism carries the `OrbitalTraversal` struct down the tree during scans. At each inner node, the cost estimator accumulates:
|
||
|
||
```c
|
||
typedef struct OrbitalTraversal {
|
||
double sma_low, sma_high; /* from L0 */
|
||
double inc_low, inc_high; /* from L1 */
|
||
int32 population; /* objects in this subtree */
|
||
double j2_rate; /* derived from L0 + L1 midpoints */
|
||
} OrbitalTraversal;
|
||
```
|
||
|
||
The `population` field counts objects in each subtree. After L1, `j2_rate` is computed from the accumulated SMA and inclination bounds. This enables the RAAN filter to predict how many candidates will survive:
|
||
|
||
```
|
||
expected_visible = population × (RAAN_window / 360°)
|
||
```
|
||
|
||
### Constellation detection as emergent property
|
||
|
||
Mega-constellations cluster tightly in SMA × inclination space. Starlink shell 1: ~2,000 satellites at 550 km / 53.0°. All members share nearly the same J2 precession rate, and their RAANs are evenly distributed by constellation design (phased orbital planes).
|
||
|
||
The trie naturally groups these into a single L1 subtree. The cost estimator sees a tight population with a uniform RAAN distribution and can predict:
|
||
|
||
```
|
||
Starlink shell 1 example (2-hour query from Eagle, Idaho):
|
||
population = 2,000
|
||
RAAN_window = 74° → fraction = 20.6%
|
||
RAAN_survivors = 2,000 × 0.206 = ~412
|
||
time_steps = 120 (2 hours at 60-second intervals)
|
||
sgp4_per_step = 0.05 ms
|
||
estimated_cost = 412 × 120 × 0.05 ms = 2.5 seconds
|
||
```
|
||
|
||
No explicit constellation classification is needed. The structure emerges from orbital mechanics.
|
||
|
||
### Custom statistics function
|
||
|
||
The cost estimator can be registered as a custom statistics function for the SP-GiST operator class, or integrated into the `inner_consistent` function via `traversalValue`. The planner sees accurate per-subtree cost predictions without any special configuration:
|
||
|
||
```c
|
||
void spgist_tle_cost_estimate(PlannerInfo *root, IndexPath *path,
|
||
double loop_count, Cost *startup_cost,
|
||
Cost *total_cost, Selectivity *selectivity,
|
||
double *correlation, double *index_pages)
|
||
{
|
||
double surviving_pop = estimate_from_traversal(root, path);
|
||
double time_steps = extract_query_window_steps(path);
|
||
double sgp4_cost = 0.05; /* ms per propagation step, calibratable */
|
||
|
||
double raan_fraction = estimate_raan_survival(path);
|
||
double propagation_candidates = surviving_pop * raan_fraction;
|
||
|
||
*total_cost = (2 * PAGE_READ_COST) /* L0 + L1 traversal */
|
||
+ (surviving_pop * RAAN_FILTER_COST) /* query-time RAAN */
|
||
+ (propagation_candidates * time_steps * sgp4_cost) /* SGP4 */
|
||
+ (propagation_candidates * HEAP_FETCH_COST); /* fetch full TLE */
|
||
|
||
*selectivity = propagation_candidates / total_catalog_size;
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 8. Operators
|
||
|
||
### Types
|
||
|
||
```sql
|
||
-- Observation query parameters bundled as a composite type
|
||
CREATE TYPE observer_window AS (
|
||
obs observer, -- existing pg_orrery observer type (lat, lon, alt_m)
|
||
t_start timestamptz, -- query window start
|
||
t_end timestamptz, -- query window end
|
||
min_el float8 -- minimum elevation angle in degrees
|
||
);
|
||
```
|
||
|
||
### Operator definitions
|
||
|
||
Three operators, starting minimal. The flagship operator `&?` is the primary interface for pass prediction.
|
||
|
||
```sql
|
||
-- Strategy 1: Altitude regime containment
|
||
-- "Is this TLE's orbit within this altitude range?"
|
||
CREATE OPERATOR @> (
|
||
LEFTARG = float8range, -- altitude range in km (e.g., '[200,600]')
|
||
RIGHTARG = tle,
|
||
PROCEDURE = tle_regime_contains,
|
||
COMMUTATOR = <@
|
||
);
|
||
|
||
-- Strategy 2: Orbital envelope overlap (enhanced existing &&)
|
||
-- "Do these two TLEs share overlapping altitude + inclination space?"
|
||
CREATE OPERATOR && (
|
||
LEFTARG = tle,
|
||
RIGHTARG = tle,
|
||
PROCEDURE = tle_envelope_overlaps,
|
||
COMMUTATOR = &&
|
||
);
|
||
|
||
-- Strategy 3: Visibility cone check (flagship operator)
|
||
-- "Could this satellite be visible from this observer during this time window?"
|
||
-- Combines SMA pruning (L0) + inclination pruning (L1) + RAAN query-time filter
|
||
CREATE OPERATOR &? (
|
||
LEFTARG = observer_window,
|
||
RIGHTARG = tle,
|
||
PROCEDURE = tle_visibility_possible
|
||
);
|
||
```
|
||
|
||
### Example queries
|
||
|
||
```sql
|
||
-- Primary use case: pass prediction
|
||
-- "Which satellites might be visible from Eagle, Idaho in the next 2 hours?"
|
||
SELECT s.norad_id, s.name
|
||
FROM satellites s
|
||
WHERE ROW(
|
||
observer('43.7N 116.4W 760m'),
|
||
now(),
|
||
now() + interval '2 hours',
|
||
10.0
|
||
)::observer_window &? s.tle;
|
||
|
||
-- Altitude regime query
|
||
-- "Which satellites orbit between 400 and 600 km?"
|
||
SELECT s.norad_id, s.name
|
||
FROM satellites s
|
||
WHERE '[400,600]'::float8range @> s.tle;
|
||
|
||
-- Combined: visible LEO passes with full SGP4 propagation
|
||
SELECT s.norad_id, s.name,
|
||
predict_passes(s.tle, observer('43.7N 116.4W 760m'),
|
||
now(), now() + interval '2 hours', 10.0)
|
||
FROM satellites s
|
||
WHERE ROW(
|
||
observer('43.7N 116.4W 760m'),
|
||
now(),
|
||
now() + interval '2 hours',
|
||
10.0
|
||
)::observer_window &? s.tle;
|
||
```
|
||
|
||
The `&?` operator returns `true` for satellites that **might** be visible (a superset of the actual answer). The `predict_passes()` function then runs SGP4 propagation on only those candidates. The index's job is to minimize the candidate set, not to produce the final answer.
|
||
|
||
---
|
||
|
||
## 9. Data Structures
|
||
|
||
### SP-GiST node structure
|
||
|
||
SP-GiST manages its own page layout, inner tuples, and leaf tuples. We declare our types; the framework handles storage:
|
||
|
||
- **Prefix type:** `float8range` — the SMA or inclination range covered by an inner node
|
||
- **Label type:** `float8` — bin boundary values for child dispatch
|
||
- **Leaf type:** `tle` — the existing pg_orrery TLE type (112 bytes, `STORAGE = plain`)
|
||
|
||
No new page format. No custom WAL records. SP-GiST provides all of this.
|
||
|
||
### `observer_window` composite type
|
||
|
||
```c
|
||
/* Not a new C struct — uses SQL composite type mechanics */
|
||
/* Fields: observer (24B) + t_start (8B) + t_end (8B) + min_el (8B) = 48 bytes */
|
||
```
|
||
|
||
This is a standard SQL composite type, not a custom C type. PostgreSQL handles I/O, storage, and parameter passing. The operator functions extract fields via `GetAttributeByNum()`.
|
||
|
||
### WGS-72 constants
|
||
|
||
All internal computations use the same WGS-72 constants already defined in pg_orrery's `types.h`:
|
||
|
||
```c
|
||
#define WGS72_MU 398600.8 /* km³/s² */
|
||
#define WGS72_AE 6378.135 /* km */
|
||
#define WGS72_J2 0.001082616
|
||
```
|
||
|
||
The constant chain of custody is maintained: WGS-72 for orbital mechanics, WGS-84 for observer coordinate output. No new constants introduced.
|
||
|
||
---
|
||
|
||
## 10. What Changed from the Original KTrie
|
||
|
||
| Original KTrie | Evolved SP-GiST | Rationale |
|
||
|----------------|-----------------|-----------|
|
||
| Custom index AM (12+ functions) | SP-GiST (5 functions) | PostgreSQL handles WAL, VACUUM, parallel, recovery |
|
||
| 5-level trie (a, i, Ω, e, ω) | 2-level trie (a, i) | Only SMA + inclination are temporally stable |
|
||
| RAAN as L2 (4-8 static bins) | Query-time analytical filter | Time-dependent; adapts to window; no reindex |
|
||
| Eccentricity as L3 | Query-time check in `leaf_consistent` | Near-zero in LEO; unreliable for fine pruning |
|
||
| Arg. perigee as L4 | Dropped | Near-zero discrimination in LEO |
|
||
| Patricia compression (custom) | SP-GiST prefix compression (built-in) | Same semantics, zero custom page management |
|
||
| `uint16` population (max 65,535) | `int32` via `traversalValue` | No overflow risk; Starlink alone targets 12,000+ |
|
||
| 6 operators | 3 operators | Start minimal; `&?` is the flagship |
|
||
| 10+ C source files | ~3 C source files | SP-GiST handles infrastructure |
|
||
| Custom page layout (72B header) | SP-GiST page layout | No custom page management |
|
||
| Custom split/merge logic | SP-GiST `picksplit` | Framework handles page operations |
|
||
| Manual compression triggers | SP-GiST automatic prefix handling | Framework compresses single-child paths |
|
||
| `reindex_raan_drift` parameter | Eliminated | RAAN not in the trie; no reindex needed |
|
||
|
||
### What survived intact
|
||
|
||
Three ideas from the original spec are preserved without modification:
|
||
|
||
1. **Population-aware cost estimation** — the core innovation. Now delivered via `traversalValue` instead of `uint16` per-entry fields.
|
||
2. **Equal-population splits** — density-balanced tree, not geometry-balanced. Now in `picksplit` instead of custom split logic.
|
||
3. **WGS-72 constant chain of custody** — unchanged, inherited from pg_orrery.
|
||
|
||
---
|
||
|
||
## 11. Implementation Roadmap
|
||
|
||
### Phase 1: SP-GiST prototype
|
||
|
||
Create the core SP-GiST index with the 5 support functions and 3 operators.
|
||
|
||
**Files:**
|
||
```
|
||
src/spgist_tle.c — config, choose, picksplit, inner_consistent, leaf_consistent
|
||
src/visibility_ops.c — &? operator, RAAN filter, observer_window type support
|
||
```
|
||
|
||
**SQL:**
|
||
```
|
||
sql/pg_orrery--0.6.0--0.7.0.sql — types, operators, SP-GiST operator class
|
||
```
|
||
|
||
**Tests:**
|
||
```
|
||
test/sql/spgist_tle.sql — index creation, operator tests, pruning validation
|
||
test/expected/spgist_tle.out
|
||
```
|
||
|
||
**Goal:** Working index that correctly prunes by SMA + inclination + RAAN. Correctness over performance.
|
||
|
||
### Phase 2: Cost estimator
|
||
|
||
Add the propagation-aware cost estimation function.
|
||
|
||
**Files:**
|
||
```
|
||
src/spgist_cost.c — custom cost estimator with population tracking
|
||
```
|
||
|
||
**Goal:** Query planner correctly chooses index scan vs sequential scan based on predicted SGP4 cost.
|
||
|
||
### Phase 3: Benchmark vs current GiST
|
||
|
||
Load 30,000 TLEs from Space-Track. Compare:
|
||
|
||
| Metric | Current GiST | SP-GiST orbital trie |
|
||
|--------|-------------|---------------------|
|
||
| Pruning rate (2h window, 43.7° lat) | Measure | Measure |
|
||
| Pruning rate (24h window, equator) | Measure | Measure |
|
||
| Query time (pass prediction) | Measure | Measure |
|
||
| False positive rate | Measure | Measure |
|
||
| Index size | Measure | Measure |
|
||
| Build time | Measure | Measure |
|
||
|
||
### Phase 4: Evaluate
|
||
|
||
If the SP-GiST trie achieves >85% pruning on the benchmark suite, ship as pg_orrery v0.7.0. If not, analyze where candidates survive and determine whether deeper trie levels or alternative strategies are justified.
|
||
|
||
**Decision gate:** The GiST enhancement (adding eccentricity to the existing key) provides the baseline. The SP-GiST trie must demonstrably exceed it to justify the additional code.
|
||
|
||
---
|
||
|
||
## 12. Open Questions
|
||
|
||
1. **Should `observer_window` be a new custom type or a SQL composite?**
|
||
A custom C type (like `observer`) gives tighter control and avoids composite type overhead. A SQL composite is simpler to implement and extend. The composite is sufficient for the prototype; migrate to custom C type if profiling shows overhead.
|
||
|
||
2. **Can we store population metadata in SP-GiST's prefix datum?**
|
||
The prefix is `float8range` (16 bytes). Population could be packed into a custom prefix struct instead, but this couples the prefix to the cost estimator. Using `traversalValue` keeps them separate. Needs testing to determine if `traversalValue` introduces measurable overhead.
|
||
|
||
3. **What's the right default picksplit strategy?**
|
||
Options: median-of-element-value (equal-range) vs median-of-population (equal-count). Equal-count is theoretically better for balanced query cost, but equal-range is simpler and may perform similarly when the catalog distribution is stable. Implement equal-count, benchmark against equal-range.
|
||
|
||
4. **Should the cost estimator calibrate `sgp4_cost_per_step` at index creation time?**
|
||
Running a small SGP4 benchmark during `CREATE INDEX` would give an accurate per-machine calibration. But it couples index creation to runtime performance, and the cost varies with TLE characteristics (near-earth vs deep-space). A configurable GUC (`pg_orrery.sgp4_cost_us`, default 50) may be more practical.
|
||
|
||
5. **What about conjunction screening (N×N)?**
|
||
The original spec included `&=` (conjunction possible). This is a different query pattern: orbit-to-orbit, not observer-to-orbit. The SP-GiST trie can support it (L0+L1 pruning applies), but the RAAN filter doesn't (no observer). Defer to a future phase.
|
||
|
||
6. **Index-only scans: what data is needed?**
|
||
`canReturnData = true` enables index-only scans, avoiding heap fetches. The leaf stores the full `tle` type. If the query only needs NORAD ID and basic orbital parameters (common for pass prediction pre-screening), the index can serve the query without touching the heap. Verify this works correctly with the `STORAGE = plain` TLE type.
|