pg_orrery/docs/ktrie/SP-GIST-ORBITAL-TRIE.md
Ryan Malloy bb235f51fa Add SP-GiST orbital trie design spec for satellite pass prediction index
Evolved from the original KTrie custom AM proposal (preserved as
KTRIE-SPEC-ORIGINAL.md). Key design decisions: 2-level trie (SMA +
inclination) instead of 5, SP-GiST framework instead of custom AM,
query-time RAAN filter instead of trie level, propagation-aware cost
estimation via traversalValue.
2026-02-17 19:53:42 -07:00

34 KiB
Raw Blame History

SP-GiST Orbital Trie: Domain-Specific Index for Satellite Pass Prediction

1. Purpose & Lineage

This document specifies a domain-specific SP-GiST index for accelerating satellite pass prediction queries in PostgreSQL. The index decomposes TLE orbital element space into a 2-level hierarchical trie — semi-major axis and inclination — with a query-time RAAN filter and propagation-aware cost estimation.

Primary use case: 1-to-N pass prediction. "Which of 30,000 satellites are visible from this observer in the next 2 hours?" The index prunes the catalog analytically before any SGP4/SDP4 propagation, which is the dominant query cost.

Lineage: This design evolved from the original KTrie spec (KTRIE-SPEC-ORIGINAL.md) through design review. The original proposed a fully custom PostgreSQL index access method with 5 Keplerian element levels. Analysis showed:

  1. Only 2 of 5 levels (SMA, inclination) are temporally stable enough to index
  2. PostgreSQL's SP-GiST framework provides all necessary infrastructure (WAL, VACUUM, parallel scan, prefix compression)
  3. RAAN filtering is better as a query-time analytical filter that adapts to the query window
  4. Constellation-level pruning emerges naturally from the trie structure without explicit classification
  5. The real innovation — propagation-aware cost estimation — works within SP-GiST via traversalValue

The React visualization (ktrie-layout.jsx) reflects the original 5-level design and will be updated separately.


2. The Visibility Decision Tree

Pass prediction asks: "Can satellite S be seen from observer O during time window [t₁, t₂]?" There are exactly four questions, ordered by decreasing temporal stability:

Q1: Can the orbit reach observable altitude?

Determined by perigee and apogee altitude, derived from SMA and eccentricity via Kepler's third law.

  • Stability: Very stable. Changes only with atmospheric drag (slow decay) or powered maneuvers (rare, discrete events).
  • Discrimination: Eliminates entire orbital regimes. An observer with a 10° minimum elevation can see satellites up to ~2,500 km altitude. This removes all MEO, GEO, and beyond — roughly 25% of the catalog.
  • Verdict: INDEXABLE.

Q2: Can the ground track reach my latitude?

A satellite with inclination i has a ground track bounded by latitudes [-i, +i]. An observer at latitude φ can only see satellites where i ≥ |φ| (simplified; the actual constraint includes off-track visibility angle, which depends on altitude).

  • Stability: Very stable. Inclination barely changes without thrust.
  • Discrimination: An observer at 43.7° (Eagle, Idaho) eliminates all equatorial and low-inclination objects — roughly 60% of LEO.
  • Verdict: INDEXABLE.

Q3: Is the orbital plane aligned with my location?

The Right Ascension of the Ascending Node (RAAN, Ω) determines where the orbit crosses the equator in inertial space. A satellite is potentially overhead when its RAAN is roughly aligned with the observer's current sidereal position.

  • Stability: UNSTABLE. RAAN precesses at 0.57°/day due to J2 oblateness perturbation. The precession rate is:

    Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i)
    

    where n is mean motion, J₂ = 0.001082616, Rₑ = 6378.135 km (WGS-72), and a is semi-major axis.

  • Key insight: Ω̇ is fully determined by SMA and inclination — the two elements already indexed at L0 and L1. The RAAN check requires only the TLE's stored RAAN₀ and epoch (available in the leaf) plus the elements already traversed. No additional index level is needed.

  • Verdict: QUERY-TIME FILTER. Computed per surviving candidate after L0+L1 pruning.

Q4: Is the satellite at the right orbital phase?

Mean anomaly changes at ~4°/second for LEO. The satellite's actual position along its orbit at any given moment is the irreducible question that requires SGP4/SDP4 numerical propagation.

  • Stability: COMPLETELY UNSTABLE. Changes continuously.
  • Verdict: IRREDUCIBLE. Requires SGP4 propagation.

Conclusion

Only two dimensions are worth indexing. Everything else is either computable from those two dimensions (RAAN) or requires full propagation (mean anomaly). This drives the 2-level trie design.


3. Why SP-GiST

SP-GiST vs custom AM vs enhanced GiST

Requirement SP-GiST Custom AM Enhanced GiST
Page management + WAL Free Must implement Free
VACUUM + dead tuple cleanup Free Must implement Free
Crash recovery Free Must implement Free
Parallel scan Free Must implement Free
Prefix compression Built-in (prefix/suffix) Must implement Not available
Non-balanced tree Native Must implement Not natural
Fixed level decomposition Via level counter Must implement Not available
Traversal state for cost est. traversalValue Must implement Not available
Node labels for children nodeLabels Must implement Not available
Support functions needed 5 12+ 7 (already exist)
Lines of domain-specific C ~800 est. ~3,000+ est. ~200 (delta)

SP-GiST was literally designed for this kind of data structure. Its space-partitioning model matches the KTrie concept directly: fixed decomposition rules, non-balanced trees, prefix compression. The traversalValue mechanism carries state down the tree during scans — exactly what population-aware cost estimation needs.

The quad-tree precedent

PostgreSQL's built-in SP-GiST quad-tree (spgquadtreeproc.c) demonstrates a key pattern: the restDatum (remaining datum after prefix extraction) can be the same type at every level. The quad-tree passes the full point unchanged:

out->result.matchNode.restDatum = in->datum;

The tree terminates by depth, not by value exhaustion. Our design does the same — the leaf stores the full tle type. No intermediate compressed struct needed. The leaf has everything for the RAAN query-time filter (RAAN₀, epoch, mean_motion, inclination, eccentricity).

What SP-GiST's prefix compression gives us

The original KTrie spec implemented Patricia compression manually (compressed nodes, skip_bounds arrays, compression/decompression triggers). SP-GiST provides this natively: the config function declares a prefix type, and the choose function returns prefix/suffix decompositions. Single-child chains are compressed automatically by the framework. Same semantics, zero custom page management.


4. Architecture

2-level trie with query-time filters

SP-GiST L0: Semi-Major Axis (altitude regime)
    ├── Equal-population splits, not equal-range
    ├── Density-balanced: LEO (75% of catalog) gets finer bins
    └── Prunes by perigee/apogee altitude reachability

SP-GiST L1: Inclination
    ├── Equal-population splits
    ├── Natural clustering at launch-site latitudes (28.5°, 51.6°, 97°)
    └── Prunes by ground-track latitude coverage

Query-time RAAN filter (in leaf_consistent):
    ├── Projects RAAN to query midpoint via J2 precession rate
    ├── Checks alignment with observer's sidereal position
    ├── Adapts automatically to any query window length
    └── Cost: ~10ns per candidate, ~45μs for entire post-L0/L1 batch

SGP4/SDP4 propagation (irreducible):
    └── Full numerical propagation for surviving candidates

Equal-population splits

The current GiST uses equal-range splits (median midpoint of the geometric spread). Equal-population splits keep the tree balanced in object density, not geometric space. The altitude band 300600 km (where Starlink, ISS, and most LEO debris live) gets finer subdivision than 6002,000 km. This matters because:

  • Query cost is proportional to surviving population, not geometric volume
  • Dense regions need finer discrimination; sparse regions don't benefit from it
  • The cost estimator's population predictions are more accurate with balanced subtrees

Constellation-aware pruning

Mega-constellations cluster tightly in L0+L1 space. Starlink shell 1: all ~2,000 satellites at 550 km / 53.0°. OneWeb: ~600 at 1,200 km / 87.9°. These clusters naturally fall into single L1 subtrees, giving the cost estimator a tight population count and a narrow J2 precession rate range. Constellation-level pruning emerges from the trie structure without explicit constellation classification.


5. SP-GiST Support Functions

Five functions implement the index. All use WGS-72 constants internally, consistent with pg_orrery's constant chain of custody.

5.1 tle_trie_config()

Declares the type system for the trie.

Datum tle_trie_config(PG_FUNCTION_ARGS)
{
    spgConfigOut *cfg = (spgConfigOut *) palloc0(sizeof(spgConfigOut));

    cfg->prefixType   = FLOAT8RANGEOID;   /* orbital_bounds: SMA or inclination range */
    cfg->labelType    = FLOAT8OID;        /* bin boundary value */
    cfg->leafType     = TLE_TYPE_OID;     /* full TLE at leaves */
    cfg->canReturnData = true;            /* enable index-only scans */
    cfg->longValuesOK  = false;           /* fixed-size types only */

    PG_RETURN_VOID();
}

The prefix type stores the range covered by each inner node. The label type stores bin boundary values for child selection. The leaf type is the existing tle type — no new type needed at the leaf level.

5.2 tle_trie_choose()

Decides which child to descend into during insertion. Level-aware: L0 routes by SMA, L1 routes by inclination.

Datum tle_trie_choose(PG_FUNCTION_ARGS)
{
    spgChooseIn  *in  = (spgChooseIn *)  PG_GETARG_POINTER(0);
    spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);

    tle_type *tle = DatumGetTleP(in->leafDatum);
    int level = in->level;
    double key;

    /* Extract the element for this level */
    if (level == 0)
        key = tle_sma_km(tle);          /* semi-major axis in km */
    else
        key = tle_inclination_rad(tle); /* inclination in radians */

    if (in->allTheSame)
    {
        /* All children equivalent — pick first */
        out->resultType = spgMatchNode;
        out->result.matchNode.nodeN = 0;
        out->result.matchNode.levelAdd = 1;
        out->result.matchNode.restDatum = in->leafDatum;  /* pass full TLE unchanged */
        PG_RETURN_VOID();
    }

    /* Find the child whose bin contains our key */
    for (int i = 0; i < in->nNodes; i++)
    {
        double boundary = DatumGetFloat8(in->nodeLabels[i]);
        if (i == in->nNodes - 1 || key < DatumGetFloat8(in->nodeLabels[i + 1]))
        {
            out->resultType = spgMatchNode;
            out->result.matchNode.nodeN = i;
            out->result.matchNode.levelAdd = 1;
            out->result.matchNode.restDatum = in->leafDatum;
            PG_RETURN_VOID();
        }
    }

    /* Fallback: add new node (should not happen with well-chosen splits) */
    out->resultType = spgAddNode;
    out->result.addNode.nodeLabel = Float8GetDatum(key);
    out->result.addNode.nodeN = in->nNodes;
    PG_RETURN_VOID();
}

Key detail: restDatum = in->leafDatum — the full TLE passes unchanged to every level, following the quad-tree precedent. The trie terminates at depth 2 (two levels), not by value exhaustion.

5.3 tle_trie_picksplit()

Splits a leaf page that has overflowed. Uses equal-population strategy: sort by the current level's element, split at the median entry (not the median value).

Datum tle_trie_picksplit(PG_FUNCTION_ARGS)
{
    spgPickSplitIn  *in  = (spgPickSplitIn *)  PG_GETARG_POINTER(0);
    spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1);

    int level = in->level;
    int nTuples = in->nTuples;

    /* Extract sort keys for this level */
    double *keys = palloc(nTuples * sizeof(double));
    int    *order = palloc(nTuples * sizeof(int));

    for (int i = 0; i < nTuples; i++)
    {
        tle_type *tle = DatumGetTleP(in->datums[i]);
        keys[i] = (level == 0) ? tle_sma_km(tle) : tle_inclination_rad(tle);
        order[i] = i;
    }

    /* Sort by key value */
    qsort_with_keys(order, nTuples, keys);  /* stable sort by keys[order[i]] */

    /* Equal-population split: divide into N bins of ~equal count */
    int nBins = choose_bin_count(nTuples);   /* heuristic: sqrt(n), clamped [2, 32] */

    out->nNodes = nBins;
    out->nodeLabels = palloc(nBins * sizeof(Datum));
    out->mapTuplesToNodes = palloc(nTuples * sizeof(int));
    out->leafTupleDatums = palloc(nTuples * sizeof(Datum));

    int per_bin = nTuples / nBins;
    for (int bin = 0; bin < nBins; bin++)
    {
        int start = bin * per_bin;
        out->nodeLabels[bin] = Float8GetDatum(keys[order[start]]);

        int end = (bin == nBins - 1) ? nTuples : (bin + 1) * per_bin;
        for (int j = start; j < end; j++)
        {
            int orig = order[j];
            out->mapTuplesToNodes[orig] = bin;
            out->leafTupleDatums[orig] = in->datums[orig];  /* TLE unchanged */
        }
    }

    pfree(keys);
    pfree(order);
    PG_RETURN_VOID();
}

5.4 tle_trie_inner_consistent()

The pruning engine. At each inner node, determines which children could contain matching satellites. Carries accumulated bounds in traversalValue for cost estimation.

/*
 * Traversal state carried down the tree during index scans.
 * Accumulates bounds for cost estimation and RAAN filter setup.
 */
typedef struct OrbitalTraversal {
    double  sma_low, sma_high;    /* accumulated SMA range from L0 */
    double  inc_low, inc_high;    /* accumulated inclination range from L1 */
    int32   population;           /* estimated objects in this subtree */
    double  j2_rate;              /* J2 precession rate (computed after L1) */
} OrbitalTraversal;

Datum tle_trie_inner_consistent(PG_FUNCTION_ARGS)
{
    spgInnerConsistentIn  *in  = (spgInnerConsistentIn *)  PG_GETARG_POINTER(0);
    spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1);

    int level = in->level;

    /* Reconstruct or initialize traversal state */
    OrbitalTraversal *trav;
    if (in->traversalValue)
        trav = (OrbitalTraversal *) in->traversalValue;
    else
    {
        trav = palloc0(sizeof(OrbitalTraversal));
        trav->sma_low = 0;
        trav->sma_high = INFINITY;
        trav->inc_low = 0;
        trav->inc_high = M_PI;
        trav->population = -1;  /* unknown until leaf scan */
    }

    /* Extract query parameters from scankey */
    observer_window *qwin = extract_observer_window(in);

    out->nodeNumbers = palloc(in->nNodes * sizeof(int));
    out->traversalValues = palloc(in->nNodes * sizeof(void *));
    out->nNodes = 0;

    for (int i = 0; i < in->nNodes; i++)
    {
        double bin_low = DatumGetFloat8(in->nodeLabels[i]);
        double bin_high = (i < in->nNodes - 1)
                        ? DatumGetFloat8(in->nodeLabels[i + 1])
                        : INFINITY;
        bool dominated = false;

        if (level == 0)
        {
            /* L0: SMA pruning — can a satellite at this altitude be visible? */
            double perigee_alt_km = bin_low - WGS72_AE;
            if (perigee_alt_km > max_visible_altitude(qwin))
                dominated = true;  /* too high to be visible */
        }
        else if (level == 1)
        {
            /* L1: Inclination pruning — can this inclination reach observer latitude? */
            double observer_lat = fabs(qwin->observer.lat_rad);
            if (bin_high < observer_lat)
                dominated = true;  /* ground track never reaches observer */
        }

        if (!dominated)
        {
            /* Propagate traversal state to child */
            OrbitalTraversal *child_trav = palloc(sizeof(OrbitalTraversal));
            memcpy(child_trav, trav, sizeof(OrbitalTraversal));

            if (level == 0) {
                child_trav->sma_low = bin_low;
                child_trav->sma_high = bin_high;
            } else if (level == 1) {
                child_trav->inc_low = bin_low;
                child_trav->inc_high = bin_high;
                /* After L1, we can compute J2 precession rate */
                double a_mid = (child_trav->sma_low + child_trav->sma_high) / 2.0;
                double i_mid = (bin_low + bin_high) / 2.0;
                double n = sqrt(WGS72_MU / (a_mid * a_mid * a_mid));
                child_trav->j2_rate = -1.5 * n * WGS72_J2
                                    * (WGS72_AE / a_mid) * (WGS72_AE / a_mid)
                                    * cos(i_mid);
            }

            int idx = out->nNodes++;
            out->nodeNumbers[idx] = i;
            out->traversalValues[idx] = child_trav;
        }
    }

    PG_RETURN_VOID();
}

5.5 tle_trie_leaf_consistent()

Final check at the leaf level. Applies the RAAN query-time filter and eccentricity check. Always sets recheck = true because SGP4 propagation is still needed for the definitive answer.

Datum tle_trie_leaf_consistent(PG_FUNCTION_ARGS)
{
    spgLeafConsistentIn  *in  = (spgLeafConsistentIn *)  PG_GETARG_POINTER(0);
    spgLeafConsistentOut *out = (spgLeafConsistentOut *) PG_GETARG_POINTER(1);

    tle_type *tle = DatumGetTleP(in->leafDatum);
    observer_window *qwin = extract_observer_window_from_scankey(in);

    /* RAAN query-time filter */
    double dt_days = (qwin->t_mid_jd - tle_epoch_jd(tle));
    double raan_now = tle_raan_rad(tle) + tle_j2_raan_rate(tle) * dt_days;
    raan_now = fmod(raan_now, 2.0 * M_PI);
    if (raan_now < 0) raan_now += 2.0 * M_PI;

    /* Observer sidereal position at query midpoint */
    double lst = local_sidereal_time(qwin->observer.lon_rad, qwin->t_mid_jd);

    /* RAAN visibility window: Earth rotation during query + ground footprint */
    double earth_rotation = (qwin->t_end_jd - qwin->t_start_jd) * 360.0;  /* degrees */
    double footprint = ground_footprint_deg(tle_sma_km(tle), qwin->min_elevation);
    double raan_window_half = (earth_rotation / 2.0 + footprint) * (M_PI / 180.0);

    double raan_diff = fabs(raan_now - lst);
    if (raan_diff > M_PI) raan_diff = 2.0 * M_PI - raan_diff;

    if (raan_diff > raan_window_half)
    {
        out->leafValue = in->leafDatum;
        out->recheck = true;
        PG_RETURN_BOOL(false);  /* RAAN not aligned — skip this candidate */
    }

    /* Eccentricity sanity check: highly eccentric orbits need wider altitude band */
    double ecc = tle_eccentricity(tle);
    if (ecc > 0.1)
    {
        double perigee = tle_sma_km(tle) * (1.0 - ecc) - WGS72_AE;
        double apogee  = tle_sma_km(tle) * (1.0 + ecc) - WGS72_AE;
        if (perigee > max_visible_altitude(qwin))
        {
            out->leafValue = in->leafDatum;
            out->recheck = true;
            PG_RETURN_BOOL(false);
        }
    }

    out->leafValue = in->leafDatum;
    out->recheck = true;  /* always recheck — SGP4 propagation is the ground truth */
    PG_RETURN_BOOL(true);
}

6. The RAAN Query-Time Filter

Why not a trie level

The effective RAAN visibility window depends entirely on the query time window. This table shows why static trie bins can't capture the physics:

Query window Earth rotation Effective RAAN window Candidates eliminated
30 min 7.5° ~52° (14% of sky) ~85%
2 hours 30° ~74° (21%) ~79%
6 hours 90° ~134° (37%) ~63%
12 hours 180° ~224° (62%) ~38%
24 hours 360° 360° (100%) 0%

A static bin structure (4 bins of 90° each) could eliminate at most 75% (3 of 4 bins) for a 30-minute query, but eliminates nothing for queries longer than ~6 hours. The query-time filter automatically adapts to the actual window.

The J2 precession rate

RAAN precession is the dominant secular perturbation for LEO orbits. The rate is:

Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i)

where:

  • n = √(μ/a³) — mean motion (rad/s), with μ = 398600.8 km³/s² (WGS-72)
  • J₂ = 0.001082616 (WGS-72)
  • Rₑ = 6378.135 km (WGS-72)
  • a — semi-major axis (km)
  • i — inclination (radians)

For a 400 km LEO satellite at 51.6° inclination (ISS-like):

a = 6778.135 km
n = 0.00114 rad/s
Ω̇ = -1.5 × 0.00114 × 0.001082616 × (6378.135/6778.135)² × cos(51.6°)
   ≈ -1.07 × 10⁻⁶ rad/s ≈ -5.3°/day

This rate is fully determined by SMA and inclination — both already indexed at L0 and L1.

Per-candidate cost

Projecting RAAN to query time requires:

  1. One subtraction (epoch difference)
  2. One multiply (rate × time)
  3. One addition (RAAN₀ + drift)
  4. One modulo (wrap to [0, 2π))
  5. One range check (within visibility window?)

Cost: ~10 nanoseconds per candidate.

After L0 and L1 pruning, a typical pass prediction query over 30,000 TLEs leaves ~4,500 candidates. The total RAAN filter cost:

4,500 × 10 ns = 45 μs

This is negligible compared to SGP4 propagation cost for the survivors (typically hundreds of milliseconds to seconds). The filter eliminates ~79% of those 4,500 candidates (for a 2-hour window), leaving ~945 for SGP4 propagation instead of 4,500.


7. Propagation-Aware Cost Estimation

The core innovation

Standard PostgreSQL index cost estimators model I/O cost: pages read, tuples fetched. KTrie's original insight — preserved in this SP-GiST design — is to model downstream computation cost. The expensive operation in satellite queries isn't reading data; it's SGP4/SDP4 propagation.

The cost estimator tells the query planner:

estimated_cost = surviving_population × time_steps × sgp4_cost_per_step

This lets the planner compare "index scan + 945 SGP4 evaluations" vs "sequential scan + 30,000 SGP4 evaluations" — a decision no other PostgreSQL index type can make.

Population tracking via traversalValue

SP-GiST's traversalValue mechanism carries the OrbitalTraversal struct down the tree during scans. At each inner node, the cost estimator accumulates:

typedef struct OrbitalTraversal {
    double  sma_low, sma_high;    /* from L0 */
    double  inc_low, inc_high;    /* from L1 */
    int32   population;           /* objects in this subtree */
    double  j2_rate;              /* derived from L0 + L1 midpoints */
} OrbitalTraversal;

The population field counts objects in each subtree. After L1, j2_rate is computed from the accumulated SMA and inclination bounds. This enables the RAAN filter to predict how many candidates will survive:

expected_visible = population × (RAAN_window / 360°)

Constellation detection as emergent property

Mega-constellations cluster tightly in SMA × inclination space. Starlink shell 1: ~2,000 satellites at 550 km / 53.0°. All members share nearly the same J2 precession rate, and their RAANs are evenly distributed by constellation design (phased orbital planes).

The trie naturally groups these into a single L1 subtree. The cost estimator sees a tight population with a uniform RAAN distribution and can predict:

Starlink shell 1 example (2-hour query from Eagle, Idaho):
  population     = 2,000
  RAAN_window    = 74° → fraction = 20.6%
  RAAN_survivors = 2,000 × 0.206 = ~412
  time_steps     = 120 (2 hours at 60-second intervals)
  sgp4_per_step  = 0.05 ms
  estimated_cost = 412 × 120 × 0.05 ms = 2.5 seconds

No explicit constellation classification is needed. The structure emerges from orbital mechanics.

Custom statistics function

The cost estimator can be registered as a custom statistics function for the SP-GiST operator class, or integrated into the inner_consistent function via traversalValue. The planner sees accurate per-subtree cost predictions without any special configuration:

void spgist_tle_cost_estimate(PlannerInfo *root, IndexPath *path,
                               double loop_count, Cost *startup_cost,
                               Cost *total_cost, Selectivity *selectivity,
                               double *correlation, double *index_pages)
{
    double surviving_pop = estimate_from_traversal(root, path);
    double time_steps = extract_query_window_steps(path);
    double sgp4_cost = 0.05;  /* ms per propagation step, calibratable */

    double raan_fraction = estimate_raan_survival(path);
    double propagation_candidates = surviving_pop * raan_fraction;

    *total_cost = (2 * PAGE_READ_COST)                           /* L0 + L1 traversal */
                + (surviving_pop * RAAN_FILTER_COST)             /* query-time RAAN */
                + (propagation_candidates * time_steps * sgp4_cost)  /* SGP4 */
                + (propagation_candidates * HEAP_FETCH_COST);    /* fetch full TLE */

    *selectivity = propagation_candidates / total_catalog_size;
}

8. Operators

Types

-- Observation query parameters bundled as a composite type
CREATE TYPE observer_window AS (
    obs         observer,         -- existing pg_orrery observer type (lat, lon, alt_m)
    t_start     timestamptz,      -- query window start
    t_end       timestamptz,      -- query window end
    min_el      float8            -- minimum elevation angle in degrees
);

Operator definitions

Three operators, starting minimal. The flagship operator &? is the primary interface for pass prediction.

-- Strategy 1: Altitude regime containment
-- "Is this TLE's orbit within this altitude range?"
CREATE OPERATOR @> (
    LEFTARG   = float8range,        -- altitude range in km (e.g., '[200,600]')
    RIGHTARG  = tle,
    PROCEDURE = tle_regime_contains,
    COMMUTATOR = <@
);

-- Strategy 2: Orbital envelope overlap (enhanced existing &&)
-- "Do these two TLEs share overlapping altitude + inclination space?"
CREATE OPERATOR && (
    LEFTARG   = tle,
    RIGHTARG  = tle,
    PROCEDURE = tle_envelope_overlaps,
    COMMUTATOR = &&
);

-- Strategy 3: Visibility cone check (flagship operator)
-- "Could this satellite be visible from this observer during this time window?"
-- Combines SMA pruning (L0) + inclination pruning (L1) + RAAN query-time filter
CREATE OPERATOR &? (
    LEFTARG   = observer_window,
    RIGHTARG  = tle,
    PROCEDURE = tle_visibility_possible
);

Example queries

-- Primary use case: pass prediction
-- "Which satellites might be visible from Eagle, Idaho in the next 2 hours?"
SELECT s.norad_id, s.name
FROM satellites s
WHERE ROW(
    observer('43.7N 116.4W 760m'),
    now(),
    now() + interval '2 hours',
    10.0
)::observer_window &? s.tle;

-- Altitude regime query
-- "Which satellites orbit between 400 and 600 km?"
SELECT s.norad_id, s.name
FROM satellites s
WHERE '[400,600]'::float8range @> s.tle;

-- Combined: visible LEO passes with full SGP4 propagation
SELECT s.norad_id, s.name,
       predict_passes(s.tle, observer('43.7N 116.4W 760m'),
                      now(), now() + interval '2 hours', 10.0)
FROM satellites s
WHERE ROW(
    observer('43.7N 116.4W 760m'),
    now(),
    now() + interval '2 hours',
    10.0
)::observer_window &? s.tle;

The &? operator returns true for satellites that might be visible (a superset of the actual answer). The predict_passes() function then runs SGP4 propagation on only those candidates. The index's job is to minimize the candidate set, not to produce the final answer.


9. Data Structures

SP-GiST node structure

SP-GiST manages its own page layout, inner tuples, and leaf tuples. We declare our types; the framework handles storage:

  • Prefix type: float8range — the SMA or inclination range covered by an inner node
  • Label type: float8 — bin boundary values for child dispatch
  • Leaf type: tle — the existing pg_orrery TLE type (112 bytes, STORAGE = plain)

No new page format. No custom WAL records. SP-GiST provides all of this.

observer_window composite type

/* Not a new C struct — uses SQL composite type mechanics */
/* Fields: observer (24B) + t_start (8B) + t_end (8B) + min_el (8B) = 48 bytes */

This is a standard SQL composite type, not a custom C type. PostgreSQL handles I/O, storage, and parameter passing. The operator functions extract fields via GetAttributeByNum().

WGS-72 constants

All internal computations use the same WGS-72 constants already defined in pg_orrery's types.h:

#define WGS72_MU      398600.8            /* km³/s²  */
#define WGS72_AE      6378.135            /* km      */
#define WGS72_J2      0.001082616

The constant chain of custody is maintained: WGS-72 for orbital mechanics, WGS-84 for observer coordinate output. No new constants introduced.


10. What Changed from the Original KTrie

Original KTrie Evolved SP-GiST Rationale
Custom index AM (12+ functions) SP-GiST (5 functions) PostgreSQL handles WAL, VACUUM, parallel, recovery
5-level trie (a, i, Ω, e, ω) 2-level trie (a, i) Only SMA + inclination are temporally stable
RAAN as L2 (4-8 static bins) Query-time analytical filter Time-dependent; adapts to window; no reindex
Eccentricity as L3 Query-time check in leaf_consistent Near-zero in LEO; unreliable for fine pruning
Arg. perigee as L4 Dropped Near-zero discrimination in LEO
Patricia compression (custom) SP-GiST prefix compression (built-in) Same semantics, zero custom page management
uint16 population (max 65,535) int32 via traversalValue No overflow risk; Starlink alone targets 12,000+
6 operators 3 operators Start minimal; &? is the flagship
10+ C source files ~3 C source files SP-GiST handles infrastructure
Custom page layout (72B header) SP-GiST page layout No custom page management
Custom split/merge logic SP-GiST picksplit Framework handles page operations
Manual compression triggers SP-GiST automatic prefix handling Framework compresses single-child paths
reindex_raan_drift parameter Eliminated RAAN not in the trie; no reindex needed

What survived intact

Three ideas from the original spec are preserved without modification:

  1. Population-aware cost estimation — the core innovation. Now delivered via traversalValue instead of uint16 per-entry fields.
  2. Equal-population splits — density-balanced tree, not geometry-balanced. Now in picksplit instead of custom split logic.
  3. WGS-72 constant chain of custody — unchanged, inherited from pg_orrery.

11. Implementation Roadmap

Phase 1: SP-GiST prototype

Create the core SP-GiST index with the 5 support functions and 3 operators.

Files:

src/spgist_tle.c       — config, choose, picksplit, inner_consistent, leaf_consistent
src/visibility_ops.c   — &? operator, RAAN filter, observer_window type support

SQL:

sql/pg_orrery--0.6.0--0.7.0.sql   — types, operators, SP-GiST operator class

Tests:

test/sql/spgist_tle.sql      — index creation, operator tests, pruning validation
test/expected/spgist_tle.out

Goal: Working index that correctly prunes by SMA + inclination + RAAN. Correctness over performance.

Phase 2: Cost estimator

Add the propagation-aware cost estimation function.

Files:

src/spgist_cost.c   — custom cost estimator with population tracking

Goal: Query planner correctly chooses index scan vs sequential scan based on predicted SGP4 cost.

Phase 3: Benchmark vs current GiST

Load 30,000 TLEs from Space-Track. Compare:

Metric Current GiST SP-GiST orbital trie
Pruning rate (2h window, 43.7° lat) Measure Measure
Pruning rate (24h window, equator) Measure Measure
Query time (pass prediction) Measure Measure
False positive rate Measure Measure
Index size Measure Measure
Build time Measure Measure

Phase 4: Evaluate

If the SP-GiST trie achieves >85% pruning on the benchmark suite, ship as pg_orrery v0.7.0. If not, analyze where candidates survive and determine whether deeper trie levels or alternative strategies are justified.

Decision gate: The GiST enhancement (adding eccentricity to the existing key) provides the baseline. The SP-GiST trie must demonstrably exceed it to justify the additional code.


12. Open Questions

  1. Should observer_window be a new custom type or a SQL composite? A custom C type (like observer) gives tighter control and avoids composite type overhead. A SQL composite is simpler to implement and extend. The composite is sufficient for the prototype; migrate to custom C type if profiling shows overhead.

  2. Can we store population metadata in SP-GiST's prefix datum? The prefix is float8range (16 bytes). Population could be packed into a custom prefix struct instead, but this couples the prefix to the cost estimator. Using traversalValue keeps them separate. Needs testing to determine if traversalValue introduces measurable overhead.

  3. What's the right default picksplit strategy? Options: median-of-element-value (equal-range) vs median-of-population (equal-count). Equal-count is theoretically better for balanced query cost, but equal-range is simpler and may perform similarly when the catalog distribution is stable. Implement equal-count, benchmark against equal-range.

  4. Should the cost estimator calibrate sgp4_cost_per_step at index creation time? Running a small SGP4 benchmark during CREATE INDEX would give an accurate per-machine calibration. But it couples index creation to runtime performance, and the cost varies with TLE characteristics (near-earth vs deep-space). A configurable GUC (pg_orrery.sgp4_cost_us, default 50) may be more practical.

  5. What about conjunction screening (N×N)? The original spec included &= (conjunction possible). This is a different query pattern: orbit-to-orbit, not observer-to-orbit. The SP-GiST trie can support it (L0+L1 pruning applies), but the RAAN filter doesn't (no observer). Defer to a future phase.

  6. Index-only scans: what data is needed? canReturnData = true enables index-only scans, avoiding heap fetches. The leaf stores the full tle type. If the query only needs NORAD ID and basic orbital parameters (common for pass prediction pre-screening), the index can serve the query without touching the heap. Verify this works correctly with the STORAGE = plain TLE type.