# coredns: hidden-primary DNS for ~91 zones

CoreDNS running on **dell01.mer.idahomuellers.net** (LAN `172.16.1.15`, public `154.27.180.210`) acts as a hidden primary. Hurricane Electric's free secondary service (`dns.he.net`) pulls each zone it hosts via AXFR, and HE's servers are what the public actually queries. Git (this repo) is the source of truth.

## Architecture at a glance

```
edit zones/*.zone → make prep → CoreDNS auto-reloads (30s)
                        ↓
          scripts/notify-secondaries.py
                        ↓
        NOTIFY → ns1.he.net (216.218.130.2)
                        ↓
   HE slave-puller (216.218.133.2) does AXFR
                        ↓
    HE anycast cluster replicates internally
                        ↓
              public sees new data
```

End-to-end propagation: typically **under 10 minutes** after `make prep`. Worst case ~1 hour (HE's poll-only fallback if NOTIFY is missed).

## Source of truth

- **`zones/*.zone`**: 91 raw Vultr-style zone files. **Edit here.**
- **`zones-prepared/*.zone`**: generated by `scripts/prepare-zones.sh`, which injects the SOA, replaces NS records with `ns1-5.he.net`, dot-terminates rdata, and bumps the serial. **Never edit directly.** Gitignored.
- **`Corefile`**: CoreDNS config with a `(common)` snippet imported by the plain DNS (`. {}`), DoT (`tls://.:853`), and DoH (`https://.:443`) server blocks.

## Daily workflow: adding/changing a record

```bash
# 1. Edit the source zone
$EDITOR zones/homestar.ink.zone

# 2. Push, prep (auto-bumps serial), NOTIFY HE
rsync -avz -e "ssh -A" zones/homestar.ink.zone \
  rpm@dell01.mer.idahomuellers.net:~/coredns/zones/homestar.ink.zone
ssh -A rpm@dell01.mer.idahomuellers.net 'cd ~/coredns && make prep'

# 3. Commit locally
git add -A && git commit -m "homestar.ink: add foo A 1.2.3.4"

# 4. Verify
./scripts/check-he.sh foo.homestar.ink A
```

Wait ≤5 minutes for HE to AXFR. If the serial doesn't change on HE, re-run NOTIFY: `ssh -A dell01... 'cd ~/coredns && ./scripts/notify-secondaries.py'`

## Publishing to dell01

The repo lives in two places:

- **Local** (`~/claude/coredns`): where you edit
- **Remote** (`rpm@dell01.mer.idahomuellers.net:~/coredns`): where CoreDNS reads zone files via a Docker bind mount

To push the whole project:

```bash
rsync -avz --delete \
  --exclude '.git/' --exclude 'caddy-data/' --exclude 'caddy-config/' \
  --exclude 'certs/*.pem' --exclude 'zones-prepared/*.zone' \
  --exclude '.env.local' \
  -e "ssh -A" \
  ./ rpm@dell01.mer.idahomuellers.net:~/coredns/
```

Per-file push for single-zone changes is also fine:

```bash
rsync -avz -e "ssh -A" zones/.zone \
  rpm@dell01.mer.idahomuellers.net:~/coredns/zones/.zone
```

`-A` forwards your ssh agent so `gh` and other remote git operations work inside the dell01 session.

## On dell01

```bash
ssh -A rpm@dell01.mer.idahomuellers.net
cd ~/coredns
make prep   # re-prep zones (auto-bumps SOA + sends NOTIFY)
make logs   # tail CoreDNS logs
make ps     # container status
```

The Docker stack: `coredns` (server) + `coredns-caddy` (Let's Encrypt cert for `dns.supported.systems`, used for DoT/DoH).

## NOTIFY: external script, not CoreDNS-native

We use `scripts/notify-secondaries.py` to send NOTIFY messages to `216.218.130.2` (ns1.he.net) on every `make prep`. Pure stdlib Python, no deps.

**Why a script instead of CoreDNS's built-in `transfer { to }`?** CoreDNS 1.11.3 and 1.12.2 both have a bug where `transfer { to }` with **any specific IP** (single, multi-line, or space-separated) makes the server blocks silently fail to start their listeners: the zones load and the plugin loads, but `.:53` / `tls://.:853` / `https://.:443` never bind. Only `transfer { to * }` works. So:

- `Corefile`: `transfer { to * }` (open AXFR; the firewall does the source-IP filtering on the TCP/53 NAT anyway)
- `notify-secondaries.py`: sends NOTIFY explicitly to each secondary's IP

NOTIFY happens automatically on `make prep`. To NOTIFY manually:

```bash
ssh -A dell01... 'cd ~/coredns && ./scripts/notify-secondaries.py'
```

The script's output doubles as a **"what's on HE" inventory**: `✓` for zones HE hosts, `✗ rcode=5` (REFUSED) for zones HE doesn't yet host.

**HE's NOTIFY behavior**: HE acknowledges NOTIFY at the protocol level (rcode=0) and *usually* triggers an immediate AXFR. Sometimes the batch NOTIFY fired from `make prep` doesn't seem to wake it; re-running `notify-secondaries.py` manually almost always does. Per-zone NOTIFY is more reliable than batch.

## HE asymmetric IPs

Hurricane Electric requires:

- **AXFR pull source**: `216.218.133.2` (`slave.dns.he.net` / `ns4.he.net`). It does NOT serve public queries; it only does AXFR pulls.
- **NOTIFY destination**: `216.218.130.2` (`ns1.he.net`)
- **Public-facing anycast**: `ns1`, `ns2`, `ns3`, `ns5` (`.130.2`, `.131.2`, `.132.2`, `216.66.1.2`)

`scripts/check-he.sh [type]` queries all 4 public anycast IPs in parallel and flags divergence.

## HE two-stage propagation

When you bump a serial, HE goes through two stages:

1. **Slave-puller pulls AXFR**: happens quickly after NOTIFY (~seconds)
2. **Internal anycast replication**: propagates to public-facing PoPs on HE's clock (usually 1-15 min, sometimes longer)

`check-he.sh` shows when stage 2 has completed (all 4 anycast NS report the same serial and answer).

## SOA timers

`scripts/prepare-zones.sh` writes these for every zone:

```
serial   YYYYMMDDNN  (auto-incrementing per-day counter)
refresh  300         (5 min: HE polls SOA this often)
retry    120         (2 min: HE retries failed polls)
expire   604800      (1 week)
minimum  60          (1 min: NXDOMAIN negative-cache TTL)
```

These are aggressive but appropriate for the hidden-primary pattern. The 60s minimum keeps stale NXDOMAIN cache windows short after adding a new name (per RFC 2308, resolvers cache a negative answer for the lesser of this value and the SOA record's own TTL).

## Empty-non-terminal trap (RFC 4592)

If a name X has no records of its own but has children in the zone (especially stale `_acme-challenge..X` TXT records), X becomes an "empty non-terminal." HE strictly follows RFC 4592 §2.2.3: wildcards do NOT synthesize for empty non-terminals.
So `*.` skips X even though the wildcard would otherwise have caught it.

**Symptom**: dell01 returns the wildcard answer (CoreDNS is lenient), while HE returns NODATA. Public clients see X as broken.

**Fix**: add an explicit record at X (`X A 1.2.3.4` or `X CNAME @`).

To find empty non-terminals across zones:

```bash
# For each zone with a wildcard, find _acme-challenge. entries
# where  has no explicit record at that exact name.
# See git log for 5afdb05 / f6111c2 / f8363e5 for the audit pattern.
```

## Wildcard depth

HE follows RFC 4592 fully: `*.` matches names at **any depth** under `` as long as no intermediate names exist in the zone tree. So `*.demo` catches `something.demo` AND `deep.path.demo` (the latter only if `path.demo` doesn't exist as a node). Intermediate empty non-terminals **do** block synthesis below them.

## Zone-by-zone HE status

`./scripts/notify-secondaries.py` prints `✓` / `✗` per zone: `✓` means HE hosts that zone as a secondary, `✗` (rcode=5) means HE doesn't yet host it. As of the last NOTIFY run, ~11 of 91 zones are slaved on HE. The other 80 are still served from Vultr at the registrar level.

To migrate a zone fully to HE:

1. Add it as a Secondary DNS zone at `dns.he.net` with master IP `154.27.180.210`
2. Update the registrar NS records: replace `ns1/ns2.vultr.com` with `ns1-ns5.he.net` (some registrars limit you to 4 NS; drop ns5 if so)
3. Wait for TLD propagation (minutes for gTLDs, hours for `.us` etc.)
4. Optionally clean up the Vultr-side zone records

`scripts/check-he.sh` will then show the zone live across HE's anycast.

## TLS for DoT/DoH

DoT (`:8853` external, `:853` internal) and DoH (`:8443` external, `:443` internal) are terminated by CoreDNS using a Let's Encrypt cert for `dns.supported.systems`. The cert is provisioned and auto-renewed by the `coredns-caddy` sidecar, which uses the DNS-01 challenge via the Vultr API (it needs `VULTR_API_KEY` in the shell environment at startup). Renewal is automatic; Caddy uses ACME ARI to schedule it.
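
To test the DoT/DoH listeners from outside without extra tooling, a query has to be framed the way each transport expects. The sketch below is a minimal, stdlib-only Python illustration, not part of this repo: it builds a bare query (RFC 1035), adds the 2-byte length prefix DoT inherits from DNS-over-TCP (RFC 7858), and encodes the same message for an RFC 8484 GET. The helper names are hypothetical, and the `/dns-query` path is an assumption about the DoH endpoint rather than something the Corefile here confirms.

```python
import base64
import socket
import ssl
import struct

# Hypothetical helpers; only the wire formats come from RFC 1035 / 7858 / 8484.
QTYPES = {"A": 1, "NS": 2, "SOA": 6, "TXT": 16, "AAAA": 28}


def build_query(name: str, qtype: str = "A", qid: int = 0) -> bytes:
    """Build a minimal DNS query with RD set (ID 0 is cache-friendly for DoH)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # QCLASS IN


def doh_get_url(host: str, port: int, query: bytes, path: str = "/dns-query") -> str:
    """RFC 8484 GET form: base64url-encoded message with padding stripped."""
    # `path` defaults to an assumed endpoint; adjust if the server differs.
    dns_param = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"https://{host}:{port}{path}?dns={dns_param}"


def dot_frame(query: bytes) -> bytes:
    """DoT reuses DNS-over-TCP framing: a 2-byte big-endian length prefix."""
    return struct.pack("!H", len(query)) + query


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def dot_query(host: str, port: int, query: bytes, timeout: float = 5.0) -> bytes:
    """Send one query over DNS-over-TLS and return the raw response message."""
    ctx = ssl.create_default_context()  # verifies the cert chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(dot_frame(query))
            (length,) = struct.unpack("!H", _recv_exact(tls, 2))
            return _recv_exact(tls, length)
```

For example, `dot_query("dns.supported.systems", 8853, build_query("homestar.ink", "SOA"))` would exercise the external DoT port, and `resp[3] & 0x0F` on the result gives the RCODE. Because the cert check uses `server_hostname`, a failure here also catches an expired or mismatched Caddy-issued cert.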
## Key files

| Path | Purpose |
|---|---|
| `zones/*.zone` | Source-of-truth zone files (edit here) |
| `zones-prepared/*.zone` | Generated, served by CoreDNS (gitignored) |
| `Corefile` | CoreDNS config |
| `scripts/prepare-zones.sh` | Zone prep + auto-bump serial |
| `scripts/notify-secondaries.py` | Send NOTIFY to ns1.he.net + ns.supported.systems |
| `secondary/` | Public secondary (CoreDNS in Docker) deployed to ns.supported.systems |
| `scripts/check-he.sh` | Parallel HE anycast verification |
| `caddy/Caddyfile` + `caddy/Dockerfile` | Caddy sidecar config |
| `docker-compose.yml` | CoreDNS + Caddy stack |
| `Makefile` | `make prep`, `make up`, `make down`, `make logs`, etc. |
| `.env` | Image pins, ports |

## Known operational quirks

- **`make prep` errored on the first run of each new day**: fixed in `prepare-zones.sh` (the serial-detection `grep` now ends in `|| true`). Don't revert that.
- **A full `docker compose down` + `up` is needed after Corefile changes that touch `transfer`**: `restart` alone leaves sticky state that prevents the listeners from binding.
- **Vultr DNS is still authoritative for ~80 zones** (their registrar NS hasn't been migrated to HE). The hidden-primary stack still serves them locally and on dell01, but public DNS uses Vultr until you migrate.

## Useful one-liners

```bash
# Find records pointing at a specific IP
grep -rE '\b1\.2\.3\.4\b' zones/

# Find all _acme-challenge records (potential empty-non-terminal sources)
grep -E "_acme-challenge\." zones/.zone

# Compare dell01 vs HE for a specific zone
ZONE=homestar.ink
echo "dell01: $(dig @dell01.mer.idahomuellers.net -p 5353 "$ZONE" SOA +short | awk '{print $3}')"
echo "HE:     $(dig @ns1.he.net "$ZONE" SOA +short | awk '{print $3}')"

# What's the current SOA serial across all HE anycast for a zone?
./scripts/check-he.sh SOA
```

## Don't do

- **Don't edit `zones-prepared/`**: it's regenerated by `make prep`
- **Don't put `transfer { to }`** in Corefile: a CoreDNS bug silently breaks listener startup. Stick to `transfer { to * }`.
- **Don't commit `.env.local`, `caddy-data/`, `certs/*.pem`**: these are gitignored for a reason
- **Don't manually bump serials in zones-prepared**: `make prep` handles it correctly via `prepare-zones.sh`'s auto-bumper
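
The reason hand-edited serials are dangerous is that a secondary only transfers when the new serial is *greater* under RFC 1982 serial arithmetic, so one accidental step backwards silently freezes HE on old data. A minimal Python sketch of a never-backwards YYYYMMDDNN rule is below. It is an illustration under assumptions, not `prepare-zones.sh` itself: `next_serial` and `serial_gt` are hypothetical names, and it assumes a day's first serial ends in `01`.

```python
import datetime
from typing import Optional

# Hypothetical sketch of a never-backwards YYYYMMDDNN rule; the real
# prepare-zones.sh may differ (e.g. whether a day starts at NN=00 or 01).


def next_serial(current: Optional[int], today: datetime.date) -> int:
    """Return the next YYYYMMDDNN serial for `today`."""
    first_of_day = int(today.strftime("%Y%m%d")) * 100 + 1  # assumed NN=01
    if current is None:
        # No existing serial found (the case the `|| true` fix guards against).
        return first_of_day
    # Same day: bump NN. New day: jump to today's first serial.
    # Already past today's first serial (e.g. NN overflowed 99 yesterday):
    # still just +1, so the serial never goes backwards.
    return max(current + 1, first_of_day)


def serial_gt(new: int, old: int) -> bool:
    """RFC 1982 serial arithmetic (32-bit): is `new` greater than `old`?"""
    half = 2 ** 31
    return (old < new and new - old < half) or (old > new and old - new > half)
```

Taking the `max` keeps repeated `make prep` runs safe: every run yields a serial that `serial_gt` reports as newer than the last, which is what HE's refresh check needs to see before it pulls.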