Adds a second non-HE public secondary that pulls AXFR from dell01 (the
hidden primary at 154.27.180.210) and answers public queries on
ns.supported.systems (64.177.113.227, 2001:19f0:5c00:4daa:5400:6ff:fe2d:38fa).
secondary/
Corefile                                 generated, 84 zones + REFUSED catch-all
docker-compose.yml                       CoreDNS in host-net mode
Makefile                                 up/down/logs/regen/test/axfr-test
.env / .env.example                      image pin + bind IPs
scripts/generate-secondary-corefile.sh   reads ../zones/*.zone
scripts/notify-he.py → notify-secondaries.py
                                         adds 64.177.113.227 as a second
                                         NOTIFY target alongside HE's
                                         216.218.130.2
Uses CoreDNS's `bind` plugin to avoid colliding with systemd-resolved
on loopback :53. Authoritative-only — non-listed zones get REFUSED, no
recursion. AXFR pull requires opening TCP/53 on dell01's FortiWiFi for
the secondary's IP (manual step, separate from this commit).
coredns — hidden-primary DNS for ~91 zones
CoreDNS running on dell01.mer.idahomuellers.net (LAN 172.16.1.15,
public 154.27.180.210) acts as a hidden primary. Hurricane Electric's
free secondary service (dns.he.net) pulls each zone via AXFR and is
what the public actually sees. Git/this repo is the source of truth.
Architecture at a glance
edit zones/*.zone → make prep → CoreDNS auto-reloads (30s)
↓
scripts/notify-secondaries.py
↓
NOTIFY → ns1.he.net (216.218.130.2)
↓
HE slave-puller (216.218.133.2) does AXFR
↓
HE anycast cluster replicates internally
↓
public sees new data
End-to-end propagation: typically under 10 minutes after make prep.
Worst case ~1 hour (HE's poll-only fallback if NOTIFY is missed).
Source of truth
- `zones/*.zone`: 91 raw Vultr-style zone files. Edit here.
- `zones-prepared/*.zone`: generated by `scripts/prepare-zones.sh`, which injects SOA, replaces NS with `ns1-5.he.net`, dot-terminates rdata, and bumps the serial. Never edit directly. Gitignored.
- `Corefile`: CoreDNS config with a `(common)` snippet imported by the plain DNS (`. {}`), DoT (`tls://.:853`), and DoH (`https://.:443`) server blocks.
Daily workflow — adding/changing a record
# 1. Edit the source zone
$EDITOR zones/homestar.ink.zone
# 2. Push, prep (auto-bumps serial), NOTIFY HE
rsync -avz -e "ssh -A" zones/homestar.ink.zone \
rpm@dell01.mer.idahomuellers.net:~/coredns/zones/homestar.ink.zone
ssh -A rpm@dell01.mer.idahomuellers.net 'cd ~/coredns && make prep'
# 3. Commit locally
git add -A && git commit -m "homestar.ink: add foo A 1.2.3.4"
# 4. Verify
./scripts/check-he.sh foo.homestar.ink A
Wait ≤5 minutes for HE to AXFR. If the serial doesn't change on HE,
re-run NOTIFY: ssh -A dell01... 'cd ~/coredns && ./scripts/notify-secondaries.py'
Publishing to dell01
The repo lives in two places:
- Local (`~/claude/coredns`): where you edit
- Remote (`rpm@dell01.mer.idahomuellers.net:~/coredns`): where CoreDNS reads zone files via Docker bind-mount
To push the whole project:
rsync -avz --delete \
--exclude '.git/' --exclude 'caddy-data/' --exclude 'caddy-config/' \
--exclude 'certs/*.pem' --exclude 'zones-prepared/*.zone' \
--exclude '.env.local' \
-e "ssh -A" \
./ rpm@dell01.mer.idahomuellers.net:~/coredns/
Per-file push for single-zone changes is also fine:
rsync -avz -e "ssh -A" zones/<zone>.zone \
rpm@dell01.mer.idahomuellers.net:~/coredns/zones/<zone>.zone
-A forwards your ssh agent so gh and other remote git ops work
inside the dell01 session.
On dell01
ssh -A rpm@dell01.mer.idahomuellers.net
cd ~/coredns
make prep # re-prep zones (auto-bumps SOA + sends NOTIFY)
make logs # tail CoreDNS logs
make ps # container status
The Docker stack: coredns (server) + coredns-caddy (LE cert for
dns.supported.systems, used for DoT/DoH).
NOTIFY: external script, not CoreDNS-native
We use scripts/notify-secondaries.py to send NOTIFY messages to
216.218.130.2 (ns1.he.net) on every make prep. Pure stdlib Python,
no deps.
Why a script instead of CoreDNS's built-in transfer { to <IP> }?
CoreDNS 1.11.3 and 1.12.2 both have a bug where transfer { to <IP> }
with any specific IP (single, multi-line, or space-separated) makes
the server blocks silently fail to start their listeners — zones load,
plugin loads, then .:53 / tls://.:853 / https://.:443 never bind.
Only transfer { to * } works.
So:
- `Corefile`: `transfer { to * }`, i.e. open AXFR (the firewall does the source-IP filtering on TCP/53 NAT anyway)
- `notify-secondaries.py`: sends NOTIFY explicitly to each secondary's IP
NOTIFY happens automatically on make prep. To NOTIFY manually:
ssh -A dell01... 'cd ~/coredns && ./scripts/notify-secondaries.py'
The script's output doubles as a "what's on HE" inventory — ✓
for zones HE hosts, ✗ rcode=5 for zones HE doesn't yet host.
HE's NOTIFY behavior: HE acks NOTIFY at the protocol level (rcode=0),
and usually triggers an immediate AXFR. Sometimes the batch NOTIFY
fired from make prep doesn't seem to wake them; re-running
notify-secondaries.py manually almost always does. Per-zone NOTIFY is more
reliable than batch.
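The NOTIFY exchange described in this section can be sketched with the standard library alone. This is an illustration of the RFC 1996 wire format, not the contents of scripts/notify-secondaries.py; the function names (`encode_name`, `build_notify`, `parse_rcode`, `send_notify`) are assumptions made for the example, and the sketch is UDP over IPv4 only.

```python
import random
import socket
import struct

NOTIFY_OPCODE = 4  # DNS opcode for NOTIFY (RFC 1996)
TYPE_SOA = 6
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_notify(zone: str, msg_id: int) -> bytes:
    """Build a NOTIFY request: opcode 4, AA bit set, one SOA question."""
    flags = (NOTIFY_OPCODE << 11) | (1 << 10)  # 0x2400
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", TYPE_SOA, CLASS_IN)
    return header + question


def parse_rcode(response: bytes) -> int:
    """Return the RCODE (low 4 bits of the flags word) of a DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def send_notify(zone: str, server: str, port: int = 53, timeout: float = 3.0) -> int:
    """Send one NOTIFY over UDP and return the RCODE of the reply."""
    msg_id = random.randrange(65536)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_notify(zone, msg_id), (server, port))
        data, _ = sock.recvfrom(512)
    return parse_rcode(data)
```

Under these assumptions, `send_notify("homestar.ink", "216.218.130.2")` would return 0 for a zone HE hosts and 5 (REFUSED) for one it does not, matching the ✓ / ✗ rcode=5 output described above.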
HE asymmetric IPs
Hurricane Electric requires:
- AXFR pull source: `216.218.133.2` (`slave.dns.he.net` / `ns4.he.net`), which does NOT serve public queries and only does AXFR pulls
- NOTIFY destination: `216.218.130.2` (`ns1.he.net`)
- Public-facing anycast: `ns1`, `ns2`, `ns3`, `ns5` (`.130.2`, `.131.2`, `.132.2`, `216.66.1.2`)
scripts/check-he.sh <name> [type] queries all 4 public anycast IPs
in parallel and flags divergence.
HE two-stage propagation
When you bump a serial, HE goes through:
- slave-puller pulls AXFR — happens quickly after NOTIFY (~seconds)
- internal anycast replication — propagates to public-facing PoPs on HE's clock (1-15 min usually, can be longer)
check-he.sh shows when stage 2 has completed (all 4 anycast NS report
the same serial + answer).
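The "all anycast NS report the same serial" condition can be expressed as a small pure function. This is only a sketch of the comparison step, assuming the serials have already been collected per nameserver (for example from `dig +short` output); `summarize_serials` is a hypothetical helper and is not taken from check-he.sh.

```python
def summarize_serials(serials: dict[str, int]) -> tuple[int, list[str]]:
    """Return (newest serial, nameservers that have not caught up yet).

    Plain integer comparison is used, which is fine for YYYYMMDDNN
    serials; it does not implement RFC 1982 wrap-around arithmetic.
    """
    newest = max(serials.values())
    lagging = sorted(ns for ns, serial in serials.items() if serial != newest)
    return newest, lagging


def stage_two_complete(serials: dict[str, int]) -> bool:
    """True once every nameserver reports the same serial."""
    return len(set(serials.values())) == 1
```

With made-up serials, `summarize_serials({"ns1": 2024010102, "ns2": 2024010102, "ns3": 2024010101, "ns5": 2024010102})` returns `(2024010102, ["ns3"])`, meaning ns3 is still waiting on internal replication.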
SOA timers
scripts/prepare-zones.sh writes these for every zone:
serial YYYYMMDDNN (auto-incrementing per-day counter)
refresh 300 (5 min — HE polls SOA this often)
retry 120 (2 min — HE retries failed polls)
expire 604800 (1 week)
minimum 60 (1 min — NXDOMAIN negative-cache TTL)
These are aggressive but appropriate for the hidden-primary pattern. The 60s minimum keeps stale NXDOMAIN cache windows short after adding a new name.
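The YYYYMMDDNN scheme can be sketched as below. The real logic lives in scripts/prepare-zones.sh and may differ; `next_serial` is a hypothetical name, and starting each day's counter at 01 is an assumption of this example.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDNN serial after `current`.

    Assumption: the first serial of a new day ends in 01. If the
    existing serial is already at or past today's base, it is simply
    incremented so the serial never moves backwards.
    """
    base = int(today.strftime("%Y%m%d")) * 100
    if current < base:
        return base + 1
    return current + 1
```

For example, `next_serial(2024010101, date(2024, 1, 1))` gives `2024010102`, while `next_serial(2023123105, date(2024, 1, 1))` rolls over to `2024010101`.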
Empty-non-terminal trap (RFC 4592)
If a name X has children in the zone (especially stale
_acme-challenge.<sub>.X TXT records), X becomes an "empty
non-terminal." HE strictly follows RFC 4592 §2.2.3: wildcards do NOT
synthesize for empty non-terminals. So *.<parent> skips X even though
the wildcard would otherwise have caught it.
Symptom: dell01 returns the wildcard answer (CoreDNS is lenient), HE returns NODATA. Public clients see "broken" for X.
Fix: add an explicit record at X (X A 1.2.3.4 or X CNAME @).
To find empty-non-terminals across zones:
# For each zone with a wildcard, find _acme-challenge.<X> entries
# where <X> has no explicit record at that exact name.
# See git log for 5afdb05 / f6111c2 / f8363e5 for the audit pattern.
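The audit idea above could be sketched in Python roughly as follows. This is not the script from the referenced commits; `find_empty_non_terminals` is a hypothetical helper that takes the fully qualified owner names of a zone plus its origin, and parsing the zone file into owner names is left out.

```python
def find_empty_non_terminals(owners: list[str], origin: str) -> set[str]:
    """Return names that have descendants in the zone but no records of their own."""
    origin = origin.rstrip(".").lower()
    names = {owner.rstrip(".").lower() for owner in owners}
    found = set()
    for name in names:
        labels = name.split(".")
        # Walk the ancestors of this owner name, stopping at the zone origin.
        for i in range(1, len(labels)):
            parent = ".".join(labels[i:])
            if parent == origin or not parent.endswith("." + origin):
                break
            if parent not in names:
                found.add(parent)
    return found
```

Given the owners `example.com`, `*.example.com`, and `_acme-challenge.app.example.com` with origin `example.com`, the function returns `{"app.example.com"}`: the stale challenge record turns `app` into an empty non-terminal that the wildcard no longer covers.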
Wildcard depth
HE follows RFC 4592 fully: *.<parent> matches any depth of names
under <parent> as long as no intermediate names exist in the zone
tree. So *.demo catches something.demo AND deep.path.demo (the
latter only if path.demo doesn't exist as a node).
Intermediate empty non-terminals do block synthesis below them.
Zone-by-zone HE status
./scripts/notify-secondaries.py prints ✓ / ✗ per zone — ✓ means HE
hosts that zone as a secondary, ✗ (rcode=5) means HE doesn't yet
host it. As of the last NOTIFY run, ~11 of 91 zones are slaved on HE.
The other 80 are still served from Vultr at the registrar level.
To migrate a zone fully to HE:
- Add as Secondary DNS at `dns.he.net` with master IP `154.27.180.210`
- Update registrar NS records: replace `ns1/ns2.vultr.com` with `ns1-ns5.he.net` (some registrars limit to 4 NS; drop ns5 if so)
- Wait for TLD propagation (minutes for gTLDs, hours for `.us` etc.)
- Optionally clean up Vultr-side zone records
scripts/check-he.sh will then show this zone live across HE's anycast.
TLS for DoT/DoH
DoT (:8853 external, :853 internal) and DoH (:8443 external,
:443 internal) are terminated by CoreDNS using a Let's Encrypt cert
for dns.supported.systems. The cert is provisioned and auto-renewed
by the coredns-caddy sidecar, which uses the DNS-01 challenge via the
Vultr API (needs VULTR_API_KEY in the shell env at startup).
Renewal happens automatically; Caddy uses ACME ARI to schedule it.
Key files
| Path | Purpose |
|---|---|
| `zones/*.zone` | Source-of-truth zone files (edit here) |
| `zones-prepared/*.zone` | Generated, served by CoreDNS (gitignored) |
| `Corefile` | CoreDNS config |
| `scripts/prepare-zones.sh` | Zone prep + auto-bump serial |
| `scripts/notify-secondaries.py` | Send NOTIFY to ns1.he.net + ns.supported.systems |
| `secondary/` | Public secondary (CoreDNS in Docker) deployed to ns.supported.systems |
| `scripts/check-he.sh` | Parallel HE anycast verification |
| `caddy/Caddyfile` + `caddy/Dockerfile` | Caddy sidecar config |
| `docker-compose.yml` | CoreDNS + Caddy stack |
| `Makefile` | `make prep`, `make up`, `make down`, `make logs`, etc. |
| `.env` | Image pins, ports |
Known operational quirks
- `make prep` errors on the first run of a new day: fixed in `prepare-zones.sh` (grep with `|| true` for the serial-detection step). Don't revert that.
- A full `docker compose down` + `up` is needed after Corefile changes that touch `transfer`: `restart` alone leaves sticky state that prevents listener binding.
- Vultr DNS is still authoritative for ~80 zones (registrar NS hasn't been migrated to HE). The hidden-primary stack still serves them locally and on dell01, but public DNS uses Vultr until you migrate.
Useful one-liners
# Find records pointing at a specific IP
grep -rE '\b1\.2\.3\.4\b' zones/
# Find all _acme-challenge records (potential empty-non-terminal sources)
grep -E "_acme-challenge\." zones/<zone>.zone
# Compare dell01 vs HE for a specific zone
ZONE=homestar.ink
echo "dell01: $(dig @dell01.mer.idahomuellers.net -p 5353 $ZONE SOA +short | awk '{print $3}')"
echo "HE: $(dig @ns1.he.net $ZONE SOA +short | awk '{print $3}')"
# What's the current SOA serial across all HE anycast for a zone?
./scripts/check-he.sh <zone> SOA
Don't do
- Don't edit `zones-prepared/`: it's regenerated by `make prep`
- Don't put `transfer { to <IP> }` in the Corefile: CoreDNS bug, silently breaks listener startup. Stick to `transfer { to * }`.
- Don't commit `.env.local`, `caddy-data/`, `certs/*.pem`: these are gitignored for a reason
- Don't manually bump serials in zones-prepared: `make prep` handles it correctly via `prepare-zones.sh`'s auto-bumper