Adds a second non-HE public secondary that pulls AXFR from dell01 (the
hidden primary at 154.27.180.210) and answers public queries on
ns.supported.systems (64.177.113.227, 2001:19f0:5c00:4daa:5400:6ff:fe2d:38fa).

secondary/
  Corefile                                 generated, 84 zones + REFUSED catch-all
  docker-compose.yml                       CoreDNS in host-net mode
  Makefile                                 up/down/logs/regen/test/axfr-test
  .env / .env.example                      image pin + bind IPs
scripts/generate-secondary-corefile.sh     reads ../zones/*.zone
scripts/notify-he.py → notify-secondaries.py
                                           adds 64.177.113.227 as a second
                                           NOTIFY target alongside HE's
                                           216.218.130.2

Uses CoreDNS's `bind` plugin to avoid colliding with systemd-resolved
on loopback :53. Authoritative-only — non-listed zones get REFUSED, no
recursion. AXFR pull requires opening TCP/53 on dell01's FortiWiFi for
the secondary's IP (manual step, separate from this commit).
# coredns — hidden-primary DNS for ~91 zones

CoreDNS running on **dell01.mer.idahomuellers.net** (LAN `172.16.1.15`,
public `154.27.180.210`) acts as a hidden primary. Hurricane Electric's
free secondary service (`dns.he.net`) pulls each zone via AXFR and is
what the public actually sees. Git/this repo is the source of truth.

## Architecture at a glance

```
edit zones/*.zone → make prep → CoreDNS auto-reloads (30s)
                        ↓
        scripts/notify-secondaries.py
                        ↓
        NOTIFY → ns1.he.net (216.218.130.2)
                        ↓
        HE slave-puller (216.218.133.2) does AXFR
                        ↓
        HE anycast cluster replicates internally
                        ↓
        public sees new data
```

End-to-end propagation: typically **under 10 minutes** after `make prep`.
Worst case ~1 hour (HE's poll-only fallback if NOTIFY is missed).

## Source of truth

- **`zones/*.zone`** — 91 raw Vultr-style zone files. **Edit here.**
- **`zones-prepared/*.zone`** — generated by `scripts/prepare-zones.sh`:
  injects SOA, replaces NS with `ns1-5.he.net`, dot-terminates rdata,
  bumps serial. **Never edit directly.** Gitignored.
- **`Corefile`** — CoreDNS config with `(common)` snippet imported by
  plain DNS (`. {}`), DoT (`tls://.:853`), and DoH (`https://.:443`)
  server blocks.

## Daily workflow — adding/changing a record

```bash
# 1. Edit the source zone
$EDITOR zones/homestar.ink.zone

# 2. Push, prep (auto-bumps serial), NOTIFY HE
rsync -avz -e "ssh -A" zones/homestar.ink.zone \
  rpm@dell01.mer.idahomuellers.net:~/coredns/zones/homestar.ink.zone
ssh -A rpm@dell01.mer.idahomuellers.net 'cd ~/coredns && make prep'

# 3. Commit locally
git add -A && git commit -m "homestar.ink: add foo A 1.2.3.4"

# 4. Verify
./scripts/check-he.sh foo.homestar.ink A
```

Wait ≤5 minutes for HE to AXFR. If the serial doesn't change on HE,
re-run NOTIFY: `ssh -A dell01... 'cd ~/coredns && ./scripts/notify-secondaries.py'`

## Publishing to dell01

The repo lives in two places:
- **Local** (`~/claude/coredns`): where you edit
- **Remote** (`rpm@dell01.mer.idahomuellers.net:~/coredns`): where
  CoreDNS reads zone files via Docker bind-mount

To push the whole project:

```bash
rsync -avz --delete \
  --exclude '.git/' --exclude 'caddy-data/' --exclude 'caddy-config/' \
  --exclude 'certs/*.pem' --exclude 'zones-prepared/*.zone' \
  --exclude '.env.local' \
  -e "ssh -A" \
  ./ rpm@dell01.mer.idahomuellers.net:~/coredns/
```

Per-file push for single-zone changes is also fine:
```bash
rsync -avz -e "ssh -A" zones/<zone>.zone \
  rpm@dell01.mer.idahomuellers.net:~/coredns/zones/<zone>.zone
```

`-A` forwards your ssh agent so `gh` and other remote git ops work
inside the dell01 session.

## On dell01

```bash
ssh -A rpm@dell01.mer.idahomuellers.net
cd ~/coredns
make prep          # re-prep zones (auto-bumps SOA + sends NOTIFY)
make logs          # tail CoreDNS logs
make ps            # container status
```

The Docker stack: `coredns` (server) + `coredns-caddy` (LE cert for
`dns.supported.systems`, used for DoT/DoH).

## NOTIFY: external script, not CoreDNS-native

We use `scripts/notify-secondaries.py` to send NOTIFY messages to
`216.218.130.2` (ns1.he.net) and `64.177.113.227` (ns.supported.systems)
on every `make prep`. Pure stdlib Python, no deps.

**Why a script instead of CoreDNS's built-in `transfer { to <IP> }`?**

CoreDNS 1.11.3 and 1.12.2 both have a bug where `transfer { to <IP> }`
with **any specific IP** (single, multi-line, or space-separated) makes
the server blocks silently fail to start their listeners — zones load,
plugin loads, then `.:53` / `tls://.:853` / `https://.:443` never bind.
Only `transfer { to * }` works.

So:
- `Corefile`: `transfer { to * }` — open AXFR (firewall does the
  source-IP filtering on TCP/53 NAT anyway)
- `notify-secondaries.py`: sends NOTIFY explicitly to each secondary's IP

NOTIFY happens automatically on `make prep`. To NOTIFY manually:
```bash
ssh -A dell01... 'cd ~/coredns && ./scripts/notify-secondaries.py'
```

The script's output doubles as a **"what's on HE" inventory** — `✓`
for zones HE hosts, `✗ rcode=5` for zones HE doesn't yet host.

**HE's NOTIFY behavior**: HE acks NOTIFY at the protocol level (rcode=0),
and *usually* triggers an immediate AXFR. Sometimes the batch NOTIFY
fired from `make prep` doesn't seem to wake them; re-running
`notify-secondaries.py` manually almost always does. Per-zone NOTIFY is more
reliable than batch.

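For reference, a NOTIFY message (RFC 1996) is small enough to build by hand. The following is a minimal sketch of the kind of packet such a script could send, not the actual contents of `notify-secondaries.py`; the function names, the fixed transaction ID, and the timeout are assumptions made for illustration.

```python
import socket
import struct

OPCODE_NOTIFY = 4   # RFC 1996
TYPE_SOA = 6
CLASS_IN = 1


def encode_name(zone: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in zone.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label in {zone!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_notify(zone: str, txid: int) -> bytes:
    """Build a NOTIFY query: opcode 4, AA set, one SOA/IN question."""
    flags = (OPCODE_NOTIFY << 11) | 0x0400  # QR=0, opcode=NOTIFY, AA=1
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", TYPE_SOA, CLASS_IN)
    return header + question


def response_rcode(packet: bytes) -> int:
    """Return the RCODE (low 4 bits of the flags word) of a DNS message."""
    (flags,) = struct.unpack("!H", packet[2:4])
    return flags & 0x000F


def send_notify(zone: str, target: str, txid: int = 0x1234,
                timeout: float = 3.0) -> int:
    """Send one NOTIFY over UDP and return the secondary's RCODE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_notify(zone, txid), (target, 53))
        reply, _ = sock.recvfrom(512)
    return response_rcode(reply)
```

Under these assumptions, `send_notify("homestar.ink", "216.218.130.2")` would return 0 when the secondary accepts the NOTIFY and 5 (REFUSED) when it does not host the zone, which lines up with the `✓` / `✗ rcode=5` output described above.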
## HE asymmetric IPs

Hurricane Electric requires:
- **AXFR pull source**: `216.218.133.2` (`slave.dns.he.net` /
  `ns4.he.net`) — does NOT serve public queries, only does AXFR pulls
- **NOTIFY destination**: `216.218.130.2` (`ns1.he.net`)
- **Public-facing anycast**: `ns1`, `ns2`, `ns3`, `ns5` (`.130.2`,
  `.131.2`, `.132.2`, `216.66.1.2`)

`scripts/check-he.sh <name> [type]` queries all 4 public anycast IPs
in parallel and flags divergence.

## HE two-stage propagation

When you bump a serial, HE goes through:
1. **slave-puller pulls AXFR** — happens quickly after NOTIFY (~seconds)
2. **internal anycast replication** — propagates to public-facing PoPs
   on HE's clock (1-15 min usually, can be longer)

`check-he.sh` shows when stage 2 has completed (all 4 anycast NS report
the same serial + answer).

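The "stage 2 complete" condition can be expressed as a comparison of SOA serials. This is a small illustrative sketch, not the logic of `check-he.sh`; the function name and the three state labels are hypothetical, and the serials are assumed to have been collected already (for example with `dig ... SOA +short`), with `None` standing for a server that did not answer.

```python
from typing import Dict, Optional


def propagation_state(primary: int, anycast: Dict[str, Optional[int]]) -> str:
    """Classify how far a serial bump has propagated.

    "converged": every anycast server reports the primary's serial
    "diverged":  anycast servers disagree with each other (mid-replication)
    "stale":     anycast servers agree, but not on the primary's serial
    """
    if not anycast:
        raise ValueError("no anycast servers given")
    serials = set(anycast.values())
    if serials == {primary}:
        return "converged"
    if len(serials) > 1:
        return "diverged"
    return "stale"
```

A result of `"diverged"` would correspond to the window in which some PoPs have the new data and others do not.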
## SOA timers

`scripts/prepare-zones.sh` writes these for every zone:
```
serial   YYYYMMDDNN  (auto-incrementing per-day counter)
refresh  300         (5 min — HE polls SOA this often)
retry    120         (2 min — HE retries failed polls)
expire   604800      (1 week)
minimum  60          (1 min — NXDOMAIN negative-cache TTL)
```

These are aggressive but appropriate for the hidden-primary pattern.
The 60s minimum keeps stale NXDOMAIN cache windows short after adding
a new name.

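The `YYYYMMDDNN` scheme can be illustrated with a short sketch. This is not the code in `prepare-zones.sh` (which is a shell script); the function name is hypothetical, and starting the daily counter at `01` rather than `00` is an assumption.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDNN serial, strictly greater than `current`."""
    base = int(today.strftime("%Y%m%d")) * 100
    if current < base:
        # First bump of the day (assumed to start the counter at NN=01).
        return base + 1
    if current >= base + 99:
        # Either NN is exhausted or the serial is already ahead of today.
        raise ValueError(f"cannot bump {current} within {base // 100}")
    return current + 1
```

For example, `next_serial(2024010105, date(2024, 1, 2))` rolls over to the new day, while a second call on the same day simply increments the two-digit counter.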
## Empty-non-terminal trap (RFC 4592)

If a name X has children in the zone (especially stale
`_acme-challenge.<sub>.X` TXT records) but no records of its own, X
becomes an "empty non-terminal." HE strictly follows RFC 4592 §2.2.2:
wildcards do NOT synthesize for empty non-terminals. So `*.<parent>`
skips X even though the wildcard would otherwise have caught it.

**Symptom**: dell01 returns the wildcard answer (CoreDNS is lenient),
HE returns NODATA. Public clients see "broken" for X.

**Fix**: add an explicit record at X (`X A 1.2.3.4` or `X CNAME @`).

To find empty-non-terminals across zones:
```bash
# For each zone with a wildcard, find _acme-challenge.<X> entries
# where <X> has no explicit record at that exact name.
# See git log for 5afdb05 / f6111c2 / f8363e5 for the audit pattern.
```

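The audit idea can also be sketched in a few lines of Python. This is not the pattern from the commits referenced above, only an illustration under stated assumptions: the function name is hypothetical, and the caller is assumed to have already extracted the fully qualified owner names from a zone file.

```python
from typing import Set


def empty_non_terminals(owners: Set[str], origin: str) -> Set[str]:
    """Names below `origin` that have descendants but own no records."""
    origin = origin.rstrip(".").lower()
    present = {o.rstrip(".").lower() for o in owners}
    implied = set()
    for name in present:
        if not name.endswith("." + origin):
            continue  # the origin itself, or a name outside the zone
        # Labels of `name` relative to the origin, leftmost first.
        labels = name[: -len(origin) - 1].split(".")
        # Every proper ancestor between `name` and the origin is implied.
        for i in range(1, len(labels)):
            implied.add(".".join(labels[i:]) + "." + origin)
    return implied - present
```

Any name this returns that sits directly under a wildcard's parent is a candidate for the explicit-record fix.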
## Wildcard depth

HE follows RFC 4592 fully: `*.<parent>` matches **any depth** of names
under `<parent>` as long as no intermediate names exist in the zone
tree. So `*.demo` catches `something.demo` AND `deep.path.demo` (the
latter only if `path.demo` doesn't exist as a node).

Intermediate empty non-terminals **do** block synthesis below them.

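The matching rule can be made concrete with a sketch of the closest-encloser lookup from RFC 4592. It is an illustration rather than HE's or CoreDNS's implementation: the function names are hypothetical, and all names are assumed to be lowercase, fully qualified, and written without a trailing dot.

```python
from typing import Optional, Set


def exists(name: str, owners: Set[str]) -> bool:
    """A name exists if it owns records or has a descendant that does."""
    return any(o == name or o.endswith("." + name) for o in owners)


def wildcard_source(qname: str, owners: Set[str], origin: str) -> Optional[str]:
    """Return the wildcard owner that would synthesize an answer, or None."""
    if not qname.endswith("." + origin):
        return None  # the origin itself, or outside the zone
    if exists(qname, owners):
        return None  # exact match or empty non-terminal: no synthesis
    labels = qname.split(".")
    for i in range(1, len(labels)):
        encloser = ".".join(labels[i:])
        if encloser == origin or exists(encloser, owners):
            # Closest encloser found; only "*.<encloser>" may synthesize.
            candidate = "*." + encloser
            return candidate if candidate in owners else None
    return None
```

With owners `example.com`, `*.demo.example.com`, and `_acme-challenge.app.demo.example.com`, the wildcard is the source for `something.demo.example.com` and `deep.path.demo.example.com`, but not for `app.demo.example.com` (an empty non-terminal) or anything beneath it.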
## Zone-by-zone HE status

`./scripts/notify-secondaries.py` prints `✓` / `✗` per zone — `✓` means HE
hosts that zone as a secondary, `✗` (rcode=5) means HE doesn't yet
host it. As of the last NOTIFY run, ~11 of 91 zones are slaved on HE.
The other 80 are still served from Vultr at the registrar level.

To migrate a zone fully to HE:
1. Add as Secondary DNS at `dns.he.net` with master IP `154.27.180.210`
2. Update registrar NS records: replace `ns1/ns2.vultr.com` with
   `ns1-ns5.he.net` (some registrars limit to 4 NS — drop ns5 if so)
3. Wait for TLD propagation (minutes for gTLDs, hours for `.us` etc.)
4. Optionally clean up Vultr-side zone records

`scripts/check-he.sh` will then show this zone live across HE's anycast.

## TLS for DoT/DoH

DoT (`:8853` external, `:853` internal) and DoH (`:8443` external,
`:443` internal) are terminated by CoreDNS using a Let's Encrypt cert
for `dns.supported.systems`. The cert is provisioned and auto-renewed
by the `coredns-caddy` sidecar, which uses the DNS-01 challenge via the
Vultr API (needs `VULTR_API_KEY` in the shell env at startup).

Renewal happens automatically; Caddy uses ACME ARI to schedule it.

## Key files

| Path | Purpose |
|---|---|
| `zones/*.zone` | Source-of-truth zone files (edit here) |
| `zones-prepared/*.zone` | Generated, served by CoreDNS (gitignored) |
| `Corefile` | CoreDNS config |
| `scripts/prepare-zones.sh` | Zone prep + auto-bump serial |
| `scripts/notify-secondaries.py` | Send NOTIFY to ns1.he.net + ns.supported.systems |
| `secondary/` | Public secondary (CoreDNS in Docker) deployed to ns.supported.systems |
| `scripts/check-he.sh` | Parallel HE anycast verification |
| `caddy/Caddyfile` + `caddy/Dockerfile` | Caddy sidecar config |
| `docker-compose.yml` | CoreDNS + Caddy stack |
| `Makefile` | `make prep`, `make up`, `make down`, `make logs`, etc. |
| `.env` | Image pins, ports |

## Known operational quirks

- **`make prep` errors on the first run of a new day**: fixed in
  `prepare-zones.sh` (grep with `|| true` for the serial-detection
  step). Don't revert that.
- **A full `docker compose down + up` is needed after Corefile changes
  that touch `transfer`**: `restart` alone leaves sticky state that
  prevents listener binding.
- **Vultr DNS is still authoritative for ~80 zones** (registrar NS hasn't
  been migrated to HE). The hidden-primary stack still serves them
  locally and on dell01, but public DNS uses Vultr until you migrate.

## Useful one-liners

```bash
# Find records pointing at a specific IP
grep -rE '\b1\.2\.3\.4\b' zones/

# Find all _acme-challenge records (potential empty-non-terminal sources)
grep -E "_acme-challenge\." zones/<zone>.zone

# Compare dell01 vs HE for a specific zone
ZONE=homestar.ink
echo "dell01: $(dig @dell01.mer.idahomuellers.net -p 5353 $ZONE SOA +short | awk '{print $3}')"
echo "HE:     $(dig @ns1.he.net $ZONE SOA +short | awk '{print $3}')"

# What's the current SOA serial across all HE anycast for a zone?
./scripts/check-he.sh <zone> SOA
```

## Don't do

- **Don't edit `zones-prepared/`** — it's regenerated by `make prep`
- **Don't put `transfer { to <IP> }`** in Corefile — CoreDNS bug,
  silently breaks listener startup. Stick to `transfer { to * }`.
- **Don't commit `.env.local`, `caddy-data/`, `certs/*.pem`** — these
  are gitignored for a reason
- **Don't manually bump serials in zones-prepared** — `make prep`
  handles it correctly via `prepare-zones.sh`'s auto-bumper