Per RFC 1996, a master that mutates a zone SHOULD notify its
secondaries so they can immediately AXFR rather than wait for their
next SOA-refresh poll. Without this, propagation lag from UPDATE to
public DNS is bounded by the secondary's refresh interval (300s for
us) — which is borderline for ACME validation timing.
New Corefile directive:
    notify <host[:port]> [<host[:port]>...]
Targets accept bare hostnames (port 53 default), host:port, or
[ipv6]:port. The same list applies to every zone in the rfc2136
block.
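The target normalization described above could look roughly like the sketch below. normalizeTarget is a hypothetical name and not necessarily how the plugin does it; the sketch relies only on net.SplitHostPort and net.JoinHostPort from the standard library, and its handling of a bracketed IPv6 literal without a port is an assumption beyond the three documented forms.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeTarget is a hypothetical helper, not the plugin's actual code.
// It returns target as host:port, defaulting the port to 53.
func normalizeTarget(target string) (string, error) {
	if target == "" {
		return "", fmt.Errorf("empty notify target")
	}
	// host:port and [ipv6]:port parse cleanly and pass through unchanged.
	if host, port, err := net.SplitHostPort(target); err == nil {
		if host == "" || port == "" {
			return "", fmt.Errorf("invalid notify target %q", target)
		}
		return net.JoinHostPort(host, port), nil
	}
	// No port present: treat the whole string as a host. Stripping brackets
	// also lets a bare "[ipv6]" through, which is an assumption of this sketch.
	host := strings.TrimSuffix(strings.TrimPrefix(target, "["), "]")
	return net.JoinHostPort(host, "53"), nil
}

func main() {
	for _, t := range []string{"ns2.example.net", "ns2.example.net:5353", "[2001:db8::53]:53"} {
		addr, err := normalizeTarget(t)
		fmt.Println(addr, err)
	}
}
```

net.JoinHostPort re-adds the brackets for any host containing a colon, so IPv6 targets round-trip in a form that net.Dial accepts.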
Implementation: fire-and-forget UDP per target, each in its own
goroutine, capped by a 2s timeout. The UPDATE response to the client
is never held pending NOTIFY acks (RFC 1996 §4 explicitly decouples
them). Failures log at DEBUG only — a briefly-unreachable secondary
is normal and would otherwise spam logs.
Retires the external scripts/notify-secondaries.py workflow for any
deployment that wires the directive: secondaries now hear about
changes within seconds of the UPDATE landing, no cron or manual
invocation needed.
New tests:
- TestSendNotify_DeliversToTarget — packet arrives, opcode + zone correct
- TestSendNotify_NoTargets_NoCrash — empty list short-circuits
- TestSendNotify_BadTarget_LogsButDoesNotBlock — fire-and-forget timing
- TestNotifyOne_AppendsDefaultPort — host vs host:port normalization
Hamilton M8: a compromised TSIG key — or a misconfigured client
retrying forever — must not be able to drive unbounded UPDATE traffic.
Each UPDATE costs disk IOPS, a git commit, and a slot in the SOA
serial counter (now 9999/day per zone). Without a cap, a few hours of
runaway traffic could exhaust the SOA serial counter and brick the
zone for the day.
Implementation: per-key token bucket in ratelimit.go. Default 100
tokens / 60 seconds. New keys start full so legitimate clients see no
delay at boot. Refill is continuous, capped at the burst value.
Configurable in Corefile:
    rate-limit off                      # disable entirely
    rate-limit <burst> <period-secs>    # e.g., rate-limit 200 60
Enforcement runs in ServeDNS after TSIG verification, so a request that
fails auth doesn't consume a token. A forged TSIG therefore can't be used
to deny service to a real key holder: the request is rejected before the
rate check runs.
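The ordering matters more than the individual checks, so here is a reduced sketch of it. gate and the verdict values are hypothetical, and the actual response code for a rate-limited request is not stated above, which is why the sketch returns descriptive strings rather than rcodes.

```go
package main

import "fmt"

type verdict string

const (
	verdictRefusedAuth verdict = "refused: TSIG failed"
	verdictRateLimited verdict = "refused: rate limited"
	verdictAccepted    verdict = "accepted"
)

// gate is a hypothetical reduction of the check order in ServeDNS.
// authOK is the outcome of TSIG verification; allow is the per-key rate check.
func gate(authOK bool, key string, allow func(key string) bool) verdict {
	if !authOK {
		// allow is never called on this path, so no token is spent.
		return verdictRefusedAuth
	}
	if !allow(key) {
		return verdictRateLimited
	}
	return verdictAccepted
}

func main() {
	tokens := 1
	allow := func(string) bool {
		if tokens == 0 {
			return false
		}
		tokens--
		return true
	}
	fmt.Println(gate(false, "acme-key.", allow)) // forged request, token untouched
	fmt.Println(gate(true, "acme-key.", allow))  // real key holder still has its token
	fmt.Println(gate(true, "acme-key.", allow))  // bucket now empty
}
```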
100/min is well above ACME's needs: a worst-case full-renewal storm
across our ~84 zones emits maybe 200 UPDATEs total over several
minutes. Anything beyond that is suspicious by definition.
New tests covering: first-call allowed, burst exhaustion, refill
behavior, per-key isolation, refill-cap (no idle-accumulation
overflow).
C1 — Document the process-global MsgAcceptFunc mutation:
CoreDNS 1.14.3 doesn't expose per-Config MsgAcceptFunc (server.go:159
hardcodes the dns.Server struct), so the override has to be global. The
init()-level comment now explains the operational consequences in
detail, and setup() emits a loud INFO log calling out the global scope
for operator audit. Upstream support for per-Config MsgAcceptFunc would
let us delete the whole stanza.
C2 — handleUpdate now requires the caller to assert TSIG verification
via an explicit `verified bool` parameter. The security contract is
encoded in the function signature, not in convention. ServeDNS passes
verified=true after checkTSIG succeeds; verified=false produces an
immediate Refused with no state mutation. Future internal callers
(NOTIFY relay, admin RPC, refactor) cannot reach the mutation code
without explicitly asserting that the request was authenticated.
M9 — Don't sign TSIG-failure rejection responses. Per Hamilton's
finding, signing a rejection with the named key attests "yes, this
server holds that key" — useful intel for an attacker probing key
existence. Unsigned Refused is the right shape: nsupdate sees "no TSIG
on reply" and treats it as an auth failure, which is what actually happened.
New test TestUpdate_UnverifiedCaller_Refused proves the C2 contract:
handleUpdate(w, msg, false) refuses, zone file unchanged.
When a request arrives with TSIG, attach a TSIG record to the response
so dns.ResponseWriter computes the MAC at write time using the secret
in TsigSecret. Without this, BIND nsupdate complains "expected a TSIG
or SIG(0)" on every UPDATE, even when the update applies successfully.
Two response paths fixed:
- handleUpdate success/per-rcode replies (update.go)
- ServeDNS rejection when TSIG verification fails (plugin.go)
The new helper in tsig.go is a no-op for unsigned requests. Unknown
keys still silently skip signing — we can't authenticate to a peer we
don't share a key with.
Tests verify both branches: signed request → response carries matching
TSIG (key name + algorithm); unsigned request → response stays plain.
Major architectural pivot per the user's "RFC 2136 mechanism for the
existing zonefiles, not a new in-memory thing" framing. The plugin no
longer maintains its own in-memory state OR serves any queries -- both
of those are now the auto plugin's job, reading the same zone files.
The plugin's sole responsibility is now: receive TSIG-authed UPDATE
messages, edit the matching zones/<zone>.zone file, bump the SOA
serial in CalVer (YYYYMMDDNN) form, and optionally auto-commit to git.
What changed:
- DELETED: store.go (in-memory recordStore), store_test.go (12 tests),
plugin_test.go (10 ServeDNS query tests), old update_test.go.
- NEW: zonefile.go -- file-backed authority for one zone. loadRRs via
miekg/dns zone parser; mutation helpers (lookupIn/nameExistsIn/
removeRRsetFrom/removeRRFrom/removeNameFrom/addRRTo) on []dns.RR
slices; bumpSerial with CalVer semantics + NN exhaustion handling;
writeAtomic via temp-file rename; commit shells to `git add && git
commit` with configurable author.
- NEW: zonefile_test.go -- 17 tests covering load/lookup/mutate/bump/
write paths.
- REWRITTEN: plugin.go -- ServeDNS is now thin: UPDATE → TSIG → handler;
everything else → Next. No synthetic SOA/NS, no query serving.
- REWRITTEN: update.go -- handleUpdate now opens the zoneFile, loads,
applies (with prereq checks against the loaded RRs), bumps serial,
writes, commits. Detects no-op updates to avoid spurious file writes.
- REWRITTEN: setup.go -- new directives: `zones-dir` (required),
`auto-commit` (default true), `git-author <name> <email>`. Dropped
`nameserver` and `persist`. Validates each declared zone has a file
on disk via os.Stat before CoreDNS finishes starting.
- REWRITTEN: setup_test.go -- 17 cases for the new grammar.
- REWRITTEN: update_test.go -- 11 cases using real temp zone files
via t.TempDir().
Total: 30 tests passing, 0 failures.
Next: Phase 2c (custom CoreDNS image, deploy, smoke test with nsupdate).
Replaces the Phase-1.3 refuseUpdate() stub with a real RFC 2136 handler.
Caddy via caddy-dns/rfc2136 can now inject and remove records.
UPDATE message handling (update.go):
- Zone section validation: must be exactly one SOA-typed record naming
a zone we're authoritative for. Returns FORMERR/NOTAUTH otherwise.
- Prerequisites (§3.2): name-exists, RRset-exists, name-NOT-exists,
RRset-NOT-exists semantics implemented. First failure short-circuits
with the spec's rcode (NXDOMAIN/NXRRSET/YXDOMAIN/YXRRSET).
- Updates (§3.4.2): add RR, delete RRset (CLASS=ANY+RDLEN=0), delete
all RRsets at name (CLASS=ANY+TYPE=ANY), delete specific RR (CLASS=
NONE).
- Apex SOA/NS protected: synthetic and cannot be added or removed via
UPDATE. Apex wipe (TYPE=ANY at apex) also refused.
- Default TTL applied to incoming records with TTL=0.
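The CLASS/TYPE/RDLEN encoding of the four update operations listed above can be shown as a small classifier. This is a simplified sketch with hypothetical names: it omits the TTL checks and the other meta-type restrictions of the RFC 2136 prescan, and it is not the handler's actual code.

```go
package main

import "fmt"

const (
	classIN   uint16 = 1
	classNONE uint16 = 254
	classANY  uint16 = 255
	typeTXT   uint16 = 16
	typeANY   uint16 = 255
)

type updateOp string

const (
	opAdd         updateOp = "add RR"
	opDeleteRRset updateOp = "delete RRset"
	opDeleteName  updateOp = "delete all RRsets at name"
	opDeleteRR    updateOp = "delete specific RR"
	opFormErr     updateOp = "FORMERR"
)

// classify maps one Update-section RR onto an operation, given the zone's class.
func classify(zoneClass, class, rrtype uint16, rdlength int) updateOp {
	switch {
	case class == classANY && rdlength != 0:
		return opFormErr // deletions by CLASS=ANY carry no RDATA
	case class == classANY && rrtype == typeANY:
		return opDeleteName
	case class == classANY:
		return opDeleteRRset
	case class == classNONE && rrtype != typeANY:
		return opDeleteRR
	case class == zoneClass && rrtype != typeANY:
		return opAdd
	default:
		return opFormErr
	}
}

func main() {
	fmt.Println(classify(classIN, classIN, typeTXT, 44))   // ACME challenge being added
	fmt.Println(classify(classIN, classANY, typeTXT, 0))   // remove the whole TXT RRset
	fmt.Println(classify(classIN, classANY, typeANY, 0))   // remove everything at the name
	fmt.Println(classify(classIN, classNONE, typeTXT, 44)) // remove one exact record
}
```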
TSIG (tsig.go + setup.go):
- setup() now populates dnsserver.Config.TsigSecret so the underlying
dns.Server auto-verifies signatures via miekg/dns.
- checkTSIG() in ServeDNS gates UPDATEs: rejects if no TSIG, unknown
key name, algorithm-downgrade attempt, or w.TsigStatus() != nil.
- No TSIG keys configured → all UPDATEs refused (safety default).
- Algorithm pinning prevents downgrade attacks (e.g. forced HMAC-MD5).
Tests (update_test.go): 11 new cases covering happy paths and every
error rcode. Total: 35 top-level tests passing, 0 failures.
ServeDNS dispatch now calls handleUpdate after auth gate. The
refuseUpdate() stub is gone. UPDATE end-to-end via nsupdate requires
the custom CoreDNS image (Phase 2) to verify TSIG plumbing on the
dns.Server side.
Sets up the package layout for a CoreDNS plugin that will accept RFC 2136
dynamic updates with TSIG authentication, primarily targeting self-hosted
ACME DNS-01 cert automation.
What this commit gives us:
- go.mod against coredns/caddy v1.1.4, coredns/coredns v1.14.3, miekg/dns v1.1.72
- plugin.go: RFC2136 struct + Handler interface (ServeDNS is pass-through)
- setup.go: init() registration + Corefile parser (skeleton — recognizes
tsig-key, ttl, persist directives but doesn't yet wire them)
- README.md, .gitignore
`go build ./...` is clean. No tests yet; those come with Phase 1.2 alongside
the actual UPDATE handler and in-memory store.
Plan: ~/.claude/plans/dood-does-coredns-offer-enumerated-piglet.md