H1 — Concurrent-modification detection. loadRRs now returns a
fileSnapshot capturing (mtime, size) at read time. handleUpdate calls
zf.checkUnchanged(snap) immediately before writeAtomic. If anything
modified the file between load and write (rsync push, manual edit,
`git checkout`), the UPDATE is refused with SERVFAIL. Caddy retries, and
the retry starts from a fresh load. This protects against the rsync
workflow documented in CLAUDE.md racing the plugin.
H2 — Git commit-failure policy. The previous code logged at WARN and
continued, breaking the documented "file + git both updated" contract.
Now logs at ERROR with structured fields (zone, path, error, recovery
command) so operators discover the divergence. We do NOT roll back the
file write: by the time the commit fails, the auto plugin may have
already noticed the new mtime and reloaded; rolling back creates more
races than it solves. Recovery is `git -C <dir> status` + manual
commit.
M1 — exec.CommandContext with 10s timeout on git invocations. If git
hangs (NFS stall, gpg-sign prompt, broken pre-commit hook waiting on
stdin), the per-zone mutex would otherwise be held forever and queue
all subsequent UPDATEs. gitCommandTimeout caps the hang.
M2 deferred. Dropping the separate `git add` cleanly requires either
`-a` (wrong scope: auto-stages all tracked modifications) or `--include`
(still needs prior staging). The race window between add and commit is
theoretical for our setup (single-writer plugin + occasional `git
status`). M1's timeout already mitigates the worst hang case.
New tests:
- TestZoneFile_CheckUnchanged_DetectsExternalModification (H1)
C1 — Document the process-global MsgAcceptFunc mutation:
CoreDNS 1.14.3 doesn't expose per-Config MsgAcceptFunc (server.go:159
hardcodes the dns.Server struct), so the override has to be global. The
init()-level comment now explains the operational consequences in
detail, and setup() emits a loud INFO log calling out the global scope
for operator audit. Upstream support for per-Config MsgAcceptFunc would
let us delete the whole stanza.
C2 — handleUpdate now requires the caller to assert TSIG verification
via an explicit `verified bool` parameter. The security contract is
encoded in the function signature, not in convention. ServeDNS passes
verified=true after checkTSIG succeeds; verified=false produces an
immediate Refused with no state mutation. Future internal callers
(NOTIFY relay, admin RPC, refactor) cannot reach the mutation code
without explicitly asserting that the request was authenticated.
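The C2 contract, reduced to a stdlib-only sketch. The zone type and the string-based record are stand-ins invented for illustration; the real handleUpdate takes a dns.ResponseWriter and a *dns.Msg, per the test description in this commit.

```go
package main

import "fmt"

// DNS rcodes used by this sketch (values from RFC 1035).
const (
	rcodeSuccess = 0
	rcodeRefused = 5
)

// zone is a hypothetical stand-in for the plugin's per-zone state.
type zone struct {
	records []string
}

// handleUpdate refuses before touching any state unless the caller
// asserts that TSIG verification succeeded.
func handleUpdate(z *zone, add string, verified bool) int {
	if !verified {
		return rcodeRefused
	}
	z.records = append(z.records, add)
	return rcodeSuccess
}

func main() {
	z := &zone{}
	fmt.Println(handleUpdate(z, "_acme-challenge TXT token", false), len(z.records))
	fmt.Println(handleUpdate(z, "_acme-challenge TXT token", true), len(z.records))
}
```

A bool parameter cannot stop a caller from passing true incorrectly, but it does force every call site to state its claim where a reviewer can see it.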
M9 — Don't sign TSIG-failure rejection responses. Per Hamilton's
finding, signing a rejection with the named key attests "yes, this
server holds that key" — useful intel for an attacker probing key
existence. Unsigned Refused is the right shape: nsupdate sees "no TSIG
on reply" and treats it as an auth failure, which is what actually happened.
New test TestUpdate_UnverifiedCaller_Refused proves the C2 contract:
handleUpdate(w, msg, false) refuses, zone file unchanged.
The previous YYYYMMDDNN encoding capped at NN=99 (100 bumps/day) and
hard-failed UPDATEs once the day's counter was exhausted. This was
confirmed in production on 2026-05-22, when ACME activity across the
supported.systems zone hit the cap and SERVFAILed every subsequent UPDATE.
New format: YYMMDD*10000+NNNN. With 4-digit NNNN we get 10000/day, and
dropping the century keeps the serial under uint32's 4,294,967,295
ceiling (2,605,229,999 is the maximum for 2026-05-22). A 4-digit year
(e.g., 20260522*10000) would overflow uint32, which is the width RFC 1035
defines for the SOA serial.
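The arithmetic above can be checked with a small sketch. encodeSerial is a hypothetical helper, not a function from the plugin; it does the multiplication in uint64 so the overflow is visible instead of silently wrapping.

```go
package main

import (
	"fmt"
	"math"
)

// encodeSerial packs a date and a 4-digit counter as date*10000+counter.
// It reports ok=false when the result does not fit in a uint32 SOA serial.
func encodeSerial(date, counter uint64) (serial uint32, ok bool) {
	v := date*10000 + counter
	if v > math.MaxUint32 {
		return 0, false
	}
	return uint32(v), true
}

func main() {
	fmt.Println(encodeSerial(260522, 9999)) // YYMMDD form
	fmt.Println(encodeSerial(20260522, 1))  // YYYYMMDD form, does not fit
	fmt.Println(encodeSerial(421231, 9999)) // last day of 2042
	fmt.Println(encodeSerial(430101, 1))    // first day of 2043
}
```

The same check suggests the two-digit-year form itself stops fitting once the encoded date reaches 430101, so the scheme has headroom through 2042.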
Three behavior changes:
1. On NNNN=9999, roll forward to the next encoded day with NNNN=0001
rather than erroring. The encoded date drifts ahead of wall time on
heavy churn days and catches up on quiet days; monotonic ordering
(the only DNS requirement) holds.
2. Future-encoded serials (from a prior rollover) are honoured. The
previous "older date" branch downgraded them back to today*100+1,
producing a backwards serial. This bug also tripped up a manual
workaround on the same day. Now: future encoded dates bump their
own NNNN.
3. Legacy YYYYMMDDNN serials migrate automatically on first bump. A
value like 2026052299 (~2.026B) is numerically smaller than today's
new-format minimum 2605220001 (~2.605B), so the older-or-unparseable
branch fires and rewrites in place. New > old, so AXFR receivers
treat it as a clean forward bump.
Tests cover same-day, rollover, future-encoded no-regress, legacy
migration, non-CalVer reset, and no-SOA error.
When a request arrives with TSIG, attach a TSIG record to the response
so dns.ResponseWriter computes the MAC at write time using the secret
in TsigSecret. Without this, BIND nsupdate complains "expected a TSIG
or SIG(0)" on every UPDATE, even when the update applies successfully.
Two response paths fixed:
- handleUpdate success/per-rcode replies (update.go)
- ServeDNS rejection when TSIG verification fails (plugin.go)
The new helper in tsig.go is a no-op for unsigned requests. Unknown
keys still silently skip signing — we can't authenticate to a peer we
don't share a key with.
Tests verify both branches: signed request → response carries matching
TSIG (key name + algorithm); unsigned request → response stays plain.
Major architectural pivot per the user's "RFC 2136 mechanism for the
existing zonefiles, not a new in-memory thing" framing. The plugin no
longer maintains its own in-memory state OR serves any queries -- both
of those are now the auto plugin's job, reading the same zone files.
The plugin's sole responsibility is now: receive TSIG-authed UPDATE
messages, edit the matching zones/<zone>.zone file, bump the SOA
serial in CalVer (YYYYMMDDNN) form, and optionally auto-commit to git.
What changed:
- DELETED: store.go (in-memory recordStore), store_test.go (12 tests),
plugin_test.go (10 ServeDNS query tests), old update_test.go.
- NEW: zonefile.go -- file-backed authority for one zone. loadRRs via
miekg/dns zone parser; mutation helpers (lookupIn/nameExistsIn/
removeRRsetFrom/removeRRFrom/removeNameFrom/addRRTo) on []dns.RR
slices; bumpSerial with CalVer semantics + NN exhaustion handling;
writeAtomic via temp-file rename; commit shells to `git add && git
commit` with configurable author.
- NEW: zonefile_test.go -- 17 tests covering load/lookup/mutate/bump/
write paths.
- REWRITTEN: plugin.go -- ServeDNS is now thin: UPDATE → TSIG → handler;
everything else → Next. No synthetic SOA/NS, no query serving.
- REWRITTEN: update.go -- handleUpdate now opens the zoneFile, loads,
applies (with prereq checks against the loaded RRs), bumps serial,
writes, commits. Detects no-op updates to avoid spurious file writes.
- REWRITTEN: setup.go -- new directives: `zones-dir` (required),
`auto-commit` (default true), `git-author <name> <email>`. Dropped
`nameserver` and `persist`. Validates each declared zone has a file
on disk via os.Stat before CoreDNS finishes starting.
- REWRITTEN: setup_test.go -- 17 cases for the new grammar.
- REWRITTEN: update_test.go -- 11 cases using real temp zone files
via t.TempDir().
Total: 30 tests passing, 0 failures.
Next: Phase 2c (custom CoreDNS image, deploy, smoke test with nsupdate).
Replaces the Phase-1.3 refuseUpdate() stub with a real RFC 2136 handler.
Caddy via caddy-dns/rfc2136 can now inject and remove records.
UPDATE message handling (update.go):
- Zone section validation: must be exactly one SOA-typed record naming
a zone we're authoritative for. Returns FORMERR/NOTAUTH otherwise.
- Prerequisites (§3.2): name-exists, RRset-exists, name-NOT-exists,
RRset-NOT-exists semantics implemented. First failure short-circuits
with the spec's rcode (NXDOMAIN/NXRRSET/YXDOMAIN/YXRRSET).
- Updates (§3.4.2): add RR, delete RRset (CLASS=ANY+RDLEN=0), delete
all RRsets at name (CLASS=ANY+TYPE=ANY), delete specific RR (CLASS=
NONE).
- Apex SOA/NS protected: synthetic and cannot be added or removed via
UPDATE. Apex wipe (TYPE=ANY at apex) also refused.
- Default TTL applied to incoming records with TTL=0.
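The update-section cases listed above are distinguished by CLASS, TYPE, and RDLENGTH. The sketch below shows that dispatch with hand-written constants; classify and updateOp are hypothetical names, and the real handler works on miekg/dns records and also applies the apex protections, which are omitted here.

```go
package main

import "fmt"

// Wire values from RFC 1035 and RFC 2136.
const (
	classINET uint16 = 1
	classNONE uint16 = 254
	classANY  uint16 = 255
	typeANY   uint16 = 255
	typeTXT   uint16 = 16
)

type updateOp string

const (
	opAdd         updateOp = "add RR"
	opDeleteRRset updateOp = "delete RRset"
	opDeleteName  updateOp = "delete all RRsets at name"
	opDeleteRR    updateOp = "delete specific RR"
	opFormErr     updateOp = "FORMERR"
)

// classify maps one update RR to the operation it requests.
func classify(zoneClass, class, rrtype uint16, rdlength int) updateOp {
	switch class {
	case zoneClass:
		if rrtype == typeANY { // meta types cannot be added
			return opFormErr
		}
		return opAdd
	case classANY:
		if rdlength != 0 { // deletions by name/RRset carry no RDATA
			return opFormErr
		}
		if rrtype == typeANY {
			return opDeleteName
		}
		return opDeleteRRset
	case classNONE:
		if rrtype == typeANY {
			return opFormErr
		}
		return opDeleteRR
	default:
		return opFormErr
	}
}

func main() {
	fmt.Println(classify(classINET, classINET, typeTXT, 12))
	fmt.Println(classify(classINET, classANY, typeTXT, 0))
	fmt.Println(classify(classINET, classANY, typeANY, 0))
	fmt.Println(classify(classINET, classNONE, typeTXT, 12))
}
```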
TSIG (tsig.go + setup.go):
- setup() now populates dnsserver.Config.TsigSecret so the underlying
dns.Server auto-verifies signatures via miekg/dns.
- checkTSIG() in ServeDNS gates UPDATEs: rejects if no TSIG, unknown
key name, algorithm-downgrade attempt, or w.TsigStatus() != nil.
- No TSIG keys configured → all UPDATEs refused (safety default).
- Algorithm pinning prevents downgrade attacks (e.g. forced HMAC-MD5).
Tests (update_test.go): 11 new cases covering happy paths and every
error rcode. Total: 35 top-level test passes, 0 failures.
ServeDNS dispatch now calls handleUpdate after auth gate. The
refuseUpdate() stub is gone. UPDATE end-to-end via nsupdate requires
the custom CoreDNS image (Phase 2) to verify TSIG plumbing on the
dns.Server side.
Sets up the package layout for a CoreDNS plugin that will accept RFC 2136
dynamic updates with TSIG authentication, primarily targeting self-hosted
ACME DNS-01 cert automation.
What this commit gives us:
- go.mod against coredns/caddy v1.1.4, coredns/coredns v1.14.3, miekg/dns v1.1.72
- plugin.go: RFC2136 struct + Handler interface (ServeDNS is pass-through)
- setup.go: init() registration + Corefile parser (skeleton — recognizes
tsig-key, ttl, persist directives but doesn't yet wire them)
- README.md, .gitignore
go build ./... clean. No tests yet — those come with Phase 1.2 alongside
the actual UPDATE handler and in-memory store.
Plan: ~/.claude/plans/dood-does-coredns-offer-enumerated-piglet.md