H3+H4 — Zone SOA invariant. After parsing, loadRRs enforces:
exactly one SOA, owned by the zone apex. Catches three failure modes
with a single guard:
- Missing SOA (H4): a malformed line earlier in the file may have
tripped miekg/dns's ZoneParser into dropping records without
reporting an error via parser.Err(). If the SOA went missing, we
refuse rather than treat the partial parse as authoritative.
- Multiple SOAs (H3): zone files with accidental duplicate SOA
records produce inconsistent zone state visible to AXFR clients.
The old code's first-match SOA-bump would silently propagate the
inconsistency. Now we refuse.
- Non-apex SOA (H3): an SOA whose owner doesn't match the zone
origin is either a parse error or a hand-edit mistake; bumping
it would leave the real apex unchanged. Now we refuse.
assertSingleApexSOA returns a descriptive error so the failure mode
is actionable from logs alone.
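
A minimal sketch of the invariant, assuming a simplified `record`
type in place of dns.RR (the type and the error wording here are
illustrative, not the plugin's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for dns.RR: only the owner name and
// the record type matter for this check.
type record struct {
	Name string // fully qualified owner name, e.g. "example.org."
	Type string // "SOA", "A", "TXT", ...
}

// assertSingleApexSOA refuses any record set that does not contain
// exactly one SOA owned by the zone origin.
func assertSingleApexSOA(origin string, rrs []record) error {
	var soas []record
	for _, rr := range rrs {
		if rr.Type == "SOA" {
			soas = append(soas, rr)
		}
	}
	switch {
	case len(soas) == 0:
		return fmt.Errorf("zone %s: no SOA record found (partial parse?)", origin)
	case len(soas) > 1:
		return fmt.Errorf("zone %s: %d SOA records found, want exactly 1", origin, len(soas))
	case !strings.EqualFold(soas[0].Name, origin):
		// DNS names compare case-insensitively.
		return fmt.Errorf("zone %s: SOA owned by %s, not the apex", origin, soas[0].Name)
	}
	return nil
}

func main() {
	zone := []record{
		{Name: "example.org.", Type: "SOA"},
		{Name: "www.example.org.", Type: "A"},
	}
	fmt.Println(assertSingleApexSOA("example.org.", zone))     // <nil>
	fmt.Println(assertSingleApexSOA("example.org.", zone[1:])) // missing-SOA error
}
```

All three failure modes fall out of one pass over the records, which
is why a single guard is enough.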
H5 — MaxUint32 guard in bumpSerial. The old "+1 defensive advance"
branch would wrap to 0 if soa.Serial == MaxUint32. RFC 1982 §3.2
serial arithmetic reads 0 as the successor of MaxUint32, but a
secondary that compares serials numerically sees 0 as "older",
refuses to AXFR, and the zone goes dark. Now we explicitly check
and refuse with a loud message; the operator must reset the serial
manually. Practical reach is zero for our deployment (10000 bumps/day
× 117 years would still fit uint32) but the defensive ceiling matters
for fuzz, hand-edit, or future code-path errors.
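
The guard itself is small. A sketch, assuming a hypothetical
`advanceSerial` helper and sentinel error (the real check lives
inside bumpSerial):

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errSerialExhausted is an illustrative sentinel, not the plugin's
// actual error value.
var errSerialExhausted = errors.New(
	"SOA serial is at MaxUint32; refusing to wrap, reset the serial manually")

// advanceSerial returns serial+1, or an error when the increment would
// wrap around to 0.
func advanceSerial(serial uint32) (uint32, error) {
	if serial == math.MaxUint32 {
		return 0, errSerialExhausted
	}
	return serial + 1, nil
}

func main() {
	fmt.Println(advanceSerial(2605220001))     // 2605220002 <nil>
	fmt.Println(advanceSerial(math.MaxUint32)) // 0 and the refusal error
}
```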
The full RFC 1982 wraparound-aware comparison was prototyped but
removed: it broke the legacy-format migration case where a tiny
non-CalVer serial (e.g., 12345) is "more than 2^31 distant" from a
new-format serial (~2.6B), which RFC 1982 reads as "going backwards"
and would block migration. Naive `>` is correct in practice; the
MaxUint32 case is the only real failure mode worth guarding.
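
For reference, the migration conflict can be reproduced in a few
lines. This is a sketch of the RFC 1982 §3.2 "less than" relation
written from the RFC text, not the removed prototype; `serialLess`
is a name chosen here:

```go
package main

import "fmt"

// serialLess reports whether s1 < s2 under RFC 1982 serial arithmetic
// for 32-bit serials. Pairs exactly 2^31 apart are undefined in the RFC
// and compare as "not less" in both directions here.
func serialLess(s1, s2 uint32) bool {
	const half = 1 << 31
	return (s1 < s2 && s2-s1 < half) || (s1 > s2 && s1-s2 > half)
}

func main() {
	const legacy, calver uint32 = 12345, 2605220001
	fmt.Println("naive:   ", legacy < calver)            // true
	fmt.Println("RFC 1982:", serialLess(legacy, calver)) // false
}
```

The two serials are more than 2^31 apart, so the wraparound-aware
comparison places the new-format serial behind the legacy one while
the naive comparison places it ahead.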
New tests:
- TestBumpSerial_MaxUint32_RefusesWrap
- TestLoadRRs_NoSOA_Refused
- TestLoadRRs_MultipleSOAs_Refused
- TestLoadRRs_NonApexSOA_Refused
H1 — Concurrent-modification detection. loadRRs now returns a
fileSnapshot capturing (mtime, size) at read time. handleUpdate calls
zf.checkUnchanged(snap) immediately before writeAtomic. If anything
modified the file between load and write — rsync push, manual edit,
`git checkout` — the UPDATE is refused with SERVFAIL. Caddy retries
with a fresh load. Protects against the CLAUDE.md-documented rsync
workflow racing the plugin.
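
A sketch of the snapshot check, assuming free functions rather than
the zoneFile methods (field and function names other than
fileSnapshot and checkUnchanged are made up for illustration):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fileSnapshot records what the file looked like when it was read.
type fileSnapshot struct {
	modTime time.Time
	size    int64
}

// takeSnapshot stats path and captures (mtime, size).
func takeSnapshot(path string) (fileSnapshot, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return fileSnapshot{}, err
	}
	return fileSnapshot{modTime: fi.ModTime(), size: fi.Size()}, nil
}

// matches reports whether two snapshots describe the same file state.
func (s fileSnapshot) matches(other fileSnapshot) bool {
	return s.size == other.size && s.modTime.Equal(other.modTime)
}

// checkUnchanged re-stats path and fails if it no longer matches snap.
func checkUnchanged(path string, snap fileSnapshot) error {
	now, err := takeSnapshot(path)
	if err != nil {
		return err
	}
	if !snap.matches(now) {
		return fmt.Errorf("%s changed since load (size %d -> %d)", path, snap.size, now.size)
	}
	return nil
}

func main() {
	f, err := os.CreateTemp("", "example-*.zone")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.WriteString("; zone data\n")
	f.Close()

	snap, _ := takeSnapshot(f.Name())
	fmt.Println(checkUnchanged(f.Name(), snap)) // <nil>

	// Simulate an rsync push or manual edit between load and write.
	os.WriteFile(f.Name(), []byte("; edited externally\n"), 0o644)
	fmt.Println(checkUnchanged(f.Name(), snap)) // change detected
}
```

An (mtime, size) pair is a heuristic: a same-size edit that lands
within the filesystem's timestamp granularity would go unnoticed.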
H2 — Git commit-failure policy. The previous code logged at WARN and
continued, breaking the documented "file + git both updated" contract.
Now logs at ERROR with structured fields (zone, path, error, recovery
command) so operators discover the divergence. We do NOT roll back the
file write: by the time the commit fails, the auto plugin may have
already noticed the new mtime and reloaded; rolling back creates more
races than it solves. Recovery is `git -C <dir> status` + manual
commit.
M1 — exec.CommandContext with 10s timeout on git invocations. If git
hangs (NFS stall, gpg-sign prompt, broken pre-commit hook waiting on
stdin), the per-zone mutex would otherwise be held forever and queue
all subsequent UPDATEs. gitCommandTimeout caps the hang.
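
The timeout pattern, sketched with a hypothetical `runWithTimeout`
wrapper (the plugin's own function names and git arguments may
differ; only gitCommandTimeout comes from the change above):

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// gitCommandTimeout mirrors the 10s cap described above.
const gitCommandTimeout = 10 * time.Second

// runWithTimeout runs a command and kills it if it outlives timeout, so
// a hung child cannot hold the caller (and its mutex) forever.
func runWithTimeout(timeout time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	out, err := exec.CommandContext(ctx, name, args...).CombinedOutput()
	if err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			return fmt.Errorf("%s timed out after %s: %w", name, timeout, ctx.Err())
		}
		return fmt.Errorf("%s failed: %w (output: %q)", name, err, out)
	}
	return nil
}

func main() {
	// "git --version" stands in for the real add/commit invocations.
	if err := runWithTimeout(gitCommandTimeout, "git", "--version"); err != nil {
		fmt.Println("error:", err)
	}
}
```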
M2 deferred. Dropping the separate `git add` cleanly requires either
`-a` (wrong scope: auto-stages all tracked modifications) or `--include`
(still needs prior staging). The race window between add and commit is
theoretical for our setup (single-writer plugin + occasional `git
status`). M1's timeout already mitigates the worst hang case.
New tests:
- TestZoneFile_CheckUnchanged_DetectsExternalModification (H1)
The previous YYYYMMDDNN encoding capped at NN=99 (100 bumps/day) and
hard-failed UPDATEs once the day's counter was exhausted — confirmed
in production on 2026-05-22 when ACME activity across the
supported.systems zone hit the cap and SERVFAILed every subsequent UPDATE.
New format: YYMMDD*10000+NNNN. With 4-digit NNNN we get 10000/day, and
dropping the century keeps a serial dated 2026-05-22 (2,605,229,999
max) under uint32's 4,294,967,295 ceiling. A 4-digit year (e.g.,
20260522*10000)
would overflow uint32 — RFC 1035's SOA serial type bounds this.
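
The arithmetic can be checked directly. A sketch, assuming a
hypothetical `encodeSerial` helper that does the multiplication in
64 bits so the overflow is visible rather than silently truncated:

```go
package main

import (
	"fmt"
	"math"
)

// encodeSerial computes date*10000+counter in 64 bits and rejects any
// result that does not fit the 32-bit SOA serial field. date is the
// YYMMDD (or, for comparison, YYYYMMDD) value as a plain integer.
func encodeSerial(date, counter uint64) (uint32, error) {
	if counter > 9999 {
		return 0, fmt.Errorf("counter %d exceeds 4 digits", counter)
	}
	v := date*10000 + counter
	if v > math.MaxUint32 {
		return 0, fmt.Errorf("serial %d overflows uint32 (max %d)", v, uint32(math.MaxUint32))
	}
	return uint32(v), nil
}

func main() {
	fmt.Println(encodeSerial(260522, 9999)) // 2605229999 <nil>
	fmt.Println(encodeSerial(20260522, 0))  // 0 and an overflow error
}
```

By the same arithmetic, 421231*10000+9999 still fits but
430101*10000 does not, so the two-digit-year encoding stops fitting
for dates from 2043 onward. That ceiling is worth checking against
the real implementation.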
Three behavior changes:
1. On NNNN=9999, roll forward to the next encoded day with NNNN=0001
rather than erroring. The encoded date drifts ahead of wall time on
heavy churn days and catches up on quiet days; monotonic ordering
(the only DNS requirement) holds.
2. Future-encoded serials (from a prior rollover) are honoured — the
previous "older date" branch downgraded them back to today*100+1,
producing a backwards serial. This bug also tripped a manual
workaround on the same day. Now: future encoded dates bump their
own NNNN.
3. Legacy YYYYMMDDNN serials migrate automatically on first bump. A
value like 2026052299 (~2.026B) is numerically smaller than today's
new-format minimum 2605220001 (~2.605B), so the older-or-unparseable
branch fires and rewrites in place. New > old, so AXFR receivers
treat it as a clean forward bump.
Tests cover same-day, rollover, future-encoded no-regress, legacy
migration, non-CalVer reset, and no-SOA error.
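
The three behaviors above, condensed into a sketch. This is an
illustrative reimplementation, not the plugin's bumpSerial: it takes
today as a YYMMDD integer, and it leaves out date validation and the
uint32 ceiling:

```go
package main

import (
	"fmt"
	"time"
)

// nextDay returns the YYMMDD integer for the day after date.
// time.Date normalises day overflow (e.g. May 32 becomes June 1).
func nextDay(date uint32) uint32 {
	yy, mm, dd := int(date/10000), int(date/100%100), int(date%100)
	t := time.Date(2000+yy, time.Month(mm), dd+1, 0, 0, 0, 0, time.UTC)
	return uint32((t.Year()-2000)*10000 + int(t.Month())*100 + t.Day())
}

// bumpSerial returns the successor of cur in YYMMDD*10000+NNNN form.
func bumpSerial(cur, today uint32) uint32 {
	date, n := cur/10000, cur%10000
	switch {
	case date < today:
		// Older date, legacy YYYYMMDDNN, or non-CalVer: restart at today's 0001.
		return today*10000 + 1
	case n >= 9999:
		// Counter exhausted: roll forward to the next encoded day.
		return nextDay(date)*10000 + 1
	default:
		// Today or a future-encoded date: bump its own counter.
		return cur + 1
	}
}

func main() {
	const today = 260522
	fmt.Println(bumpSerial(2605220001, today)) // 2605220002 (same day)
	fmt.Println(bumpSerial(2605229999, today)) // 2605230001 (rollover)
	fmt.Println(bumpSerial(2605230005, today)) // 2605230006 (future-encoded)
	fmt.Println(bumpSerial(2026052299, today)) // 2605220001 (legacy migration)
}
```

A legacy serial such as 2026052299 has a "date part" of 202605 when
divided by 10000, which is below any 26xxxx value, so it takes the
same branch as a genuinely older serial.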
Major architectural pivot per the user's "RFC 2136 mechanism for the
existing zonefiles, not a new in-memory thing" framing. The plugin no
longer maintains its own in-memory state OR serves any queries -- both
of those are now the auto plugin's job, reading the same zone files.
The plugin's sole responsibility is now: receive TSIG-authed UPDATE
messages, edit the matching zones/<zone>.zone file, bump the SOA
serial in CalVer (YYYYMMDDNN) form, and optionally auto-commit to git.
What changed:
- DELETED: store.go (in-memory recordStore), store_test.go (12 tests),
plugin_test.go (10 ServeDNS query tests), old update_test.go.
- NEW: zonefile.go -- file-backed authority for one zone. loadRRs via
miekg/dns zone parser; mutation helpers (lookupIn/nameExistsIn/
removeRRsetFrom/removeRRFrom/removeNameFrom/addRRTo) on []dns.RR
slices; bumpSerial with CalVer semantics + NN exhaustion handling;
writeAtomic via temp-file rename; commit shells to `git add && git
commit` with configurable author.
- NEW: zonefile_test.go -- 17 tests covering load/lookup/mutate/bump/
write paths.
- REWRITTEN: plugin.go -- ServeDNS is now thin: UPDATE → TSIG → handler;
everything else → Next. No synthetic SOA/NS, no query serving.
- REWRITTEN: update.go -- handleUpdate now opens the zoneFile, loads,
applies (with prereq checks against the loaded RRs), bumps serial,
writes, commits. Detects no-op updates to avoid spurious file writes.
- REWRITTEN: setup.go -- new directives: `zones-dir` (required),
`auto-commit` (default true), `git-author <name> <email>`. Dropped
`nameserver` and `persist`. Validates each declared zone has a file
on disk via os.Stat before CoreDNS finishes starting.
- REWRITTEN: setup_test.go -- 17 cases for the new grammar.
- REWRITTEN: update_test.go -- 11 cases using real temp zone files
via t.TempDir().
Total: 30 tests passing, 0 failures.
Next: Phase 2c (custom CoreDNS image, deploy, smoke test with nsupdate).
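
For reference, the temp-file-rename technique behind writeAtomic
looks roughly like this. A sketch under the assumption of a free
function taking a path and bytes; the real method signature and
error handling may differ:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic replaces path with data so that readers see either the old
// contents or the new contents, never a partial write.
func writeAtomic(path string, data []byte) (err error) {
	// The temp file lives in the target directory: rename is only atomic
	// within a single filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil { // clean up the temp file on any failure
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil { // flush to disk before the rename
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	// Note: os.CreateTemp creates the file with mode 0600; a real
	// implementation may want to copy the original file's mode first.
	return os.Rename(tmp.Name(), path)
}

func main() {
	path := filepath.Join(os.TempDir(), "example.org.zone")
	defer os.Remove(path)
	if err := writeAtomic(path, []byte("; zone data\n")); err != nil {
		fmt.Println("error:", err)
		return
	}
	got, _ := os.ReadFile(path)
	fmt.Printf("%q\n", got) // "; zone data\n"
}
```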