10 Commits

Author SHA1 Message Date
7367401734 Send DNS NOTIFY to secondaries after every UPDATE
Per RFC 1996, a master that mutates a zone SHOULD notify its
secondaries so they can immediately AXFR rather than wait for their
next SOA-refresh poll. Without this, propagation lag from UPDATE to
public DNS is bounded by the secondary's refresh interval (300s for
us) — which is borderline for ACME validation timing.

New Corefile directive:
    notify <host[:port]> [<host[:port]>...]

Targets accept bare hostnames (port 53 default), host:port, or
[ipv6]:port. The same list applies to every zone in the rfc2136
block.

Implementation: fire-and-forget UDP per target, each in its own
goroutine, capped by a 2s timeout. The UPDATE response to the client
is never held pending NOTIFY acks (RFC 1996 §4 explicitly decouples
them). Failures log at DEBUG only — a briefly-unreachable secondary
is normal and would otherwise spam logs.

Retires the external scripts/notify-secondaries.py workflow for any
deployment that wires the directive: secondaries now hear about
changes within seconds of the UPDATE landing, no cron or manual
invocation needed.

New tests:
- TestSendNotify_DeliversToTarget — packet arrives, opcode + zone correct
- TestSendNotify_NoTargets_NoCrash — empty list short-circuits
- TestSendNotify_BadTarget_LogsButDoesNotBlock — fire-and-forget timing
- TestNotifyOne_AppendsDefaultPort — host vs host:port normalization
2026-05-23 00:54:45 -06:00
8d1477350a M8: per-key UPDATE rate limiting (token bucket)
Hamilton M8: a compromised TSIG key — or a misconfigured client
retrying forever — must not be able to drive unbounded UPDATE traffic.
Each UPDATE costs disk IOPS, a git commit, and a slot in the SOA
serial counter (now 9999/day per zone). Without a cap, a few hours of
runaway traffic could exhaust the SOA serial counter and brick the
zone for the day.

Implementation: per-key token bucket in ratelimit.go. Default 100
tokens / 60 seconds. New keys start full so legitimate clients see no
delay at boot. Refill is continuous, capped at the burst value.

Configurable in Corefile:
  rate-limit off                    # disable entirely
  rate-limit <burst> <period-secs>  # e.g., rate-limit 200 60

Enforcement runs in ServeDNS after TSIG verification — a request that
fails auth doesn't consume a token (and a forged TSIG can't be used to
deny service to a real key holder, since we never reached the rate
check).

100/min is well above ACME's needs: a worst-case full-renewal storm
across our ~84 zones emits maybe 200 UPDATEs total over several
minutes. Anything beyond is suspicious by definition.

New tests covering: first-call allowed, burst exhaustion, refill
behavior, per-key isolation, refill-cap (no idle-accumulation
overflow).
2026-05-22 21:31:17 -06:00
6ab2b6af6d H6/H7/M3/M4/M7: hardening + behavior documentation
H6 — TSIG replay-window test. New TestCheckTSIG_BadStatus_Refused
verifies that when miekg/dns reports a TSIG verification failure via
ResponseWriter.TsigStatus (the channel for fudge-window violations,
bad MACs, expired timestamps), our plugin refuses. The fudge tolerance
itself is miekg/dns's default (300s); documented in tsig.go so
operators know the dependency.

H7 — No-op UPDATE policy: documented explicitly in update.go. We do
NOT bump the SOA on a no-op (deduped) UPDATE — forcing downstream
secondaries to AXFR identical content wastes bandwidth and contradicts
RFC 2136's intent. Callers wanting to force a serial bump can send a
throwaway add+delete pair (touch-UPDATE pattern).

M3 — Delete-by-exact-match ignores TTL and class per RFC 2136 §2.5.4.
The previous rr.String() comparison included TTL, so an UPDATE with
CLASS=NONE TTL=0 (the protocol-required encoding for a delete) failed
to match stored RRs at CLASS=IN with non-zero TTL. Now we normalize
both sides (TTL=0, class=IN) before invoking dns.IsDuplicate.

M4 — validateZoneFiles now actually parses each zone at startup
(loadRRs invocation). Previously it only stat()'d the file; corrupt
zone content sailed through startup and produced SERVFAIL on the first
UPDATE with no startup-time signal. Combined with H3+H4's invariant
checks, this turns silent zone corruption into immediate startup
failure.

M7 — Commit-message sanitization. RR names are attacker-controlled
(TSIG only authenticates the sender; the payload is hostile by
default). Control characters in commit messages could inject newlines
into git log or ANSI sequences into downstream log renderers. New
sanitizeForCommitMessage escapes \n, \r, \t, and other C0 controls.

New tests:
- TestCheckTSIG_BadStatus_Refused (H6)
- TestUpdate_DeleteRR_IgnoresTTL (M3)
- TestSanitizeForCommitMessage (M7)
2026-05-22 21:29:13 -06:00
8e421f925e C1/C2/M9: tighten security boundary at handleUpdate
C1 — Document the process-global MsgAcceptFunc mutation:
CoreDNS 1.14.3 doesn't expose per-Config MsgAcceptFunc (server.go:159
hardcodes the dns.Server struct), so the override has to be global. The
init()-level comment now explains the operational consequences in
detail, and setup() emits a loud INFO log calling out the global scope
for operator audit. Upstream support for per-Config MsgAcceptFunc would
let us delete the whole stanza.

C2 — handleUpdate now requires the caller to assert TSIG verification
via an explicit `verified bool` parameter. The security contract is
encoded in the function signature, not in convention. ServeDNS passes
verified=true after checkTSIG succeeds; verified=false produces an
immediate Refused with no state mutation. Future internal callers
(NOTIFY relay, admin RPC, refactor) physically cannot reach the
mutation code without proving the request was authenticated.

M9 — Don't sign TSIG-failure rejection responses. Per Hamilton's
finding, signing a rejection with the named key attests "yes, this
server holds that key" — useful intel for an attacker probing key
existence. Unsigned Refused is the right shape: nsupdate sees "no TSIG
on reply" and treats as auth failure, which is what actually happened.

New test TestUpdate_UnverifiedCaller_Refused proves the C2 contract:
handleUpdate(w, msg, false) refuses, zone file unchanged.
2026-05-22 21:18:47 -06:00
1fe95e3f6c Revert diagnostic ServeDNS log line 2026-05-21 11:46:49 -06:00
0f28127284 Phase 2b: refactor to file-backed storage; UPDATE writes zones/*.zone
Major architectural pivot per the user's "RFC 2136 mechanism for the
existing zonefiles, not a new in-memory thing" framing. The plugin no
longer maintains its own in-memory state OR serves any queries -- both
of those are now the auto plugin's job, reading the same zone files.

The plugin's sole responsibility is now: receive TSIG-authed UPDATE
messages, edit the matching zones/<zone>.zone file, bump the SOA
serial in CalVer (YYYYMMDDNN) form, and optionally auto-commit to git.

What changed:
- DELETED: store.go (in-memory recordStore), store_test.go (12 tests),
  plugin_test.go (10 ServeDNS query tests), old update_test.go.
- NEW: zonefile.go -- file-backed authority for one zone. loadRRs via
  miekg/dns zone parser; mutation helpers (lookupIn/nameExistsIn/
  removeRRsetFrom/removeRRFrom/removeNameFrom/addRRTo) on []dns.RR
  slices; bumpSerial with CalVer semantics + NN exhaustion handling;
  writeAtomic via temp-file rename; commit shells to `git add && git
  commit` with configurable author.
- NEW: zonefile_test.go -- 17 tests covering load/lookup/mutate/bump/
  write paths.
- REWRITTEN: plugin.go -- ServeDNS is now thin: UPDATE → TSIG → handler;
  everything else → Next. No synthetic SOA/NS, no query serving.
- REWRITTEN: update.go -- handleUpdate now opens the zoneFile, loads,
  applies (with prereq checks against the loaded RRs), bumps serial,
  writes, commits. Detects no-op updates to avoid spurious file writes.
- REWRITTEN: setup.go -- new directives: `zones-dir` (required),
  `auto-commit` (default true), `git-author <name> <email>`. Dropped
  `nameserver` and `persist`. Validates each declared zone has a file
  on disk via os.Stat before CoreDNS finishes starting.
- REWRITTEN: setup_test.go -- 17 cases for the new grammar.
- REWRITTEN: update_test.go -- 11 cases using real temp zone files
  via t.TempDir().

Total: 30 tests passing, 0 failures.

Next: Phase 2c (custom CoreDNS image, deploy, smoke test with nsupdate).
2026-05-21 11:26:50 -06:00
1d2d919728 Phase 1.4: UPDATE opcode handler + TSIG verification
Replaces the Phase-1.3 refuseUpdate() stub with a real RFC 2136 handler.
Caddy via caddy-dns/rfc2136 can now inject and remove records.

UPDATE message handling (update.go):
- Zone section validation: must be exactly one SOA-typed record naming
  a zone we're authoritative for. Returns FORMERR/NOTAUTH otherwise.
- Prerequisites (§3.2): name-exists, RRset-exists, name-NOT-exists,
  RRset-NOT-exists semantics implemented. First failure short-circuits
  with the spec's rcode (NXDOMAIN/NXRRSET/YXDOMAIN/YXRRSET).
- Updates (§3.4.2): add RR, delete RRset (CLASS=ANY+RDLEN=0), delete
  all RRsets at name (CLASS=ANY+TYPE=ANY), delete specific RR (CLASS=
  NONE).
- Apex SOA/NS protected: synthetic and cannot be added or removed via
  UPDATE. Apex wipe (TYPE=ANY at apex) also refused.
- Default TTL applied to incoming records with TTL=0.

TSIG (tsig.go + setup.go):
- setup() now populates dnsserver.Config.TsigSecret so the underlying
  dns.Server auto-verifies signatures via miekg/dns.
- checkTSIG() in ServeDNS gates UPDATEs: rejects if no TSIG, unknown
  key name, algorithm-downgrade attempt, or w.TsigStatus() != nil.
- No TSIG keys configured → all UPDATEs refused (safety default).
- Algorithm pinning prevents downgrade attacks (e.g. forced HMAC-MD5).

Tests (update_test.go): 11 new cases covering happy paths and every
error rcode. Total: 35 top-level test passes, 0 failures.

ServeDNS dispatch now calls handleUpdate after auth gate. The
refuseUpdate() stub is gone. UPDATE end-to-end via nsupdate requires
the custom CoreDNS image (Phase 2) to verify TSIG plumbing on the
dns.Server side.
2026-05-21 10:51:18 -06:00
1cca9a5aa7 Phase 1.3: in-memory store + ServeDNS query dispatch
ServeDNS now answers authoritatively for the configured zone(s):
- Apex SOA → synthetic SOA (serial = store generation counter)
- Apex NS  → synthetic NS pointing at p.Nameserver
- In-store lookups for any qtype
- NODATA vs NXDOMAIN correctly distinguished (SOA in authority section)
- UPDATE opcode → REFUSED (Phase 1.4 implements properly)
- Queries outside our zones pass through to Next

Added:
- store.go: recordStore with sync.RWMutex + atomic generation counter.
  Operations: Add (de-dupes), RemoveRRset, RemoveRR, RemoveName, Lookup
  (returns a copy so callers can't corrupt internal state), NameExists.
  All keyed on canonical lowercase + trailing-dot names.
- plugin.go: ServeDNS dispatch, findZone (longest-suffix match),
  syntheticSOA, syntheticNS. New Nameserver field.
- setup.go: nameserver directive. Default Nameserver = first zone apex.
  Store initialised at parse time.
- store_test.go: 12 unit tests covering add/dedupe/remove/lookup/
  generation/case-insensitivity/copy-safety.
- plugin_test.go: 10 dispatch tests covering pass-through, apex
  synthetics, in-store lookups, NXDOMAIN/NODATA semantics, UPDATE
  refusal, findZone longest-suffix-wins and case behavior.
- setup_test.go: 3 new cases for the nameserver directive + store init.

Total: 38 tests passing.

Module: git.supported.systems/rsp2k/coredns-rfc2136
2026-05-21 10:37:48 -06:00
eba6313ec0 Phase 1.2: wire parser → typed config + 13 unit tests
The Corefile parser now fully populates typed fields on RFC2136 instead
of just recognising directives. Validation happens at parse-time so
configuration errors fail loud at CoreDNS startup rather than silent at
request time.

Added:
- config.go: tsigKey type, TSIG algorithm allowlist (rejects HMAC-MD5
  deliberately), base64 secret decoder with 8-byte minimum length check,
  canonical-key-name normalisation (lowercase + trailing dot).
- plugin.go: RFC2136 struct now carries TSIGKeys map, TTL uint32,
  PersistPath string. DefaultTTL=60.
- setup.go: parse() validates and stores tsig-key/ttl/persist directives.
  Duplicate key names rejected. Multiple TSIG keys allowed (for rotation).
  At-least-one-zone is enforced.
- setup_test.go: 13 table-driven cases (5 happy + 8 error paths) using
  caddy.NewTestController. All pass.

ServeDNS still passes through — UPDATE handling lands in Phase 1.4.

Module path: git.supported.systems/rsp2k/coredns-rfc2136
2026-05-21 10:31:22 -06:00
e9d37f483c Initial commit: plugin skeleton, compiles against CoreDNS 1.14.3
Sets up the package layout for a CoreDNS plugin that will accept RFC 2136
dynamic updates with TSIG authentication, primarily targeting self-hosted
ACME DNS-01 cert automation.

What this commit gives us:
- go.mod against coredns/caddy v1.1.4, coredns/coredns v1.14.3, miekg/dns v1.1.72
- plugin.go: RFC2136 struct + Handler interface (ServeDNS is pass-through)
- setup.go: init() registration + Corefile parser (skeleton — recognizes
  tsig-key, ttl, persist directives but doesn't yet wire them)
- README.md, .gitignore

go build ./... clean. No tests yet — those come with Phase 1.2 alongside
the actual UPDATE handler and in-memory store.

Plan: ~/.claude/plans/dood-does-coredns-offer-enumerated-piglet.md
2026-05-20 18:25:36 -06:00