Compare commits

..

No commits in common. "main" and "fix/wider-serial-counter" have entirely different histories.

12 changed files with 87 additions and 1140 deletions

130
README.md
View File

@ -33,25 +33,17 @@ messages — zero static zone-file churn.
## Status
**v2026.05.22.2**: production-ready. Handles UPDATE messages against
file-backed zones, TSIG-authenticates, bumps SOA serial in CalVer
YYMMDD*10000+NNNN form, atomically writes the zone file, optionally
git-commits each change for an audit trail. Designed to coexist with
CoreDNS's `auto` plugin (which serves queries from the same zone files
on its reload cycle).
**Phase 1 (skeleton)**: compiles, registers with CoreDNS, parses the
Corefile directive. Does not yet handle UPDATE messages or serve any
records. ServeDNS is a pass-through. See `phases.md` for the roadmap.
## Configuration
```
rfc2136 <zone> [<zone>...] {
zones-dir <path> # required
tsig-key <name> <algorithm> <base64-secret> # may repeat
ttl <seconds> # default 60
auto-commit <true|false> # default true
git-author <name> <email> # optional
rate-limit <burst> <period-seconds> # default 100 / 60s
rate-limit off # disable rate-limit
notify <host[:port]> [<host[:port]>...] # NOTIFY secondaries on every UPDATE
tsig-key <key-name> <algorithm> <base64-secret>
ttl <seconds>
persist <path>
}
```
@ -60,124 +52,14 @@ Example:
```
.:53 auth.example.com {
rfc2136 auth.example.com {
zones-dir /var/lib/coredns/zones
tsig-key acme-key. hmac-sha256 BASE64SECRET==
ttl 60
auto-commit true
git-author "coredns-rfc2136" "rfc2136@coredns.example.com"
}
auto {
directory /var/lib/coredns/zones (.*)\.zone {1}
reload 30s
}
errors
log
}
```
## Operational constraints
A few behaviors operators should know before relying on this plugin:
### Single-process atomicity only
The per-zone mutex serializes UPDATEs *within one CoreDNS process*. It
does NOT coordinate with external file edits. If you `rsync` a zone
file from a workstation while the plugin is mid-UPDATE, you get a
race. The plugin defends against this with a snapshot-and-recheck:
loadRRs captures (mtime, size), and immediately before writing back,
we re-stat; if the file changed, the UPDATE is refused with SERVFAIL
and the client (Caddy etc.) retries on a fresh load. The window is
narrow but non-zero.
**Recommendation**: don't rsync zone files into a directory the plugin
is actively writing to. If you must, expect occasional SERVFAILs that
resolve on retry.
### MsgAcceptFunc is process-global
CoreDNS 1.14.3 doesn't expose a per-Config `MsgAcceptFunc`, so this
plugin overrides the miekg/dns package-level default at `init()` time.
**Every server block** in the process will accept the UPDATE opcode
at the wire layer — but only blocks with `rfc2136` in their plugin
chain do anything useful with it (others pass through and return
FormatError). The actual security boundary is TSIG, enforced both in
`ServeDNS` and as a defense-in-depth check inside `handleUpdate`.
### No-op UPDATEs do not bump the SOA serial
If an UPDATE adds an RR that's already present (deduped per RFC 2136
§3.4.2.2) or deletes one that doesn't exist, the file is not rewritten
and the SOA serial is not advanced. We return NOERROR. Downstream
secondaries are not asked to AXFR for a no-change.
If you need to force a serial bump (rare), send a touch-UPDATE: add a
throwaway RR then delete it.
### SOA invariants are enforced strictly
`loadRRs` refuses zone files with: zero SOAs, multiple SOAs, or an
SOA whose owner doesn't match the zone apex. Both at startup (via
`validateZoneFiles`) and on every UPDATE. Zone-file corruption fails
loud at boot rather than mysteriously on first ACME activity.
### Serial counter rolls over at NNNN=9999
Format is `YYMMDD*10000 + NNNN`. At NNNN=9999, the next bump rolls to
the next encoded day with NNNN=0001. On heavy days the encoded date
drifts ahead of wall time; on quiet days it catches back up. Monotonic
ordering (the only DNS requirement) holds. uint32 won't wrap for ~117
years at full 10000/day burn.
### TSIG replay window is miekg/dns's default (currently 300s)
The fudge window enforced by miekg/dns's TsigVerify is what gates
replay. If miekg/dns ever changes its default, this plugin's behavior
changes with it. A future enhancement is `tsig-fudge` as a Corefile
directive.
### Git commit failure is logged at ERROR, not rolled back
If `git commit` fails after a successful `writeAtomic`, the zone file
is correct but the audit trail diverges. We log at ERROR with a
recovery hint (`git -C <dir> status` + manual commit). We do NOT roll
back the file write — the auto plugin may have already noticed the
new mtime, and rolling back creates more races than it solves.
### Per-key rate limit
UPDATE traffic is token-bucket capped per TSIG key. Default 100
UPDATEs per 60 seconds. ACME storms are well within this; anything
beyond is suspicious. Tune via `rate-limit <burst> <period>`.
### NOTIFY to secondaries (optional)
After every successful UPDATE, the plugin can fire DNS NOTIFY (RFC
1996) to a list of secondary servers. This collapses propagation lag
from "up to the secondary's SOA refresh interval" (often 300s) to
"a few seconds" — secondaries that receive NOTIFY do an immediate SOA
poll and AXFR if changed.
Configure with the `notify` directive:
```
rfc2136 example.com {
zones-dir /zones
tsig-key ...
notify ns2.example.com ns3.example.com 216.218.130.2
}
```
Semantics:
- Targets may be `host`, `host:port`, or `[ipv6]:port`. Default port is 53.
- Fire-and-forget: each target gets its own goroutine with a 2s timeout.
The UPDATE response to the client is **not** held pending NOTIFY acks
(RFC 1996 §4 decouples them).
- Failures log at DEBUG only — a briefly-unreachable secondary is
normal and would otherwise spam.
- Missed NOTIFY = no harm; secondary catches up on its own refresh.
- The same target list applies to every zone in the block.
## Building
This plugin is consumed by a custom CoreDNS build via `plugin.cfg`:

View File

@ -1,62 +0,0 @@
package rfc2136
import (
"net"
"time"
"github.com/miekg/dns"
)
// notifyTimeout caps how long any single NOTIFY send can block before
// we give up. RFC 1996 §4 says the master MUST NOT block UPDATE
// acknowledgement on NOTIFY delivery — the secondaries will fall back
// to their own SOA refresh polling if NOTIFY is missed. 2s is plenty
// for a healthy secondary to ack via UDP; a slow/blackholed target
// just times out.
const notifyTimeout = 2 * time.Second
// defaultNotifyPort is appended to any target that doesn't already
// specify host:port. NOTIFY is always-over-port-53 in practice.
const defaultNotifyPort = "53"
// sendNotify dispatches fire-and-forget DNS NOTIFY messages (RFC 1996)
// to every configured secondary for the given zone. Each target gets
// its own goroutine so a slow/blackholed secondary can't slow
// propagation to its siblings.
//
// We do NOT wait for goroutines to finish — the UPDATE response goes
// back to the client immediately. Whether secondaries ack or not, the
// master's job is done; secondaries that miss the NOTIFY pick up the
// new serial on their next refresh poll.
//
// Failures are logged at Debug level. NOTIFY is best-effort; logging
// at Warning would flood the operator on every transient packet drop
// for secondaries that are intermittently reachable.
func sendNotify(zone string, targets []string) {
if len(targets) == 0 {
return
}
for _, t := range targets {
go notifyOne(zone, t)
}
}
// notifyOne sends one NOTIFY packet to `target` for `zone`. Target
// can be "host" (default port 53), "host:port", or "[ipv6]:port".
func notifyOne(zone, target string) {
addr := target
if _, _, err := net.SplitHostPort(addr); err != nil {
addr = net.JoinHostPort(addr, defaultNotifyPort)
}
msg := new(dns.Msg)
msg.SetNotify(dns.Fqdn(zone))
c := &dns.Client{Net: "udp", Timeout: notifyTimeout}
_, _, err := c.Exchange(msg, addr)
if err != nil {
log.Debugf("NOTIFY %s → %s failed: %v", zone, addr, err)
return
}
log.Debugf("NOTIFY %s → %s ok", zone, addr)
}

View File

@ -1,129 +0,0 @@
package rfc2136
import (
"net"
"sync"
"testing"
"time"
"github.com/miekg/dns"
)
// testNotifyListener spins up a UDP DNS-protocol listener on an
// ephemeral port that captures any messages it receives. Returns the
// host:port string for use as a NOTIFY target, plus a getter for the
// last-captured message.
func testNotifyListener(t *testing.T) (addr string, getLast func() *dns.Msg) {
t.Helper()
conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 0})
if err != nil {
t.Fatalf("ListenUDP: %v", err)
}
var mu sync.Mutex
var last *dns.Msg
done := make(chan struct{})
go func() {
buf := make([]byte, 512)
for {
conn.SetReadDeadline(time.Now().Add(500 * time.Millisecond))
n, _, err := conn.ReadFromUDP(buf)
if err != nil {
select {
case <-done:
return
default:
continue
}
}
msg := new(dns.Msg)
if unpackErr := msg.Unpack(buf[:n]); unpackErr == nil {
mu.Lock()
last = msg
mu.Unlock()
}
}
}()
t.Cleanup(func() {
close(done)
conn.Close()
})
return conn.LocalAddr().String(), func() *dns.Msg {
mu.Lock()
defer mu.Unlock()
return last
}
}
func TestSendNotify_DeliversToTarget(t *testing.T) {
addr, getLast := testNotifyListener(t)
sendNotify("auth.example.com", []string{addr})
// Wait up to 1s for the packet to arrive (test listener polls on
// 500ms deadline). The send goroutine writes immediately; the
// listener loop just needs one read cycle to pick it up.
deadline := time.Now().Add(1 * time.Second)
for time.Now().Before(deadline) {
if msg := getLast(); msg != nil {
if msg.Opcode != dns.OpcodeNotify {
t.Errorf("Opcode = %d, want OpcodeNotify (%d)", msg.Opcode, dns.OpcodeNotify)
}
if len(msg.Question) != 1 || msg.Question[0].Name != "auth.example.com." {
t.Errorf("Question = %+v, want one entry with name auth.example.com.", msg.Question)
}
if !msg.Authoritative {
t.Errorf("AA flag not set on NOTIFY")
}
return
}
time.Sleep(20 * time.Millisecond)
}
t.Fatal("NOTIFY never arrived at target within 1s")
}
func TestSendNotify_NoTargets_NoCrash(t *testing.T) {
// Empty target list must short-circuit without launching goroutines
// or panicking.
sendNotify("auth.example.com", nil)
sendNotify("auth.example.com", []string{})
// No assertions — survival is the test.
}
func TestSendNotify_BadTarget_LogsButDoesNotBlock(t *testing.T) {
// Target a port we know nothing listens on. The fire-and-forget
// send must return immediately; the goroutine eventually times out.
start := time.Now()
sendNotify("auth.example.com", []string{"127.0.0.1:1"})
if elapsed := time.Since(start); elapsed > 100*time.Millisecond {
t.Errorf("sendNotify blocked %v on unreachable target; expected fire-and-forget", elapsed)
}
}
func TestNotifyOne_AppendsDefaultPort(t *testing.T) {
// Spin up a listener on 127.0.0.1:<random>, then call notifyOne
// with both forms (bare host + host:port) and verify both deliver.
addr, getLast := testNotifyListener(t)
host, port, err := net.SplitHostPort(addr)
if err != nil {
t.Fatalf("split: %v", err)
}
_ = host
// Form 1: host:port (the normal case).
notifyOne("first.example.com", addr)
time.Sleep(100 * time.Millisecond)
if m := getLast(); m == nil || len(m.Question) == 0 || m.Question[0].Name != "first.example.com." {
t.Errorf("host:port form did not deliver: %+v", m)
}
// We can't easily test the bare-host case because port 53 is the
// default and we can't bind there without root. Verifying the
// defaulting branch directly is sufficient.
if port == "" {
t.Fatal("unreachable: SplitHostPort returned empty port")
}
}

View File

@ -19,8 +19,6 @@ package rfc2136
import (
"context"
"strings"
"time"
"github.com/coredns/coredns/plugin"
"github.com/miekg/dns"
@ -64,18 +62,6 @@ type RFC2136 struct {
// zones holds per-zone file handlers, keyed by canonical zone name.
// Populated in setup; mutexes live inside each zoneFile.
zones map[string]*zoneFile
// rateLimit caps UPDATE traffic per TSIG key (Hamilton M8). nil
// disables rate limiting (test mode, or insecure deployments).
// Populated in setup() once TSIG keys are known.
rateLimit *rateLimiter
// NotifyTargets is the list of secondary servers (IP[:port]) to
// send DNS NOTIFY messages to after every successful UPDATE.
// Default port 53. Empty list = no NOTIFY (secondaries rely on
// their own SOA-refresh polling). Configured via the `notify`
// directive in the Corefile.
NotifyTargets []string
}
// Name implements plugin.Handler.
@ -94,28 +80,16 @@ func (p *RFC2136) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg
log.Warningf("UPDATE rejected: %v", err)
resp := new(dns.Msg)
resp.SetRcode(r, dns.RcodeRefused)
// Do NOT sign rejection responses. Signing a refusal with
// the named key would attest the key exists on this server
// (Hamilton M9). Unsigned Refused is the right shape — the
// client sees "no TSIG on reply" and treats that as auth
// failure, which is correct because auth DID fail.
// Best-effort: if the request had TSIG and we recognize
// the key, the framework will sign the rejection so the
// client can authenticate "yes, server rejected this."
// Unknown keys silently skip signing — correct, since we
// can't prove identity to a peer we don't share a key with.
signResponseIfSigned(resp, r)
_ = w.WriteMsg(resp)
return dns.RcodeRefused, nil
}
// Hamilton M8: per-key rate limit. TSIG just authenticates the
// sender — it doesn't prove the sender's behavior is sane. A
// compromised key or a runaway client must not be able to
// exhaust disk/git/serial-counter resources.
if p.rateLimit != nil {
if tsig := r.IsTsig(); tsig != nil && !p.rateLimit.allow(strings.ToLower(tsig.Hdr.Name), time.Now()) {
log.Warningf("UPDATE rate-limited for key %q", tsig.Hdr.Name)
resp := new(dns.Msg)
resp.SetRcode(r, dns.RcodeRefused)
_ = w.WriteMsg(resp)
return dns.RcodeRefused, nil
}
}
return p.handleUpdate(w, r, true)
return p.handleUpdate(w, r)
}
return plugin.NextOrFailure(p.Name(), p.Next, ctx, w, r)
}

View File

@ -1,82 +0,0 @@
package rfc2136
import (
"sync"
"time"
)
// Per-key token bucket. Hamilton M8: a compromised TSIG key — or a
// misconfigured client retrying forever — must not be able to drive
// unbounded UPDATE traffic. Each UPDATE costs disk IOPS, a git commit,
// and a slot in the SOA serial counter (9999/day per zone). 100
// UPDATEs/minute per key is well above any legitimate ACME workflow
// (a full renewal storm across our ~84 zones might emit ~200 UPDATEs
// total over several minutes); anything beyond is suspicious.
const (
defaultRateBurst = 100 // max tokens
defaultRatePeriod = time.Minute // refill window
)
// rateLimiter is a goroutine-safe per-key token bucket. The zero value
// is unusable; construct via newRateLimiter.
type rateLimiter struct {
mu sync.Mutex
buckets map[string]*bucket
burst float64 // max tokens
period time.Duration // time to fully refill
}
type bucket struct {
tokens float64
lastRefill time.Time
}
func newRateLimiter(burst int, period time.Duration) *rateLimiter {
if burst <= 0 {
burst = defaultRateBurst
}
if period <= 0 {
period = defaultRatePeriod
}
return &rateLimiter{
buckets: make(map[string]*bucket),
burst: float64(burst),
period: period,
}
}
// allow attempts to take one token for `key`. Returns true if a token
// was available, false otherwise. New keys start full (burst tokens).
//
// Refill is continuous: tokens accumulate at burst/period per second.
// The bucket caps at burst tokens.
func (rl *rateLimiter) allow(key string, now time.Time) bool {
rl.mu.Lock()
defer rl.mu.Unlock()
b, ok := rl.buckets[key]
if !ok {
// First time we see this key — start the bucket full so
// legitimate clients don't see refill delays at boot.
rl.buckets[key] = &bucket{
tokens: rl.burst - 1,
lastRefill: now,
}
return true
}
// Refill: tokens earned since last access.
elapsed := now.Sub(b.lastRefill).Seconds()
earned := elapsed * (rl.burst / rl.period.Seconds())
b.tokens += earned
if b.tokens > rl.burst {
b.tokens = rl.burst
}
b.lastRefill = now
if b.tokens >= 1.0 {
b.tokens -= 1.0
return true
}
return false
}

View File

@ -1,83 +0,0 @@
package rfc2136
import (
"testing"
"time"
)
func TestRateLimiter_FirstCallAllowed(t *testing.T) {
rl := newRateLimiter(5, time.Minute)
now := time.Now()
if !rl.allow("key-a", now) {
t.Errorf("first call for new key must be allowed")
}
}
func TestRateLimiter_BurstExhausts(t *testing.T) {
rl := newRateLimiter(3, time.Minute)
now := time.Now()
// First 3 calls succeed.
for i := 0; i < 3; i++ {
if !rl.allow("key-a", now) {
t.Fatalf("call %d should be allowed (burst=3)", i+1)
}
}
// 4th immediately after burst should be denied (no time elapsed
// for refill).
if rl.allow("key-a", now) {
t.Errorf("4th call exceeded burst; should be denied")
}
}
func TestRateLimiter_RefillsOverTime(t *testing.T) {
// burst=2, period=1s → refill rate is 2 tokens/sec.
rl := newRateLimiter(2, time.Second)
t0 := time.Now()
if !rl.allow("k", t0) {
t.Fatal("call 1")
}
if !rl.allow("k", t0) {
t.Fatal("call 2")
}
if rl.allow("k", t0) {
t.Fatal("call 3 should be denied; bucket empty")
}
// Advance time by 500ms — should refill ~1 token.
if !rl.allow("k", t0.Add(500*time.Millisecond)) {
t.Errorf("expected refill after 500ms")
}
}
func TestRateLimiter_PerKeyIsolation(t *testing.T) {
rl := newRateLimiter(2, time.Minute)
now := time.Now()
// Exhaust key-a.
rl.allow("key-a", now)
rl.allow("key-a", now)
if rl.allow("key-a", now) {
t.Fatal("key-a still has tokens; setup wrong")
}
// key-b is independent — must still be allowed.
if !rl.allow("key-b", now) {
t.Errorf("key-b was rate-limited despite no prior use")
}
}
// TestRateLimiter_DoesNotOverflow guards against refill math
// accumulating beyond burst (which would let an attacker burst more
// after a long idle period than the configured cap).
func TestRateLimiter_DoesNotOverflow(t *testing.T) {
rl := newRateLimiter(5, time.Second)
t0 := time.Now()
rl.allow("k", t0) // create bucket
// Advance time 1 hour. Refill should cap at burst=5.
tFuture := t0.Add(time.Hour)
for i := 0; i < 5; i++ {
if !rl.allow("k", tFuture) {
t.Fatalf("post-idle call %d should be allowed (cap=5)", i+1)
}
}
if rl.allow("k", tFuture) {
t.Errorf("post-idle call 6 should be denied; cap exceeded")
}
}

108
setup.go
View File

@ -6,7 +6,6 @@ import (
"os"
"path/filepath"
"strconv"
"time"
"github.com/coredns/caddy"
"github.com/coredns/coredns/core/dnsserver"
@ -21,44 +20,17 @@ var log = clog.NewWithPlugin("rfc2136")
func init() {
plugin.Register("rfc2136", setup)
// SECURITY/OPERATIONAL NOTE — process-global mutation:
//
// miekg/dns's default MsgAcceptFunc rejects UPDATE opcode messages
// with NOTIMP at the wire layer, before any plugin sees them
// (acceptfunc.go: "Don't allow dynamic updates, because then the
// sections can contain a whole bunch of RRs"). CoreDNS 1.14.3
// constructs its dns.Server instances without exposing a
// per-server MsgAcceptFunc to plugins (see
// coredns/core/dnsserver/server.go:159 — the dns.Server struct is
// hardcoded), so to accept UPDATE opcodes anywhere we must
// override the package-level default. We do that here.
// with NOTIMP before they ever reach the plugin chain (see the
// comment in miekg/dns/acceptfunc.go: "Don't allow dynamic updates,
// because then the sections can contain a whole bunch of RRs").
//
// Consequences operators should understand:
//
// 1. The override is PROCESS-WIDE. Every CoreDNS server block in
// this binary will accept UPDATE opcodes at the wire layer,
// not just the one(s) where `rfc2136` is configured. Other
// blocks will pass the UPDATE through their plugin chains;
// since no plugin in those chains handles UPDATE, the request
// falls off the end of the chain and CoreDNS returns
// FormatError. No state is mutated — but the wire-layer
// gatekeeping moved into the plugin chain.
//
// 2. The actual security boundary is TSIG verification, which
// happens in this plugin's ServeDNS (checkTSIG) AND inside
// handleUpdate (assertAuthenticated). Defense in depth: any
// state-mutating path re-verifies, so a future refactor that
// adds a new caller cannot accidentally skip auth.
//
// 3. If you remove `rfc2136` from your Corefile and reload via
// SIGUSR1, this global is NOT restored. Restart the process
// to fully revert.
//
// The mitigation matrix is: (a) loud comment here, (b) startup
// INFO log listing the zones this plugin owns, (c) TSIG re-check
// inside handleUpdate. The architecturally clean fix would be
// upstream support in CoreDNS for per-Config MsgAcceptFunc — when
// that lands, delete this whole stanza.
// CoreDNS constructs its dns.Server instances without setting a
// per-server MsgAcceptFunc, so the package-level default is the
// one that runs. We override it here at plugin init() time -- well
// before any dns.Server starts listening -- to permit UPDATE
// through. The plugin itself does proper validation in the
// UPDATE handler, so opening this gate doesn't lower security.
dns.DefaultMsgAcceptFunc = msgAcceptFunc
}
@ -137,10 +109,6 @@ func setup(c *caddy.Controller) error {
log.Infof("ready: zones=%v keys=%d ttl=%d dir=%q auto-commit=%t",
p.Zones, len(p.TSIGKeys), p.TTL, p.ZonesDir, p.AutoCommit)
// Surface the global MsgAcceptFunc override for operator audit —
// if a sibling server block on this process doesn't expect UPDATE
// opcodes to traverse it, the operator should know they will.
log.Infof("dns.DefaultMsgAcceptFunc was overridden process-wide to permit OpcodeUpdate; state mutation requires TSIG verification on this plugin's zones=%v only", p.Zones)
return nil
}
@ -165,13 +133,6 @@ func parse(c *caddy.Controller) (*RFC2136, error) {
// Per-zone git author overrides. Defaults are applied later.
var gitAuthorName, gitAuthorEmail string
// Rate-limit config (Hamilton M8). Defaults are
// defaultRateBurst/defaultRatePeriod from ratelimit.go; an explicit
// `rate-limit <burst> <period-seconds>` directive overrides.
rateBurst := defaultRateBurst
ratePeriod := defaultRatePeriod
rateLimitEnabled := true
for c.Next() {
args := c.RemainingArgs()
if len(args) < 1 {
@ -243,37 +204,6 @@ func parse(c *caddy.Controller) (*RFC2136, error) {
gitAuthorName = gArgs[0]
gitAuthorEmail = gArgs[1]
case "notify":
nArgs := c.RemainingArgs()
if len(nArgs) < 1 {
return nil, c.Errf("notify requires at least one secondary (host or host:port)")
}
p.NotifyTargets = append(p.NotifyTargets, nArgs...)
case "rate-limit":
rArgs := c.RemainingArgs()
switch len(rArgs) {
case 1:
if rArgs[0] == "off" || rArgs[0] == "false" || rArgs[0] == "no" {
rateLimitEnabled = false
break
}
return nil, c.Errf("rate-limit single-arg form must be 'off'; for limits use 'rate-limit <burst> <period-seconds>'")
case 2:
b, err := strconv.ParseUint(rArgs[0], 10, 31)
if err != nil || b < 1 {
return nil, c.Errf("rate-limit burst must be positive integer, got %q", rArgs[0])
}
pSec, err := strconv.ParseUint(rArgs[1], 10, 31)
if err != nil || pSec < 1 {
return nil, c.Errf("rate-limit period must be positive integer seconds, got %q", rArgs[1])
}
rateBurst = int(b)
ratePeriod = time.Duration(pSec) * time.Second
default:
return nil, c.Errf("rate-limit takes 'off' OR '<burst> <period-seconds>', got %d args", len(rArgs))
}
default:
return nil, c.Errf("unknown directive: %s", c.Val())
}
@ -287,11 +217,6 @@ func parse(c *caddy.Controller) (*RFC2136, error) {
return nil, c.Err("zones-dir is required")
}
// Construct rate limiter if enabled.
if rateLimitEnabled {
p.rateLimit = newRateLimiter(rateBurst, ratePeriod)
}
// Build zoneFile handles for each declared zone.
p.zones = make(map[string]*zoneFile, len(p.Zones))
for _, z := range p.Zones {
@ -316,16 +241,8 @@ func parse(c *caddy.Controller) (*RFC2136, error) {
}
// validateZoneFiles ensures every configured zone has an accessible
// AND parseable file on disk at the expected path. Catches both typos
// (file missing) and corrupt zone content at CoreDNS startup rather
// than on the first UPDATE — the operator gets an immediate signal
// instead of discovering the breakage minutes later when ACME fires.
//
// Hamilton M4: the previous version only stat()'d the file. A zone
// with a syntax error sailed through startup, then the first UPDATE
// returned SERVFAIL with no startup-time signal. We now run the same
// loadRRs + assertSingleApexSOA path the UPDATE handler uses, so any
// parse-time or SOA-invariant failure surfaces at startup.
// file on disk at the expected path. Catches typos at CoreDNS startup
// rather than the first UPDATE.
func (p *RFC2136) validateZoneFiles() error {
for zone, zf := range p.zones {
st, err := os.Stat(zf.Path)
@ -335,9 +252,6 @@ func (p *RFC2136) validateZoneFiles() error {
if st.IsDir() {
return fmt.Errorf("zone %q: %s is a directory, expected a regular file", zone, zf.Path)
}
if _, _, err := zf.loadRRs(); err != nil {
return fmt.Errorf("zone %q at %s: %w", zone, zf.Path, err)
}
}
return nil
}

10
tsig.go
View File

@ -86,16 +86,6 @@ func (p *RFC2136) checkTSIG(w dns.ResponseWriter, r *dns.Msg) error {
// A nil from TsigStatus means verification succeeded; any non-nil
// error means the signature was invalid, the time was outside the
// fudge window, or some other auth failure.
//
// H6 — Replay window: the TSIG `fudge` field (RFC 8945 §5.2) is the
// allowed clock skew between client and server. miekg/dns enforces
// this in TsigVerify (rejecting with ErrTime if the request's time
// is too far from local time). We rely on miekg/dns's default
// tolerance (currently 300s); we do not configure it ourselves.
// If miekg/dns's default ever changes, our replay-window behavior
// changes with it — operators should monitor upstream release notes.
// A future enhancement is to make the fudge configurable via the
// Corefile (e.g., `tsig-fudge 300`).
if status := w.TsigStatus(); status != nil {
return fmt.Errorf("TSIG verification failed for key %q: %w", keyName, status)
}

132
update.go
View File

@ -2,7 +2,6 @@ package rfc2136
import (
"fmt"
"path/filepath"
"strings"
"time"
@ -27,35 +26,10 @@ import (
// Steps 3-7 happen under the zone-file mutex. If 8 fails we log but
// don't roll back (the on-disk state is authoritative; lost commits
// can be re-staged via `git add` later).
//
// SECURITY CONTRACT — the `verified` parameter:
//
// handleUpdate mutates zone files on disk. The caller MUST set
// verified=true only after successfully validating the message's TSIG
// signature against a configured key. ServeDNS does this. A
// verified=false invocation is treated as an unauthenticated attempt
// and refused — preserving the security boundary even if a future
// internal caller (NOTIFY relay, admin RPC, refactor) reaches this
// function without going through the wire-level TSIG check.
//
// Tests that exercise post-auth logic pass verified=true. Tests that
// exercise auth rejection pass verified=false.
//
// This is defense-in-depth: ServeDNS already verifies; we re-assert at
// the function boundary so the security property survives refactors.
func (p *RFC2136) handleUpdate(w dns.ResponseWriter, r *dns.Msg, verified bool) (int, error) {
func (p *RFC2136) handleUpdate(w dns.ResponseWriter, r *dns.Msg) (int, error) {
resp := new(dns.Msg)
resp.SetReply(r)
if verified {
// Only sign responses we authorize. Signing rejections leaks
// attestation that the named key exists on this server (see M9
// in the Hamilton review). Unauthorized callers get an
// unsigned Refused.
signResponseIfSigned(resp, r)
} else {
log.Warningf("handleUpdate refused: caller did not assert TSIG verification — possible internal bypass attempt")
return p.updateResp(w, resp, dns.RcodeRefused)
}
signResponseIfSigned(resp, r)
// 1. Validate the Zone section.
if len(r.Question) != 1 {
@ -81,8 +55,8 @@ func (p *RFC2136) handleUpdate(w dns.ResponseWriter, r *dns.Msg, verified bool)
zf.mu.Lock()
defer zf.mu.Unlock()
// 3. Load the current zone contents (with file-identity snapshot).
rrs, snap, err := zf.loadRRs()
// 3. Load the current zone contents.
rrs, err := zf.loadRRs()
if err != nil {
log.Errorf("UPDATE failed: %v", err)
return p.updateResp(w, resp, dns.RcodeServerFailure)
@ -115,28 +89,8 @@ func (p *RFC2136) handleUpdate(w dns.ResponseWriter, r *dns.Msg, verified bool)
if !changed {
// UPDATE was a valid no-op (e.g. only contained adds for RRs
// that were already present, deduped away per RFC 2136
// §3.4.2.2). Return NOERROR without rewriting the file or
// bumping the SOA serial.
//
// H7 — Policy decision documented:
//
// We DO NOT bump the SOA serial on no-op UPDATEs. Rationale:
// - DNS-wise, nothing changed. Forcing downstream secondaries
// (HE) to do an AXFR pull just to re-fetch identical content
// wastes bandwidth and is not what RFC 2136 implies.
// - The wire-visible cert-issuance chain for ACME does not
// depend on the second-UPDATE's serial bump — once the first
// UPDATE landed, the SOA already advanced and the auto plugin
// reloaded; subsequent identical UPDATEs are spurious and
// should be silent.
// - Caddy's caddy-dns/rfc2136 client treats NOERROR-no-bump as
// "yes I have your record" — which is the truthful answer.
//
// If a caller wants to force a serial bump for some reason, they
// can send a touch-UPDATE that adds-then-deletes a throwaway
// record. That's an explicit, intentional pattern and is
// supported.
// that were already present, deduped away). Return NOERROR
// without rewriting the file.
return p.updateResp(w, resp, dns.RcodeSuccess)
}
@ -147,45 +101,21 @@ func (p *RFC2136) handleUpdate(w dns.ResponseWriter, r *dns.Msg, verified bool)
return p.updateResp(w, resp, dns.RcodeServerFailure)
}
// 6b. Concurrent-modification check (Hamilton H1). Before clobbering
// the on-disk file, verify nothing changed it out from under us
// between loadRRs and now. The per-zone mutex serializes us against
// other in-process UPDATEs, but external editors (rsync push,
// manual edit, `git checkout`) can race in any time. If the file
// changed, refuse with SERVFAIL so Caddy retries on a fresh load.
if err := zf.checkUnchanged(snap); err != nil {
log.Warningf("UPDATE refused: %v", err)
return p.updateResp(w, resp, dns.RcodeServerFailure)
}
// 7. Atomic write.
if err := zf.writeAtomic(updated, now); err != nil {
log.Errorf("UPDATE write failed: %v", err)
return p.updateResp(w, resp, dns.RcodeServerFailure)
}
// 8. Auto-commit. Failure to commit means the file is correct but
// the git audit trail diverges (Hamilton H2). We log at ERROR with
// structured detail so operators discover the divergence; recovery
// is `git -C <zonesDir> status` + `git add` + manual commit. We do
// NOT roll back the file write — by the time the commit fails, the
// auto plugin may have already noticed the new mtime, and rolling
// back creates more races than it solves.
// 8. Auto-commit. Failure to commit isn't fatal to the UPDATE —
// the on-disk state is authoritative — but we log loudly.
msg := summarizeUpdate(zone, r.Ns)
if err := zf.commit(msg); err != nil {
log.Errorf("git auto-commit failed; zone file is correct but audit trail diverged: zone=%s path=%s err=%v — recover with `git -C %s status` + manual commit",
zone, zf.Path, err, filepath.Dir(zf.Path))
log.Warningf("git auto-commit failed: %v", err)
}
log.Infof("UPDATE applied: zone=%s prereqs=%d updates=%d msg=%q",
zone, len(r.Answer), len(r.Ns), msg)
// Fire NOTIFY to configured secondaries (RFC 1996). Non-blocking:
// each target gets its own goroutine, capped by notifyTimeout. The
// UPDATE response to the client is not held on these acks — RFC
// 1996 §4 explicitly decouples them.
sendNotify(zone, p.NotifyTargets)
return p.updateResp(w, resp, dns.RcodeSuccess)
}
@ -304,13 +234,8 @@ func applyUpdate(zone string, defaultTTL uint32, rrs []dns.RR, rr dns.RR) ([]dns
log.Debugf("apex %s add refused", dns.TypeToString[hdr.Rrtype])
return rrs, dns.RcodeRefused, false
}
// Hamilton L2: don't mutate the caller's RR header. miekg/dns
// parses the UPDATE message into RRs the caller still owns;
// silently rewriting their TTL is a hygiene smell. Copy first,
// then apply our default if needed.
if hdr.Ttl == 0 {
rr = dns.Copy(rr)
rr.Header().Ttl = defaultTTL
hdr.Ttl = defaultTTL
}
before := len(rrs)
rrs = addRRTo(rrs, rr)
@ -319,43 +244,12 @@ func applyUpdate(zone string, defaultTTL uint32, rrs []dns.RR, rr dns.RR) ([]dns
}
// summarizeUpdate produces a one-line commit message describing the
// UPDATE for git history. The output is sanitized — Hamilton M7 — to
// prevent attacker-controlled RR names (TSIG just authenticates the
// sender; the payload is still attacker-controlled) from injecting
// control characters into git log, log aggregators, or any downstream
// renderer that interprets ANSI/newlines.
// UPDATE for git history.
func summarizeUpdate(zone string, updates []dns.RR) string {
var msg string
if len(updates) == 1 {
msg = fmt.Sprintf("rfc2136 %s: %s", zone, oneLineOp(updates[0]))
} else {
msg = fmt.Sprintf("rfc2136 %s: %d operations", zone, len(updates))
return fmt.Sprintf("rfc2136 %s: %s", zone, oneLineOp(updates[0]))
}
return sanitizeForCommitMessage(msg)
}
// sanitizeForCommitMessage strips control characters from s, replacing
// them with their printable escape form. This keeps git log + downstream
// renderers safe from attacker-injected newlines, escape sequences, etc.
func sanitizeForCommitMessage(s string) string {
var b strings.Builder
b.Grow(len(s))
for _, r := range s {
switch {
case r == '\n':
b.WriteString("\\n")
case r == '\r':
b.WriteString("\\r")
case r == '\t':
b.WriteString("\\t")
case r < 0x20 || r == 0x7f:
// Other C0 controls + DEL: emit \xNN.
fmt.Fprintf(&b, "\\x%02x", r)
default:
b.WriteRune(r)
}
}
return b.String()
return fmt.Sprintf("rfc2136 %s: %d operations", zone, len(updates))
}
// oneLineOp returns a short human-readable description of a single

View File

@ -12,19 +12,14 @@ import (
// captureWriter implements dns.ResponseWriter and stashes the message
// passed to WriteMsg so tests can inspect it after handleUpdate returns.
//
// tsigErr, if non-nil, is what TsigStatus() returns — letting tests
// simulate TSIG-verification failure (bad MAC, fudge-window violation,
// unknown key etc.).
type captureWriter struct {
msg *dns.Msg
tsigErr error
msg *dns.Msg
}
func (cw *captureWriter) WriteMsg(m *dns.Msg) error { cw.msg = m; return nil }
func (cw *captureWriter) Write([]byte) (int, error) { return 0, nil }
func (cw *captureWriter) Close() error { return nil }
func (cw *captureWriter) TsigStatus() error { return cw.tsigErr }
func (cw *captureWriter) TsigStatus() error { return nil }
func (cw *captureWriter) TsigTimersOnly(bool) {}
func (cw *captureWriter) Hijack() {}
func (cw *captureWriter) LocalAddr() net.Addr { return &net.IPAddr{IP: net.IPv4(127, 0, 0, 1)} }
@ -60,14 +55,11 @@ func newUpdate(zone string, updates ...dns.RR) *dns.Msg {
}
// runUpdate sends msg through the handler with TSIG auth bypassed
// (calling handleUpdate directly with verified=true instead of
// ServeDNS). Tests that exercise post-auth logic use this; tests that
// assert auth-failure behavior should call handleUpdate(w, msg, false)
// directly.
// (calling handleUpdate directly instead of ServeDNS).
func runUpdate(t *testing.T, p *RFC2136, msg *dns.Msg) (rcode int) {
t.Helper()
w := &captureWriter{}
rcode, _ = p.handleUpdate(w, msg, true)
rcode, _ = p.handleUpdate(w, msg)
return rcode
}
@ -75,7 +67,7 @@ func runUpdate(t *testing.T, p *RFC2136, msg *dns.Msg) (rcode int) {
func readZoneRecords(t *testing.T, p *RFC2136, zone string) []dns.RR {
t.Helper()
zf := p.zones[dns.Fqdn(zone)]
rrs, _, err := zf.loadRRs()
rrs, err := zf.loadRRs()
if err != nil {
t.Fatalf("loadRRs: %v", err)
}
@ -160,103 +152,6 @@ func TestUpdate_DeleteRRset_RemovesAllOfType(t *testing.T) {
}
}
// TestCheckTSIG_BadStatus_Refused covers H6 — the path where miekg/dns
// has reported a TSIG verification failure (bad MAC, fudge-window
// violation, expired timestamp, etc.) via dns.ResponseWriter.TsigStatus.
// checkTSIG must surface this as an error so ServeDNS refuses the
// UPDATE. This is the test that would catch a regression in our
// reliance on miekg/dns's default fudge window.
func TestCheckTSIG_BadStatus_Refused(t *testing.T) {
p := newTestPluginWithZone(t, "auth.example.com")
// Configure a key so checkTSIG doesn't short-circuit on "no keys".
p.TSIGKeys = map[string]tsigKey{
"acme-update-key.": {Algorithm: dns.HmacSHA256, Secret: []byte("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")},
}
upd := newUpdate("auth.example.com",
mustRR(t, `foo.auth.example.com. 60 IN TXT "x"`),
)
upd.SetTsig("acme-update-key.", dns.HmacSHA256, 300, 0)
w := &captureWriter{
// Simulate miekg/dns reporting "TSIG outside fudge window."
tsigErr: dns.ErrTime,
}
if err := p.checkTSIG(w, upd); err == nil {
t.Errorf("checkTSIG accepted request despite TsigStatus reporting %v", dns.ErrTime)
}
}
// TestUpdate_UnverifiedCaller_Refused proves the C2 defense-in-depth
// contract: handleUpdate refuses any call that doesn't assert TSIG
// verification, even if the rest of the message is well-formed and the
// zone is authoritative. This guards against a future internal caller
// that bypasses the wire-level TSIG check in ServeDNS.
func TestUpdate_UnverifiedCaller_Refused(t *testing.T) {
p := newTestPluginWithZone(t, "auth.example.com")
upd := newUpdate("auth.example.com",
mustRR(t, `foo.auth.example.com. 60 IN TXT "should-not-apply"`),
)
w := &captureWriter{}
rcode, _ := p.handleUpdate(w, upd, false) // <- the security boundary
if rcode != dns.RcodeRefused {
t.Errorf("rcode = %d, want REFUSED for unverified caller", rcode)
}
// Zone file must NOT have been modified.
for _, rr := range readZoneRecords(t, p, "auth.example.com") {
if txt, ok := rr.(*dns.TXT); ok && txt.Hdr.Name == "foo.auth.example.com." {
t.Errorf("unverified UPDATE applied state mutation: %s", rr.String())
}
}
}
// TestUpdate_DeleteRR_IgnoresTTL covers M3: an UPDATE's delete-RR
// specifies a different TTL than the stored RR, but per RFC 2136
// §3.4.2.4 the deletion must match by owner/class/type/rdata only.
// The old String()-based comparison would silently fail to match.
func TestUpdate_DeleteRR_IgnoresTTL(t *testing.T) {
p := newTestPluginWithZone(t, "auth.example.com")
// Add a TXT with TTL 60.
runUpdate(t, p, newUpdate("auth.example.com",
mustRR(t, `foo.auth.example.com. 60 IN TXT "match-me"`),
))
// Issue a CLASS=NONE delete with a different TTL (per RFC 2136
// §2.5.4, the TTL on a CLASS=NONE delete is supposed to be 0; some
// clients get this wrong, and even when they get it right, an
// implementation must ignore the value).
delRR := mustRR(t, `foo.auth.example.com. 0 IN TXT "match-me"`)
delRR.Header().Class = dns.ClassNONE
if rcode := runUpdate(t, p, newUpdate("auth.example.com", delRR)); rcode != dns.RcodeSuccess {
t.Fatalf("rcode = %d, want NOERROR", rcode)
}
for _, rr := range readZoneRecords(t, p, "auth.example.com") {
if txt, ok := rr.(*dns.TXT); ok && txt.Hdr.Name == "foo.auth.example.com." && txt.Txt[0] == "match-me" {
t.Errorf("TTL-differing delete did not remove the RR: %s", rr.String())
}
}
}
// TestSanitizeForCommitMessage covers M7: attacker-controlled RR names
// (TSIG only authenticates the sender; the payload is hostile by
// default) must not inject control characters into commit messages or
// downstream log renderers.
func TestSanitizeForCommitMessage(t *testing.T) {
cases := []struct {
in, want string
}{
{"plain", "plain"},
{"with\nnewline", "with\\nnewline"},
{"tab\there", "tab\\there"},
{"esc\x1b[31mred\x1b[0m", "esc\\x1b[31mred\\x1b[0m"},
{"del\x7fchar", "del\\x7fchar"},
}
for _, tc := range cases {
got := sanitizeForCommitMessage(tc.in)
if got != tc.want {
t.Errorf("sanitize(%q) = %q, want %q", tc.in, got, tc.want)
}
}
}
func TestUpdate_OutOfZone_Refused(t *testing.T) {
p := newTestPluginWithZone(t, "auth.example.com")
@ -384,7 +279,7 @@ func TestUpdate_SignedRequest_ResponseGetsTSIG(t *testing.T) {
upd.SetTsig("acme-update-key.", dns.HmacSHA256, 300, 0)
w := &captureWriter{}
if _, err := p.handleUpdate(w, upd, true); err != nil {
if _, err := p.handleUpdate(w, upd); err != nil {
t.Fatalf("handleUpdate: %v", err)
}
if w.msg == nil {
@ -414,7 +309,7 @@ func TestUpdate_UnsignedRequest_ResponseStaysUnsigned(t *testing.T) {
// No SetTsig — request is plain.
w := &captureWriter{}
if _, err := p.handleUpdate(w, upd, true); err != nil {
if _, err := p.handleUpdate(w, upd); err != nil {
t.Fatalf("handleUpdate: %v", err)
}
if w.msg.IsTsig() != nil {

View File

@ -1,13 +1,10 @@
package rfc2136
import (
"context"
"fmt"
"math"
"os"
"os/exec"
"path/filepath"
"strconv"
"strings"
"sync"
"time"
@ -15,31 +12,6 @@ import (
"github.com/miekg/dns"
)
// gitCommandTimeout caps any single git invocation we shell out to. If
// the process hangs (NFS stall, gpg-sign prompt, broken pre-commit hook
// waiting on stdin, etc.) we must not block the caller — which holds
// the per-zone mutex — forever. 10s is generous for a local-disk repo.
const gitCommandTimeout = 10 * time.Second
// fileSnapshot captures the file-identity fingerprint at the moment we
// last read the zone file. We compare it to the live stat just before
// writing back to detect concurrent modification (rsync push, manual
// edit, `git checkout`). If anything changed the file out from under
// us, we refuse the UPDATE — Caddy will retry, and the next loadRRs
// will see the updated state.
type fileSnapshot struct {
mtime time.Time
size int64
}
// matches returns true if `info` reports the same mtime and size we
// captured. Inode comparison would be stricter, but mtime+size is
// sufficient for the failure modes we care about (rename-over-original
// edits change both).
func (s fileSnapshot) matches(info os.FileInfo) bool {
return s.mtime.Equal(info.ModTime()) && s.size == info.Size()
}
// zoneFile is a file-backed authority for a single DNS zone. Replaces
// the Phase-1.3 in-memory recordStore.
//
@ -102,32 +74,13 @@ func openZoneFile(path, origin string) *zoneFile {
// loadRRs reads the zone file and parses it into an RR slice via
// miekg/dns's zone parser. The parser handles $ORIGIN, $TTL, multi-line
// SOA, comments, includes, etc.
//
// Returns (rrs, snapshot, error). The snapshot fingerprints the file
// identity at read time so a subsequent writeIfUnchanged can detect
// concurrent modification.
//
// Hamilton H4 — strict-parse validation: a single malformed line could
// otherwise produce a partial parse where parser.Err() returns nil but
// some records silently went missing. To catch this, we enforce a
// post-parse invariant: exactly one SOA RR, and that SOA's name equals
// the configured zone origin. A zone file that's been partially eaten
// by the parser usually loses its SOA along the way — checking SOA
// presence catches both H4 (silent truncation) and H3 (multi-SOA or
// wrong-apex SOA) with a single guard.
func (z *zoneFile) loadRRs() ([]dns.RR, fileSnapshot, error) {
func (z *zoneFile) loadRRs() ([]dns.RR, error) {
f, err := os.Open(z.Path)
if err != nil {
return nil, fileSnapshot{}, fmt.Errorf("open %s: %w", z.Path, err)
return nil, fmt.Errorf("open %s: %w", z.Path, err)
}
defer f.Close()
info, err := f.Stat()
if err != nil {
return nil, fileSnapshot{}, fmt.Errorf("stat %s: %w", z.Path, err)
}
snap := fileSnapshot{mtime: info.ModTime(), size: info.Size()}
parser := dns.NewZoneParser(f, z.Origin, z.Path)
parser.SetDefaultTTL(3600)
@ -136,65 +89,12 @@ func (z *zoneFile) loadRRs() ([]dns.RR, fileSnapshot, error) {
rrs = append(rrs, rr)
}
if err := parser.Err(); err != nil {
return nil, snap, fmt.Errorf("parse %s: %w", z.Path, err)
return nil, fmt.Errorf("parse %s: %w", z.Path, err)
}
if len(rrs) == 0 {
return nil, snap, fmt.Errorf("%s: zero RRs parsed", z.Path)
return nil, fmt.Errorf("%s: zero RRs parsed", z.Path)
}
// H3/H4 invariant: exactly one SOA, anchored at the zone origin.
// Refuse to operate on a zone file whose SOA structure is wrong —
// any subsequent bumpSerial or write would compound the damage.
if err := assertSingleApexSOA(rrs, z.Origin); err != nil {
return nil, snap, fmt.Errorf("zone %s integrity check failed: %w", z.Path, err)
}
return rrs, snap, nil
}
// assertSingleApexSOA enforces that rrs contains exactly one SOA and
// that its owner matches the zone origin. Returns an error otherwise.
// This is the H3+H4 zone-integrity invariant.
func assertSingleApexSOA(rrs []dns.RR, origin string) error {
origin = canon(origin)
var soas []*dns.SOA
for _, rr := range rrs {
if s, ok := rr.(*dns.SOA); ok {
soas = append(soas, s)
}
}
switch len(soas) {
case 0:
return fmt.Errorf("no SOA record found (expected one at %q)", origin)
case 1:
if canon(soas[0].Hdr.Name) != origin {
return fmt.Errorf("SOA owner is %q, expected zone apex %q", soas[0].Hdr.Name, origin)
}
return nil
default:
names := make([]string, len(soas))
for i, s := range soas {
names[i] = s.Hdr.Name
}
return fmt.Errorf("multiple SOA records found (%d): %s", len(soas), strings.Join(names, ", "))
}
}
// checkUnchanged returns nil if the on-disk file still matches the
// captured snapshot. If the file has been modified (mtime or size
// differs), returns an error — the caller should refuse the UPDATE
// rather than clobber the external change.
func (z *zoneFile) checkUnchanged(snap fileSnapshot) error {
info, err := os.Stat(z.Path)
if err != nil {
return fmt.Errorf("stat %s: %w", z.Path, err)
}
if !snap.matches(info) {
return fmt.Errorf("concurrent modification detected on %s: mtime %s/%s size %d/%d",
z.Path, snap.mtime.Format(time.RFC3339Nano), info.ModTime().Format(time.RFC3339Nano),
snap.size, info.Size())
}
return nil
return rrs, nil
}
// Lookup returns RRs in `rrs` matching (name, rtype). Both name and
@ -252,19 +152,13 @@ func removeNameFrom(rrs []dns.RR, name string) []dns.RR {
}
// removeRRFrom returns rrs minus the single RR matching the given one
// by owner + type + rdata.
//
// Hamilton M3: per RFC 2136 §2.5.4, a delete-by-exact-match UPDATE
// carries CLASS=NONE and TTL=0 as protocol flags, not as match
// criteria. The target must match a stored RR by owner+type+rdata
// alone. We normalize both sides to the same class + TTL before
// invoking dns.IsDuplicate so the comparison is correct.
// by owner + type + rdata. String() comparison covers rdata exactness.
func removeRRFrom(rrs []dns.RR, target dns.RR) []dns.RR {
targetN := normalizeForCompare(target)
targetStr := target.String()
out := rrs[:0:0]
matched := false
for _, rr := range rrs {
if !matched && dns.IsDuplicate(normalizeForCompare(rr), targetN) {
if !matched && rr.String() == targetStr {
matched = true
continue
}
@ -274,30 +168,17 @@ func removeRRFrom(rrs []dns.RR, target dns.RR) []dns.RR {
}
// addRRTo appends rr to rrs unless an identical RR already exists
// (de-dupe semantics per RFC 2136 §3.4.2.2). Same normalization as
// removeRRFrom — dedupe is TTL- and class-insensitive in the comparison
// (though the stored RR retains its original TTL/class).
// (de-dupe semantics per RFC 2136 §3.4.2.2).
func addRRTo(rrs []dns.RR, rr dns.RR) []dns.RR {
rrN := normalizeForCompare(rr)
target := rr.String()
for _, existing := range rrs {
if dns.IsDuplicate(normalizeForCompare(existing), rrN) {
if existing.String() == target {
return rrs
}
}
return append(rrs, rr)
}
// normalizeForCompare returns a copy of rr with TTL=0 and class=IN so
// dns.IsDuplicate can be used to compare by (owner, type, rdata) alone.
// Required by RFC 2136 §2.5.4's "TTL and CLASS are flags, not match
// criteria" semantics.
func normalizeForCompare(rr dns.RR) dns.RR {
n := dns.Copy(rr)
n.Header().Ttl = 0
n.Header().Class = dns.ClassINET
return n
}
// serialCounterMul is the multiplier between the date prefix and the
// counter in our SOA-serial encoding. The format is YYMMDD*10000 + NNNN,
// giving 10000 bumps/day (NNNN ∈ [0001, 9999]). The 2-digit year keeps
@ -347,9 +228,9 @@ func bumpSerial(rrs []dns.RR, now time.Time) error {
// serial that happens to look like a valid YYMMDD prefix but is in
// the past, which is the same handling: jump to today).
if curDate := cur[:6]; isValidYYMMDD(curDate) && curDate >= today {
nnnn := int(mustParseUint(cur[6:10]))
nnnn := atoi(cur[6:10])
if nnnn < 9999 {
soa.Serial = uint32(mustParseUint(curDate)*serialCounterMul + uint64(nnnn+1))
soa.Serial = uint32(parseUint(curDate)*serialCounterMul + uint64(nnnn+1))
return nil
}
// NNNN=9999: roll to next encoded day, NNNN=0001.
@ -358,31 +239,17 @@ func bumpSerial(rrs []dns.RR, now time.Time) error {
return fmt.Errorf("serial date %q unparseable: %w", curDate, err)
}
next := d.AddDate(0, 0, 1).Format("060102")
soa.Serial = uint32(mustParseUint(next)*serialCounterMul + 1)
soa.Serial = uint32(parseUint(next)*serialCounterMul + 1)
return nil
}
// Older or unparseable: jump to today*10000+1. Migration path for
// legacy YYYYMMDDNN serials lives here.
candidate := uint32(mustParseUint(today)*serialCounterMul + 1)
// H5 — explicit MaxUint32 guard. Plain `>` comparison is correct in
// practice (we'd never wrap during the zone's lifetime: 10000
// bumps/day × 365 days × ~117 years = ~427M, well under 2^32). The
// real failure mode we must prevent is wrap-to-0: if soa.Serial
// somehow reached MaxUint32 (hand-edit, fuzz, or a future code path
// we haven't written), `soa.Serial++` would wrap to 0, and
// downstream secondaries per RFC 1982 treat 0-after-MaxUint32 as
// "older" — they refuse to AXFR, and the zone goes dark. Loud
// refusal forces the operator to manually reset the serial,
// instead of silently bricking the zone.
candidate := uint32(parseUint(today)*serialCounterMul + 1)
if candidate <= soa.Serial {
if soa.Serial == math.MaxUint32 {
return fmt.Errorf("SOA serial at uint32 max (%d) — refusing to wrap to 0; operator must reset zone serial manually (see RFC 1982 §3.2)", soa.Serial)
}
// Defensive monotonic advance for the unusual "current serial
// is already > today's new-format minimum" case (e.g., a
// hand-edit set it to a far-future value).
// Defensive: don't regress. If something has somehow
// provisioned a serial >= today's new-format candidate (e.g.,
// far-future serial from a hand-edit), just +1 to advance.
soa.Serial++
return nil
}
@ -400,20 +267,23 @@ func isValidYYMMDD(s string) bool {
return err == nil
}
// mustParseUint parses an all-digit string into uint64. Panics on
// malformed input — the caller is responsible for passing only strings
// that were validated as digit-substrings (e.g., a fixed-width slice of
// a YYMMDDNNNN-formatted serial). Using strconv via this thin wrapper
// keeps the panic behavior explicit while sharing stdlib's robust
// parsing (Hamilton L1).
func mustParseUint(s string) uint64 {
n, err := strconv.ParseUint(s, 10, 64)
if err != nil {
// Programmer error if we ever hit this — every caller passes
// digits-only strings derived from time.Format or a sliced
// SOA serial. Panic so the bug surfaces in tests/CI rather
// than silently producing a 0 serial.
panic(fmt.Sprintf("mustParseUint(%q): %v", s, err))
// atoi is a tiny helper that ignores errors — only called on a
// substring we already validated is two digits.
func atoi(s string) int {
n := 0
for _, c := range s {
n = n*10 + int(c-'0')
}
return n
}
// parseUint parses an all-digits string into a uint64. Used because
// strconv.ParseUint adds error-handling overhead we don't need on
// internally-controlled inputs.
func parseUint(s string) uint64 {
var n uint64
for _, c := range s {
n = n*10 + uint64(c-'0')
}
return n
}
@ -475,12 +345,6 @@ func (z *zoneFile) writeAtomic(rrs []dns.RR, now time.Time) error {
// repository directory inferred from the zone file's parent. Returns
// nil silently if AutoCommit is false. Returns an error if the commit
// fails; the caller decides whether to roll back the file write.
//
// Both git invocations run under a context with a hard timeout
// (gitCommandTimeout). If git hangs (NFS stall, gpg-sign prompt,
// pre-commit hook waiting on stdin), we kill it rather than block the
// caller's per-zone mutex indefinitely. ACME storms must not be able
// to wedge the plugin via git getting stuck.
func (z *zoneFile) commit(message string) error {
if !z.AutoCommit {
return nil
@ -488,17 +352,15 @@ func (z *zoneFile) commit(message string) error {
// We run git from the directory containing the zone file. git will
// walk upward to find the .git dir.
dir := filepath.Dir(z.Path)
ctx, cancel := context.WithTimeout(context.Background(), gitCommandTimeout)
defer cancel()
// `git add` first; if file is already in the index, no harm done.
add := exec.CommandContext(ctx, "git",
add := exec.Command("git",
"-C", dir,
"add", "--", z.Path,
)
if out, err := add.CombinedOutput(); err != nil {
return fmt.Errorf("git add failed: %w: %s", err, strings.TrimSpace(string(out)))
}
commit := exec.CommandContext(ctx, "git",
commit := exec.Command("git",
"-C", dir,
"-c", "user.name="+z.GitAuthorName,
"-c", "user.email="+z.GitAuthorEmail,

View File

@ -1,7 +1,6 @@
package rfc2136
import (
"math"
"os"
"path/filepath"
"strings"
@ -26,7 +25,7 @@ func TestZoneFile_LoadRRs(t *testing.T) {
dir := withTempZonesDir(t, "auth.example.com")
zf := openZoneFile(filepath.Join(dir, "auth.example.com.zone"), "auth.example.com.")
rrs, _, err := zf.loadRRs()
rrs, err := zf.loadRRs()
if err != nil {
t.Fatalf("loadRRs: %v", err)
}
@ -41,7 +40,7 @@ func TestZoneFile_LoadRRs(t *testing.T) {
func TestZoneFile_LoadRRs_MissingFile(t *testing.T) {
zf := openZoneFile("/nope/missing.zone", "missing.")
if _, _, err := zf.loadRRs(); err == nil {
if _, err := zf.loadRRs(); err == nil {
t.Errorf("expected error loading missing file, got nil")
}
}
@ -221,80 +220,6 @@ func TestBumpSerial_NonCalVerFormat_ResetsToToday(t *testing.T) {
}
}
// TestBumpSerial_MaxUint32_RefusesWrap covers H5: the defensive
// branch must not wrap soa.Serial to 0. Wrap-to-0 makes downstream
// secondaries treat the zone as reset per RFC 1982 and refuse to AXFR
// the new value, taking the zone dark.
func TestBumpSerial_MaxUint32_RefusesWrap(t *testing.T) {
rrs := []dns.RR{
&dns.SOA{Hdr: dns.RR_Header{Name: "example.com.", Rrtype: dns.TypeSOA},
Serial: math.MaxUint32},
}
now := time.Date(2026, 5, 22, 1, 0, 0, 0, time.UTC)
if err := bumpSerial(rrs, now); err == nil {
t.Errorf("bumpSerial accepted MaxUint32; expected refusal to prevent wrap-to-0")
}
if got := rrs[0].(*dns.SOA).Serial; got == 0 {
t.Errorf("Serial wrapped to 0 (catastrophic): %d", got)
}
}
// TestLoadRRs_NoSOA_Refused covers H4: a zone file that's missing its
// SOA (because the parser ate it on a malformed prior line, or because
// it was never there) must fail loadRRs rather than be silently treated
// as a valid empty-of-SOA zone.
func TestLoadRRs_NoSOA_Refused(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "noSoA.example.com.zone")
content := "$ORIGIN noSoA.example.com.\nfoo.noSoA.example.com. 60 IN A 192.0.2.1\n"
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("write fixture: %v", err)
}
zf := openZoneFile(path, "noSoA.example.com.")
if _, _, err := zf.loadRRs(); err == nil {
t.Errorf("loadRRs accepted SOA-less zone file; expected refusal")
}
}
// TestLoadRRs_MultipleSOAs_Refused covers H3: two SOAs at the apex
// produce inconsistent zone state visible to AXFR clients. We refuse
// to operate on such a file rather than bumping only the first SOA.
func TestLoadRRs_MultipleSOAs_Refused(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "dupe.example.com.zone")
content := `$ORIGIN dupe.example.com.
dupe.example.com. 3600 IN SOA ns.example. admin.example. 1 60 60 60 60
dupe.example.com. 3600 IN SOA ns.example. admin.example. 2 60 60 60 60
dupe.example.com. 3600 IN NS ns.example.
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("write fixture: %v", err)
}
zf := openZoneFile(path, "dupe.example.com.")
if _, _, err := zf.loadRRs(); err == nil {
t.Errorf("loadRRs accepted dual-SOA zone; expected refusal")
}
}
// TestLoadRRs_NonApexSOA_Refused covers H3: an SOA owned by a name
// other than the zone apex is a parse-error red flag and must be
// refused, not silently bumped.
func TestLoadRRs_NonApexSOA_Refused(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "weird.example.com.zone")
content := `$ORIGIN weird.example.com.
sub.weird.example.com. 3600 IN SOA ns.example. admin.example. 1 60 60 60 60
weird.example.com. 3600 IN NS ns.example.
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("write fixture: %v", err)
}
zf := openZoneFile(path, "weird.example.com.")
if _, _, err := zf.loadRRs(); err == nil {
t.Errorf("loadRRs accepted non-apex SOA; expected refusal")
}
}
func TestBumpSerial_NoSOA_ReturnsError(t *testing.T) {
rrs := []dns.RR{mustRR(t, `foo.example.com. 60 IN A 192.0.2.1`)}
now := time.Now()
@ -309,7 +234,7 @@ func TestZoneFile_WriteAtomic_RoundTrip(t *testing.T) {
zf := openZoneFile(path, "auth.example.com.")
zf.AutoCommit = false // not testing commit here
rrs, _, err := zf.loadRRs()
rrs, err := zf.loadRRs()
if err != nil {
t.Fatalf("initial load: %v", err)
}
@ -322,7 +247,7 @@ func TestZoneFile_WriteAtomic_RoundTrip(t *testing.T) {
}
// Re-load and verify.
after, _, err := zf.loadRRs()
after, err := zf.loadRRs()
if err != nil {
t.Fatalf("re-load: %v", err)
}
@ -347,7 +272,7 @@ func TestZoneFile_WriteAtomic_LeavesNoTempFile(t *testing.T) {
zf := openZoneFile(path, "auth.example.com.")
zf.AutoCommit = false
rrs, _, _ := zf.loadRRs()
rrs, _ := zf.loadRRs()
_ = zf.writeAtomic(rrs, time.Now())
// No .rfc2136-*.zone temp files should remain.
@ -357,46 +282,13 @@ func TestZoneFile_WriteAtomic_LeavesNoTempFile(t *testing.T) {
}
}
// TestZoneFile_CheckUnchanged_DetectsExternalModification covers H1:
// between loadRRs and the next writeAtomic, if anything (rsync, manual
// edit, git checkout) clobbered the file, checkUnchanged returns an
// error so the caller refuses the UPDATE instead of losing the
// external write.
func TestZoneFile_CheckUnchanged_DetectsExternalModification(t *testing.T) {
dir := withTempZonesDir(t, "auth.example.com")
path := filepath.Join(dir, "auth.example.com.zone")
zf := openZoneFile(path, "auth.example.com.")
zf.AutoCommit = false
_, snap, err := zf.loadRRs()
if err != nil {
t.Fatalf("loadRRs: %v", err)
}
// Pristine snapshot — checkUnchanged should pass.
if err := zf.checkUnchanged(snap); err != nil {
t.Errorf("pristine snapshot rejected: %v", err)
}
// Simulate an external editor clobbering the file (rsync-style:
// new content, different size, mtime advances).
time.Sleep(10 * time.Millisecond) // ensure mtime differs
if err := os.WriteFile(path, []byte("; external edit\n$ORIGIN auth.example.com.\nauth.example.com. 3600 IN SOA ns.example. admin.example. 1 60 60 60 60\n"), 0644); err != nil {
t.Fatalf("simulated external edit: %v", err)
}
if err := zf.checkUnchanged(snap); err == nil {
t.Errorf("checkUnchanged accepted modified file; expected concurrent-modification error")
}
}
func TestZoneFile_WriteAtomic_FileEndsWithNewline(t *testing.T) {
dir := withTempZonesDir(t, "auth.example.com")
path := filepath.Join(dir, "auth.example.com.zone")
zf := openZoneFile(path, "auth.example.com.")
zf.AutoCommit = false
rrs, _, _ := zf.loadRRs()
rrs, _ := zf.loadRRs()
_ = zf.writeAtomic(rrs, time.Now())
data, _ := os.ReadFile(path)