L1 — Replace hand-rolled atoi/parseUint with strconv.ParseUint wrapped
in mustParseUint. Hamilton's reasoning: the comment "strconv adds
overhead we don't need" is the Lauren-Bug shape. "We already validated
the input" holds only until some path we couldn't predict skips that
validation. Stdlib's edge-case coverage is the safer default; the wrapper
panics on malformed input, so any future regression surfaces in CI rather
than as a silent 0 serial.
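A minimal sketch of the wrapper described above, assuming a signature of `mustParseUint(s string, bitSize int) uint64` (hypothetical; the real helper's signature isn't shown here). It hands every edge case (empty input, sign prefixes, overflow for the requested bit size) to `strconv.ParseUint` and panics instead of returning 0:

```go
package main

import (
	"fmt"
	"strconv"
)

// mustParseUint (hypothetical signature) parses s as a base-10 unsigned
// integer that must fit in bitSize bits. Any error panics, so a malformed
// serial fails loudly instead of silently becoming 0.
func mustParseUint(s string, bitSize int) uint64 {
	v, err := strconv.ParseUint(s, 10, bitSize)
	if err != nil {
		panic(fmt.Sprintf("mustParseUint(%q): %v", s, err))
	}
	return v
}

func main() {
	serial := uint32(mustParseUint("2605220001", 32))
	fmt.Println(serial) // 2605220001
}
```
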
L2 — applyUpdate no longer mutates the caller's RR header TTL. miekg/dns
parses the UPDATE message into RRs the caller still owns; silently
rewriting hdr.Ttl was a hygiene smell with no current functional
consequence, but it was a clear documentation issue. Now we dns.Copy()
the RR before any header mutation.
L4 — README expanded with an "Operational constraints" section
documenting the contracts and limits operators should understand
before relying on this in production:
- Single-process atomicity only (with rsync-race mitigation)
- Process-global MsgAcceptFunc override
- No-op UPDATE doesn't bump SOA (with touch-UPDATE workaround)
- SOA invariants enforced strictly (zero, multi, non-apex SOA all
refused)
- Serial counter NNNN=9999 rollover semantics
- TSIG replay window dependency on miekg/dns default
- Git commit failure logged at ERROR, not rolled back
- Per-key rate limit knobs
Every constraint maps to a Hamilton review finding; documenting the
contract in operator-facing prose closes the gap between code and
expectation that the review identified.
166 lines
6.1 KiB
Markdown
# coredns-rfc2136

A [CoreDNS](https://coredns.io) plugin that accepts **RFC 2136 dynamic DNS
updates** (TSIG-authenticated), filling a gap in the official plugin set.

CoreDNS as shipped has no plugin for accepting dynamic updates: its
plugin model treats authoritative data as read-only (loaded from `auto`,
`file`, `secondary`, etc.). This plugin adds the missing piece.

## Primary use case: self-hosted ACME DNS-01

The motivating problem: automate Let's Encrypt cert issuance for many
domains without depending on registrar APIs (Vultr/Route53/Cloudflare).
The architecture:

```
_acme-challenge.example.com  CNAME  <uuid>.auth.supported.systems
        │
        │ delegated NS to your CoreDNS host
        ▼
CoreDNS + rfc2136 plugin
        │
        │ accepts TSIG UPDATEs from Caddy
        │ (caddy-dns/rfc2136) or any other
        │ ACME client
        ▼
Let's Encrypt validates
```

One-time setup per protected domain: add a single `CNAME` record (the
first line of the diagram above) to your static zones. After that, all
cert issuance and renewal happens via UPDATE messages, with zero static
zone-file churn.

## Status

**v2026.05.22.2**: production-ready. Handles UPDATE messages against
file-backed zones, authenticates them with TSIG, bumps the SOA serial in
CalVer `YYMMDD*10000 + NNNN` form, atomically writes the zone file, and
optionally git-commits each change for an audit trail. Designed to
coexist with CoreDNS's `auto` plugin, which serves queries from the same
zone files on its reload cycle.

## Configuration

```
rfc2136 <zone> [<zone>...] {
    zones-dir <path>                              # required
    tsig-key <name> <algorithm> <base64-secret>   # may repeat
    ttl <seconds>                                 # default 60
    auto-commit <true|false>                      # default true
    git-author <name> <email>                     # optional
    rate-limit <burst> <period-seconds>           # default 100 / 60s
    rate-limit off                                # disable rate-limit
}
```

Example:

```
.:53 auth.example.com {
    rfc2136 auth.example.com {
        zones-dir /var/lib/coredns/zones
        tsig-key acme-key. hmac-sha256 BASE64SECRET==
        ttl 60
        auto-commit true
        git-author "coredns-rfc2136" "rfc2136@coredns.example.com"
    }
    auto {
        directory /var/lib/coredns/zones (.*)\.zone {1}
        reload 30s
    }
    errors
    log
}
```

## Operational constraints

A few behaviors operators should know before relying on this plugin:

### Single-process atomicity only

The per-zone mutex serializes UPDATEs *within one CoreDNS process*. It
does NOT coordinate with external file edits: if you `rsync` a zone file
from a workstation while the plugin is mid-UPDATE, you get a race. The
plugin defends against this with a snapshot-and-recheck. `loadRRs`
captures the file's (mtime, size); immediately before writing back, the
plugin re-stats the file, and if it changed, the UPDATE is refused with
SERVFAIL and the client (Caddy etc.) retries on a fresh load. The
remaining window, between the re-stat and the write, is narrow but
non-zero.

**Recommendation**: don't rsync zone files into a directory the plugin
is actively writing to. If you must, expect occasional SERVFAILs that
resolve on retry.

### MsgAcceptFunc is process-global

CoreDNS 1.14.3 doesn't expose a per-Config `MsgAcceptFunc`, so this
plugin overrides the miekg/dns package-level default at `init()` time.
**Every server block** in the process will accept the UPDATE opcode at
the wire layer, but only blocks with `rfc2136` in their plugin chain do
anything useful with it (others pass it through and return FormatError).
The actual security boundary is TSIG, enforced both in `ServeDNS` and as
a defense-in-depth check inside `handleUpdate`.

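The wire-layer decision sees only the fixed 12-byte DNS header. miekg/dns's default accept hook rejects the UPDATE opcode, which is why an override is needed at all. The sketch below mirrors the shape of such a hook with a stand-in `header` type so it runs on the standard library alone; in the real override the function would have miekg/dns's `MsgAcceptFunc` type and be assigned to `dns.DefaultMsgAcceptFunc`. `acceptWithUpdate` and its count check are illustrative assumptions, not the plugin's code:

```go
package main

import "fmt"

// header stands in for the fixed 12-byte DNS header (RFC 1035 §4.1.1),
// which is all an accept hook gets to see.
type header struct {
	ID                                 uint16
	Bits                               uint16 // QR, opcode, flags, rcode
	Qdcount, Ancount, Nscount, Arcount uint16
}

type action int

const (
	accept action = iota
	reject
	ignore
	notImplemented
)

const (
	opcodeQuery  = 0
	opcodeNotify = 4 // RFC 1996
	opcodeUpdate = 5 // RFC 2136
)

// acceptWithUpdate is a hypothetical hook that admits UPDATE alongside
// QUERY and NOTIFY; which plugin handles the message is decided later,
// in each server block's plugin chain.
func acceptWithUpdate(h header) action {
	if h.Bits&(1<<15) != 0 { // QR set: a response, not a request
		return ignore
	}
	switch int(h.Bits>>11) & 0xF {
	case opcodeQuery, opcodeNotify, opcodeUpdate:
		// One question, or for UPDATE exactly one zone RR. Other counts
		// are left uncapped here because UPDATE carries its prerequisites
		// and changes in those sections.
		if h.Qdcount != 1 {
			return reject
		}
		return accept
	default:
		return notImplemented
	}
}

func main() {
	update := header{Bits: opcodeUpdate << 11, Qdcount: 1, Nscount: 3}
	fmt.Println(acceptWithUpdate(update) == accept) // true
}
```
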
### No-op UPDATEs do not bump the SOA serial

If an UPDATE adds an RR that's already present (deduped per RFC 2136
§3.4.2.2) or deletes one that doesn't exist, the file is not rewritten
and the SOA serial is not advanced; the plugin still returns NOERROR.
Because the serial is unchanged, downstream secondaries have no reason
to AXFR the zone for a no-change.

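One way to detect a no-op is to treat the zone as a set of RRs and record whether any operation actually changed it; only then are the file rewrite and serial bump needed. A minimal sketch, assuming RRs are compared by their canonical text form (the plugin's real comparison may differ, e.g. by case-folding owner names or ignoring TTL); `op` and `applyOps` are hypothetical:

```go
package main

import "fmt"

// op is one hypothetical update operation on a single RR, identified by
// its canonical text form.
type op struct {
	add bool
	rr  string
}

// applyOps applies ops in order to the zone (a set of RR strings) and
// reports whether anything actually changed.
func applyOps(zone map[string]bool, ops []op) (changed bool) {
	for _, o := range ops {
		switch {
		case o.add && !zone[o.rr]: // duplicate adds are absorbed
			zone[o.rr] = true
			changed = true
		case !o.add && zone[o.rr]: // deleting an absent RR is ignored
			delete(zone, o.rr)
			changed = true
		}
	}
	return changed
}

func main() {
	rr := `_acme-challenge.auth.example.com. 60 IN TXT "token"`
	zone := map[string]bool{rr: true}
	if !applyOps(zone, []op{{add: true, rr: rr}}) {
		fmt.Println("no-op: skip rewrite, keep SOA serial, answer NOERROR")
	}
}
```
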
If you need to force a serial bump (rare), send a touch-UPDATE: add a
throwaway RR in one UPDATE, then delete it in a second.

### SOA invariants are enforced strictly

`loadRRs` refuses zone files with zero SOAs, multiple SOAs, or an SOA
whose owner doesn't match the zone apex. The checks run both at startup
(via `validateZoneFiles`) and on every UPDATE, so zone-file corruption
fails loudly at boot rather than mysteriously on the first ACME activity.

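The three SOA checks reduce to counting SOA records and comparing the owner with the apex. A minimal sketch over a hypothetical `record` type (owner name plus type string) rather than miekg/dns RRs; `checkSOA` is an illustrative name, and case-insensitive comparison of fully qualified names is an assumption:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type record struct {
	Owner string // fully qualified, e.g. "auth.example.com."
	Type  string // e.g. "SOA", "TXT"
}

// checkSOA enforces: exactly one SOA, and it sits at the zone apex.
func checkSOA(apex string, rrs []record) error {
	var soas []record
	for _, rr := range rrs {
		if rr.Type == "SOA" {
			soas = append(soas, rr)
		}
	}
	switch {
	case len(soas) == 0:
		return errors.New("zone has no SOA")
	case len(soas) > 1:
		return fmt.Errorf("zone has %d SOAs, want 1", len(soas))
	case !strings.EqualFold(soas[0].Owner, apex):
		return fmt.Errorf("SOA owner %q is not the apex %q", soas[0].Owner, apex)
	}
	return nil
}

func main() {
	err := checkSOA("auth.example.com.", []record{
		{Owner: "auth.example.com.", Type: "SOA"},
		{Owner: "auth.example.com.", Type: "NS"},
	})
	fmt.Println(err) // <nil>
}
```
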
### Serial counter rolls over at NNNN=9999

Format is `YYMMDD*10000 + NNNN`. At NNNN=9999, the next bump rolls to
the next encoded day with NNNN=0001. On heavy days the encoded date
drifts ahead of wall time; on quiet days it catches back up. Monotonic
ordering (the only DNS requirement) holds. Note that the uint32 ceiling
is set by the date encoding rather than the burn rate: `YYMMDD*10000 +
NNNN` must stay at or below 2^32 - 1 = 4294967295, so the last encodable
day is 421231 (2042-12-31, serial 4212319999 at most).

### TSIG replay window is miekg/dns's default (currently 300s)

The fudge window enforced by miekg/dns's `TsigVerify` is what gates
replay. If miekg/dns ever changes its default, this plugin's behavior
changes with it. A future enhancement is a `tsig-fudge` Corefile
directive.

### Git commit failure is logged at ERROR, not rolled back

If `git commit` fails after a successful `writeAtomic`, the zone file
is correct but the audit trail diverges. We log at ERROR with a
recovery hint (`git -C <dir> status` plus a manual commit). We do NOT
roll back the file write: the `auto` plugin may have already noticed
the new mtime, and rolling back creates more races than it solves.

### Per-key rate limit

UPDATE traffic is capped per TSIG key with a token bucket, by default
100 UPDATEs per 60 seconds. ACME storms fit well within this; anything
beyond it is suspicious. Tune via `rate-limit <burst> <period-seconds>`,
or disable with `rate-limit off`.

## Building

This plugin is consumed by a custom CoreDNS build via `plugin.cfg`:

```
# In CoreDNS source's plugin.cfg, BEFORE the `cache` plugin:
rfc2136:git.supported.systems/rsp2k/coredns-rfc2136
```

Then `go get git.supported.systems/rsp2k/coredns-rfc2136 && make`.

## License

MIT (TODO: add LICENSE file).