Wires Caddy as the ACME client side of our new self-hosted DNS-01
flow. Proves the design end-to-end: caddy-dns/rfc2136 -> our
CoreDNS rfc2136 plugin -> zone file write -> git auto-commit -> HE
AXFR -> LE validates -> cert issued.
Changes:
- caddy/Dockerfile: --with github.com/caddy-dns/rfc2136 added
alongside the existing caddy-dns/vultr.
- caddy/Caddyfile: new test-rfc2136.supported.systems site that uses
the new provider. server coredns:53 (docker internal), key from
env, propagation_delay 60s + timeout 600s to accommodate HE pull.
- docker-compose.yml: ACME_TSIG_SECRET passed to the caddy container
(the same secret CoreDNS verifies on the other side of the loop).
First cert issued in production: 2026-05-21 ~13:23 UTC. ~5.5 min
end-to-end from Caddy starting to cert in hand. Documented in
session notes; the cert sits unused in caddy-data/ until/unless
something publishes ports 80/443 for that hostname.
The final set of fixes to make the rfc2136 plugin truly operational
in production:
- coredns/Dockerfile: switch runtime stage from gcr.io/distroless to
alpine:3.20. Distroless has no package manager and no shell, so
`git commit` (called by the plugin's auto-commit code path) had no
way to execute. Alpine adds ~10 MB image size but gives us git +
a usable shell for debugging.
- docker-compose.yml: `user: "${COREDNS_UID:-1003}:${COREDNS_GID:-1004}"`.
The container runs as the host's rpm user (uid 1003/gid 1004 on
dell01) so zone files the plugin writes are owned by rpm:rpm on
the host -- not root. Without this the plugin would write
root-owned files we couldn't read or git-edit. Defaults match
dell01; override per-host via env if needed.
- .env.example: documents COREDNS_IMAGE_TAG (CalVer; bump per build).
Add COREDNS_UID/GID if you need to override on a host where rpm
has different numeric ids.
Combined with the bumped image tag (2026.05.21.2), the full
end-to-end flow works: caddy/nsupdate -> TSIG verify -> plugin
handler -> atomic file write -> git auto-commit -> auto plugin
reload -> query returns new record.
Per standard Docker convention. The active `.env` is per-host
(contains the actual TSIG secret + any host-specific port/hostname
overrides). The `.env.example` template documents the expected
variables with stub values so a fresh checkout knows what to copy.
Also: docker-compose.yml now passes ACME_TSIG_SECRET to the coredns
container via plain `environment:` directive -- compose auto-reads
`.env` for substitution. No --env-file gymnastics needed at the
invocation level.
Wires the custom CoreDNS image (built via coredns/Dockerfile, source
includes git.supported.systems/rsp2k/coredns-rfc2136) into production:
- docker-compose.yml: switch coredns service from upstream image pin
to a build target. New `image: coredns-rfc2136:${COREDNS_IMAGE_TAG}`
is locally-built; `up -d coredns` triggers the build.
- .env: COREDNS_IMAGE_TAG=2026.05.21 (CalVer). Old COREDNS_IMAGE kept
as a comment for emergency rollback to upstream 1.11.3.
- Corefile: new rfc2136 directive inside (common) snippet enumerating
all 84 zones currently in zones/. Plugin is now in the chain for
every server block (plain DNS, DoT, DoH). UPDATE opcode lands in
the plugin handler; auto-commit on, CalVer SOA serial bumping on,
zones-dir /zones matches the existing bind-mount.
TSIG key is read from ${ACME_TSIG_SECRET} which lives in .env.local
(gitignored). Production deployment needs that file synced to dell01
separately.
This commit DOESN'T trigger the deployment by itself -- the image
must be built on dell01 and the container recreated to apply.
Caddy needs this only for DNS-01 cert renewal via Vultr's API, which
happens within the final 30 days of the cert's 90-day lifetime --
roughly once a quarter. Requiring it to be exported on every `docker
compose up` was friction for routine ops (CoreDNS recreations during
unrelated config changes).
Empty default keeps the stack startable without the key in scope. When
renewal is imminent, set the var properly OR (preferred long-term)
migrate Caddy to caddy-dns/rfc2136 pointing at our own plugin and
retire the Vultr dependency entirely.
Big migration: the source/prepared split is gone. Each zones/*.zone is
now an RFC-compliant zone file that CoreDNS reads directly. Editing a
record is just edit + bump SOA + commit. CoreDNS auto-reloads within
30s; HE pulls on its own 300s SOA-refresh cycle.
Why: groundwork for the coredns-rfc2136 plugin to edit zones in place
without juggling a source/prepared transformation step. Also reduces
the mental model from "edit source, run prep, push" to just "edit".
Changes:
- zones/*.zone: 84 files migrated from Vultr-export form to RFC-compliant
form (SOA injected, Vultr NS replaced with HE NS, CNAME/MX/NS rdata
dot-terminated, apex lines get explicit @ prefix). Diff is mechanical
and byte-count is unchanged (~340K) -- pure formatting promotion.
- docker-compose.yml: bind ./zones:/zones:ro (was ./zones-prepared)
- Makefile: dropped 'prep' target. 'reload' is now a no-op explainer.
'tls-up' no longer depends on prep. 'clean' no longer wipes prepared.
- scripts/prepare-zones.sh moved to scripts/archive/ (kept for reference).
- .gitignore: updated comment for zones-prepared/ (now legacy).
NOT in this commit (follow-ups):
- CLAUDE.md updates documenting the new workflow.
- scripts/bump-serials.sh helper for manual-edit SOA bumping.
- coredns-rfc2136 plugin refactor (Phase 2b in the plan).
The original healthcheck `wget -qO- http://127.0.0.1:8080/health` has
been failing since day one because the CoreDNS image is distroless —
no shell, no HTTP client. The container has been running in
"(unhealthy)" status the whole time without anyone noticing because
nothing depends_on it.
Replace with `/coredns -version`, which is the thinnest honest check
the image can support. For deeper liveness/readiness, scrape
:8081/health from outside the container.
Replaces the self-signed dev cert flow with a real LE prod cert for
dns.l.supported.systems, issued and auto-renewed by a Caddy sidecar
using DNS-01 challenge against the Vultr API.
Components:
- caddy/Dockerfile builds Caddy 2.10.0 with caddy-dns/vultr plugin
via xcaddy. GOTOOLCHAIN=auto so xcaddy can fetch newer Go on demand
when plugin versions advance their minimum Go.
- caddy/Caddyfile uses DNS-01 with explicit public resolvers (1.1.1.1,
9.9.9.9) for the propagation check. Without that, Docker's embedded
DNS leaks the container into the host's split-horizon LAN DNS, which
returns LAN IPs for ns1.vultr.com and the propagation check fails.
- docker-compose: caddy service shares ./caddy-data with coredns via a
read-only subpath mount that excludes /acme (account private key).
- Healthcheck doubles as a symlinker: maintains stable cert.pem /
key.pem names at /data/caddy/ and chmods cert files + their dirs to
be readable by CoreDNS's nonroot user. Flips to "healthy" only once
the symlinks dereference (i.e. cert exists), gating CoreDNS start
via depends_on: service_healthy.
- Corefile unchanged — same /etc/coredns/certs/cert.pem path; only the
bind-mount source switches from ./certs to ./caddy-data/caddy.
- New Makefile target: tls-up orchestrates the bring-up sequence.
Cert is valid until Aug 12 2026. Verified end-to-end:
dig @127.0.0.1 -p 8853 +tls +tls-hostname=dns.l.supported.systems ...
dig @127.0.0.1 -p 8443 +https +tls-hostname=dns.l.supported.systems ...
- New Corefile snippet (common) shared across plain DNS / DoT / DoH so
zone-loading + forward + cache stay DRY across all three transports
- scripts/generate-certs.sh: openssl-only self-signed RSA cert with SANs
for localhost / 127.0.0.1 / ::1 / coredns / dns.local. Idempotent —
skips regeneration if cert is valid >24h ahead; FORCE=1 to rotate.
- Key chmod is 0644 so the CoreDNS container's nonroot user can read it
via the bind mount. Acceptable for local dev; production should mount
real certs with proper UID/GID.
- DOT_PORT=8853, DOH_PORT=8443 (avoids Caddy already-on-443 collision)
- Makefile: `make certs`, `make test-tls`
- All three transports verified end-to-end (dig +tls, dig +https,
curl with raw RFC 8484 wire format)