10 Commits

Author SHA1 Message Date
cc33fcbcc8 caddy: add caddy-dns/rfc2136 + test-rfc2136 site -- self-hosted ACME flow
Wires Caddy as the ACME client side of our new self-hosted DNS-01
flow. Proves the design end-to-end: caddy-dns/rfc2136 -> our
CoreDNS rfc2136 plugin -> zone file write -> git auto-commit -> HE
AXFR -> LE validates -> cert issued.
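
For reference, the record Caddy pushes through this loop is the standard ACME DNS-01 one (RFC 8555, DNS challenge). It is a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 digest of the key authorization (`token + "." + account-key thumbprint`). Below is a minimal Python sketch of that computation. The token and thumbprint are placeholders, and none of this code comes from Caddy or the plugin:

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record_name(domain: str) -> str:
    """Absolute owner name of the challenge TXT record."""
    return f"_acme-challenge.{domain.rstrip('.')}."


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT rdata: base64url(SHA-256(token + "." + thumbprint)), unpadded."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


if __name__ == "__main__":
    # Placeholder inputs; real ones come from the ACME order and account key.
    name = dns01_record_name("test-rfc2136.supported.systems")
    value = dns01_txt_value("example-token", "example-thumbprint")
    print(name, "TXT", value)  # the value is always 43 characters long
```

The RFC 2136 UPDATE that the caddy-dns/rfc2136 provider sends simply adds and later deletes that TXT record in the zone.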

Changes:
- caddy/Dockerfile: --with github.com/caddy-dns/rfc2136 added
  alongside the existing caddy-dns/vultr.
- caddy/Caddyfile: new test-rfc2136.supported.systems site that uses
  the new provider: server coredns:53 (Docker-internal), TSIG key from
  env, propagation_delay 60s + timeout 600s to allow for HE's zone pull.
- docker-compose.yml: ACME_TSIG_SECRET passed to the caddy container
  (the same secret CoreDNS verifies on the other side of the loop).

First cert issued in production: 2026-05-21 ~13:23 -06:00 (~19:23 UTC). ~5.5 min
end-to-end from Caddy starting to cert in hand. Documented in
session notes; the cert sits unused in caddy-data/ until/unless
something publishes ports 80/443 for that hostname.
2026-05-21 13:27:05 -06:00
18aa53bdc7 prod-readiness: alpine runtime + uid:gid passthrough + git auto-commit working
The final set of fixes to make the rfc2136 plugin truly operational
in production:

- coredns/Dockerfile: switch runtime stage from gcr.io/distroless to
  alpine:3.20. Distroless ships no git binary and has no package
  manager or shell to install one with, so `git commit` (called by the
  plugin's auto-commit code path) had nothing to execute. Alpine adds
  ~10 MB to the image but gives us git + a usable shell for debugging.
- docker-compose.yml: `user: "${COREDNS_UID:-1003}:${COREDNS_GID:-1004}"`.
  The container runs as the host's rpm user (uid 1003/gid 1004 on
  dell01) so zone files the plugin writes are owned by rpm:rpm on
  the host -- not root. Without this the plugin would write
  root-owned files we couldn't read or git-edit. Defaults match
  dell01; override per-host via env if needed.
- .env.example: documents COREDNS_IMAGE_TAG (CalVer; bump per build).
  Add COREDNS_UID/GID if you need to override on a host where rpm
  has different numeric ids.
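
`${COREDNS_UID:-1003}` is Compose's default-value interpolation. The default applies when the variable is unset or empty; `${VAR-default}` would apply it only when unset. Below is a minimal Python sketch of just that one form, useful for predicting what `user:` resolves to on a given host. It is not Compose's parser and ignores `$$`, `:?` and the other forms:

```python
import re
from typing import Mapping

# Matches only ${NAME:-default}; other Compose interpolation forms are ignored.
_DEFAULT_FORM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def interpolate(template: str, env: Mapping[str, str]) -> str:
    """Resolve ${NAME:-default}: use env[NAME] unless it is unset or empty."""

    def _pick(match):
        value = env.get(match.group(1), "")
        return value if value else match.group(2)

    return _DEFAULT_FORM.sub(_pick, template)


if __name__ == "__main__":
    user = "${COREDNS_UID:-1003}:${COREDNS_GID:-1004}"
    print(interpolate(user, {}))                       # -> 1003:1004 (dell01)
    print(interpolate(user, {"COREDNS_UID": "1500"}))  # -> 1500:1004 (override)
```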

Combined with the bumped image tag (2026.05.21.2), the full
end-to-end flow works: caddy/nsupdate -> TSIG verify -> plugin
handler -> atomic file write -> git auto-commit -> auto plugin
reload -> query returns new record.
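
The "atomic file write" step is, in general, the write-temp-then-rename pattern. Readers (CoreDNS's auto plugin, git) see either the old zone file or the new one, never a half-written one. Below is a minimal Python sketch of that pattern. It assumes the plugin does something equivalent; its Go code is not reproduced here, and the git step is left out:

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace `path` with `data` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory (same filesystem), so os.replace() is a
    # single atomic rename. The ".tmp" suffix keeps it from matching a
    # "*.zone" pattern while it exists.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # contents on disk before the rename
        # mkstemp creates files as 0600; keep the old mode, or default to 0644.
        mode = os.stat(path).st_mode & 0o7777 if os.path.exists(path) else 0o644
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic swap on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # don't leave stray temp files behind
        raise
```

Stricter implementations also fsync the directory after the rename; that step is omitted here.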
2026-05-21 13:01:36 -06:00
162abedfdd .env now gitignored; .env.example is the committed template
Per standard Docker convention. The active `.env` is per-host
(contains the actual TSIG secret + any host-specific port/hostname
overrides). The `.env.example` template documents the expected
variables with stub values so a fresh checkout knows what to copy.

Also: docker-compose.yml now passes ACME_TSIG_SECRET to the coredns
container via plain `environment:` directive -- compose auto-reads
`.env` for substitution. No --env-file gymnastics needed at the
invocation level.
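
Because `.env` is now per-host and untracked, a checkout can drift from the template. The hypothetical helper below (not in the repo) lists variables that `.env.example` documents but the local `.env` lacks. It assumes plain `KEY=VALUE` lines with `#` comments and does not attempt Compose's full `.env` syntax (quoting, `export`, multi-line values). Commented-out keys, such as optional overrides, are not counted as required:

```python
from pathlib import Path
from typing import List, Set


def env_keys(text: str) -> Set[str]:
    """Collect KEY names from simple KEY=VALUE lines, skipping comments/blanks."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example: str, actual: str) -> List[str]:
    """Keys documented in the template but absent from the active .env."""
    return sorted(env_keys(example) - env_keys(actual))


if __name__ == "__main__":
    example, active = Path(".env.example"), Path(".env")
    if example.exists():
        current = active.read_text() if active.exists() else ""
        for key in missing_keys(example.read_text(), current):
            print(f"missing from .env: {key}")
```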
2026-05-21 12:37:23 -06:00
3720cd2885 deploy: enable rfc2136 plugin for all 84 production zones
Wires the custom CoreDNS image (built via coredns/Dockerfile, source
includes git.supported.systems/rsp2k/coredns-rfc2136) into production:

- docker-compose.yml: switch coredns service from upstream image pin
  to a build target. New `image: coredns-rfc2136:${COREDNS_IMAGE_TAG}`
  is locally-built; `up -d coredns` triggers the build.
- .env: COREDNS_IMAGE_TAG=2026.05.21 (CalVer). Old COREDNS_IMAGE kept
  as a comment for emergency rollback to upstream 1.11.3.
- Corefile: new rfc2136 directive inside (common) snippet enumerating
  all 84 zones currently in zones/. Plugin is now in the chain for
  every server block (plain DNS, DoT, DoH). UPDATE opcode lands in
  the plugin handler; auto-commit on, CalVer SOA serial bumping on,
  zones-dir /zones matches the existing bind-mount.
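
The message doesn't spell out the plugin's CalVer serial format. The sketch below assumes the common YYYYMMDDnn convention (date plus a two-digit per-day counter). A bump only has to stay strictly increasing for secondaries such as HE to notice the change:

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Next SOA serial under an assumed YYYYMMDDnn CalVer scheme.

    - First change of the day: today's date with counter 00.
    - Further changes the same day: increment the counter.
    - If the old serial is already ahead (e.g. >99 edits today, or a
      hand-set future value), just add 1 so it never goes backwards.
    """
    base = int(today.strftime("%Y%m%d")) * 100
    return max(base, current + 1)
```

Taking `max()` rather than always resetting the counter is what stops a serial that is already ahead from moving backwards. Values of this form stay well below the 32-bit SOA serial limit.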

TSIG key is read from ${ACME_TSIG_SECRET} which lives in .env.local
(gitignored). Production deployment needs that file synced to dell01
separately.

This commit DOESN'T trigger the deployment by itself -- the image
must be built on dell01 and the container recreated to apply.
2026-05-21 12:17:20 -06:00
083e29bd3e docker-compose: make VULTR_API_KEY optional
Caddy needs this only for DNS-01 cert renewal via Vultr's API, which
happens within the final 30 days of the cert's 90-day lifetime --
roughly once a quarter. Requiring it to be exported on every `docker
compose up` was friction for routine ops (CoreDNS recreations during
unrelated config changes).

Empty default keeps the stack startable without the key in scope. When
renewal is imminent, set the var properly OR (preferred long-term)
migrate Caddy to caddy-dns/rfc2136 pointing at our own plugin and
retire the Vultr dependency entirely.
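
The 30-day window above makes it easy to work out when the Vultr key next has to be in scope. The function name below is illustrative. For the dns.l.supported.systems cert from c1afe77b27 (valid until Aug 12 2026, i.e. issued 90 days earlier on May 14), the window opens on July 13 2026:

```python
from datetime import date, timedelta


def renewal_opens(not_after: date, window_days: int = 30) -> date:
    """First day of the renewal window: `window_days` before expiry."""
    return not_after - timedelta(days=window_days)


if __name__ == "__main__":
    # Cert for dns.l.supported.systems is valid until 2026-08-12.
    print(renewal_opens(date(2026, 8, 12)))  # -> 2026-07-13
```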
2026-05-21 11:17:56 -06:00
6d72d65642 Retire prepare-zones.sh pipeline; zones/ is now the served form
Big migration: the source/prepared split is gone. Each zones/*.zone is
now an RFC-compliant zone file that CoreDNS reads directly. Editing a
record is just edit + bump SOA + commit. CoreDNS auto-reloads within
30s; HE pulls on its own 300s SOA-refresh cycle.

Why: groundwork for the coredns-rfc2136 plugin to edit zones in place
without juggling a source/prepared transformation step. Also reduces
the mental model from "edit source, run prep, push" to just "edit".

Changes:
- zones/*.zone: 84 files migrated from Vultr-export form to RFC-compliant
  form (SOA injected, Vultr NS replaced with HE NS, CNAME/MX/NS rdata
  dot-terminated, apex lines get explicit @ prefix). Diff is mechanical
  and total size is roughly unchanged (~340K) -- pure formatting promotion.
- docker-compose.yml: bind ./zones:/zones:ro (was ./zones-prepared)
- Makefile: dropped 'prep' target. 'reload' is now a no-op explainer.
  'tls-up' no longer depends on prep. 'clean' no longer wipes prepared.
- scripts/prepare-zones.sh moved to scripts/archive/ (kept for reference).
- .gitignore: updated comment for zones-prepared/ (now legacy).
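
Two of the record rewrites listed above are mechanical enough to sketch: dot-terminating CNAME/MX/NS rdata and giving leading-TAB apex lines an explicit @. The minimal Python version below assumes a simple `owner [ttl] [class] TYPE rdata` layout with numeric TTLs. It is not the script actually used for this migration, the real Vultr export may differ, and SOA injection and the NS swap are left out:

```python
from typing import List, Optional

NAME_RDATA_TYPES = {"CNAME", "MX", "NS"}  # rdata ends in a domain name
CLASSES = {"IN", "CH", "HS"}


def _type_index(fields: List[str]) -> Optional[int]:
    """Index of the record TYPE: first field after the owner that is
    neither a numeric TTL nor a class."""
    for i in range(1, len(fields)):
        token = fields[i].upper()
        if token.isdigit() or token in CLASSES:
            continue
        return i
    return None


def absolutize(name: str) -> str:
    """Append the trailing dot so $ORIGIN is not appended a second time."""
    return name if name.endswith(".") or name == "@" else name + "."


def promote_line(line: str) -> str:
    """Rewrite one record line from (assumed) Vultr-export form."""
    stripped = line.lstrip()
    if not stripped or stripped.startswith(";") or line.startswith("$"):
        return line  # blanks, comments and $ORIGIN/$TTL pass through
    if line[0] in " \t":
        line = "@" + line  # assumed: a leading TAB marks an apex record
    fields = line.split()
    idx = _type_index(fields)
    if (idx is None or idx == len(fields) - 1
            or fields[idx].upper() not in NAME_RDATA_TYPES):
        return line  # A, TXT, ... untouched, so quoted text is never re-split
    fields[-1] = absolutize(fields[-1])  # target is the last field (MX: after pref)
    return "\t".join(fields)
```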

NOT in this commit (follow-ups):
- CLAUDE.md updates documenting the new workflow.
- scripts/bump-serials.sh helper for manual-edit SOA bumping.
- coredns-rfc2136 plugin refactor (Phase 2b in the plan).
2026-05-21 11:14:42 -06:00
b78cfb0b45 coredns: fix silently-broken healthcheck (distroless image has no wget)
The original healthcheck `wget -qO- http://127.0.0.1:8080/health` has
been failing since day one because the CoreDNS image is distroless —
no shell, no HTTP client. The container has been running in
"(unhealthy)" status the whole time without anyone noticing because
nothing depends_on it.

Replace with `/coredns -version`, which is the thinnest honest check
the image can support. For deeper liveness/readiness, scrape
:8081/health from outside the container.
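
For the "scrape from outside" part, here is a minimal probe sketch using only the Python standard library. The URL and port come from the message above. It assumes CoreDNS's health plugin answers 200 on /health when live and that the port is published to the host:

```python
import http.client
import urllib.request


def is_healthy(url: str = "http://127.0.0.1:8081/health", timeout: float = 2.0) -> bool:
    """True only if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (non-2xx), refused connections and timeouts are
        # all OSError subclasses; HTTPException covers garbled HTTP replies.
        return False
```

A cron job or external monitor could call `is_healthy()` and alert on `False`. Unlike the in-container check, this one actually exercises the HTTP endpoint.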
2026-05-16 14:01:22 -06:00
c1afe77b27 coredns: production Let's Encrypt cert via Caddy sidecar (DNS-01 + Vultr)
Replaces the self-signed dev cert flow with a real LE prod cert for
dns.l.supported.systems, issued and auto-renewed by a Caddy sidecar
using DNS-01 challenge against the Vultr API.

Components:
- caddy/Dockerfile builds Caddy 2.10.0 with caddy-dns/vultr plugin
  via xcaddy. GOTOOLCHAIN=auto so xcaddy can fetch newer Go on demand
  when plugin versions advance their minimum Go.
- caddy/Caddyfile uses DNS-01 with explicit public resolvers (1.1.1.1,
  9.9.9.9) for the propagation check. Without them, Docker's embedded
  DNS forwards the container's lookups to the host's split-horizon LAN
  resolver, which returns LAN IPs for ns1.vultr.com, so the propagation
  check fails.
- docker-compose: caddy service shares ./caddy-data with coredns via a
  read-only subpath mount that excludes /acme (account private key).
- Healthcheck doubles as a symlinker: maintains stable cert.pem /
  key.pem names at /data/caddy/ and chmods cert files + their dirs to
  be readable by CoreDNS's nonroot user. Flips to "healthy" only once
  the symlinks dereference (i.e. cert exists), gating CoreDNS start
  via depends_on: service_healthy.
- Corefile unchanged — same /etc/coredns/certs/cert.pem path; only the
  bind-mount source switches from ./certs to ./caddy-data/caddy.
- New Makefile target: tls-up orchestrates the bring-up sequence.
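
The healthcheck's symlink-and-gate logic can be sketched in Python for clarity. The real check runs inside the Caddy container and its exact commands are not reproduced here, and the chmod step is omitted. The sketch assumes Caddy's usual storage layout, certificates/<issuer>/<host>/<host>.crt and .key under /data/caddy, and uses relative links. It only reports healthy once both links dereference:

```python
import os
from pathlib import Path

HOST = "dns.l.supported.systems"


def refresh_links(data_dir: Path, host: str = HOST) -> bool:
    """Point cert.pem/key.pem at the issued cert; True once both resolve."""
    certs = sorted(data_dir.glob(f"certificates/*/{host}/{host}.crt"))
    if certs:
        crt = certs[0]  # first matching issuer directory
        key = crt.with_suffix(".key")
        for link_name, target in (("cert.pem", crt), ("key.pem", key)):
            link = data_dir / link_name
            tmp = data_dir / f".{link_name}.tmp"
            if tmp.is_symlink() or tmp.exists():
                tmp.unlink()
            # Relative target, so the link still resolves at CoreDNS's mount point.
            tmp.symlink_to(os.path.relpath(target, data_dir))
            os.replace(tmp, link)  # swap the link atomically
    # exists() follows symlinks, so dangling links report "unhealthy".
    return (data_dir / "cert.pem").exists() and (data_dir / "key.pem").exists()
```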

Cert is valid until Aug 12 2026. Verified end-to-end:
  dig @127.0.0.1 -p 8853 +tls +tls-hostname=dns.l.supported.systems ...
  dig @127.0.0.1 -p 8443 +https +tls-hostname=dns.l.supported.systems ...
2026-05-14 01:34:57 -06:00
066ba1892a coredns: DoT (:853) + DoH (:443) listeners with self-signed cert
- New Corefile snippet (common) shared across plain DNS / DoT / DoH so
  zone-loading + forward + cache stay DRY across all three transports
- scripts/generate-certs.sh: openssl-only self-signed RSA cert with SANs
  for localhost / 127.0.0.1 / ::1 / coredns / dns.local. Idempotent —
  skips regeneration if cert is valid >24h ahead; FORCE=1 to rotate.
- Key chmod is 0644 so the CoreDNS container's nonroot user can read it
  via the bind mount. Acceptable for local dev; production should mount
  real certs with proper UID/GID.
- DOT_PORT=8853, DOH_PORT=8443 (avoids colliding with Caddy, which
  already holds 443)
- Makefile: `make certs`, `make test-tls`
- All three transports verified end-to-end (dig +tls, dig +https,
  curl with raw RFC 8484 wire format)
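
The raw RFC 8484 wire-format test can be reproduced without dig. An RFC 8484 GET carries the binary DNS query, base64url-encoded without padding, in the `dns` parameter, and the RFC suggests ID 0 for cache friendliness. Below is a stdlib-only sketch that builds that parameter for a query with RD set:

```python
import base64
import struct


def dns_query_wire(name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """Minimal DNS query: ID 0, RD=1, one question, no other sections."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, qclass)


def doh_get_param(name: str, qtype: int = 1) -> str:
    """Value for ?dns= in an RFC 8484 GET: base64url with padding stripped."""
    wire = dns_query_wire(name, qtype)
    return base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Matches the example query string given in RFC 8484 itself.
    print(doh_get_param("www.example.com"))  # -> AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

With DOH_PORT as above, something like `curl -sk -H 'accept: application/dns-message' "https://127.0.0.1:8443/dns-query?dns=<param>"` should return the binary response. The `/dns-query` path is an assumption about this setup.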
2026-05-14 01:12:25 -06:00
10867ee319 coredns: docker compose stack with Vultr zone import
- Auto plugin loads zones-prepared/*.zone (regex zone-name extraction)
- scripts/prepare-zones.sh transforms raw Vultr exports:
  * synthesizes SOA (omitted by Vultr; CoreDNS requires it)
  * prepends @ to leading-TAB apex lines to disambiguate owner inheritance
  * dot-terminates NS/MX/CNAME rdata so $ORIGIN doesn't double-suffix
- DNS_PORT defaults to 1053 (5353=avahi, 53=libvirt dnsmasq on this host)
- Forwards non-authoritative queries to 1.1.1.1/1.0.0.1/9.9.9.9
- Makefile targets: prep, up, down, reload, test, logs
- 91 zones loaded
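
The auto plugin's zone-name extraction pairs a regex over file names with an origin template, where `{1}` means the first capture group. The actual Corefile line isn't in this log. The sketch below assumes a pattern like `(.*)\.zone` with template `{1}`, and is handy for checking which origin each file yields. The match is anchored here for safety; whether CoreDNS anchors it is not assumed:

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Assumed pattern/template; the real ones live in the Corefile's auto block.
ZONE_FILE_RE = re.compile(r"(.*)\.zone")
ORIGIN_TEMPLATE = "{1}"


def origin_for(filename: str) -> Optional[str]:
    """Origin for a zone file name, or None if the file isn't a zone."""
    match = ZONE_FILE_RE.fullmatch(filename)
    if match is None:
        return None
    origin = ORIGIN_TEMPLATE
    for i, group in enumerate(match.groups(), start=1):
        origin = origin.replace("{%d}" % i, group)
    return origin.rstrip(".") + "."


def zones_in(directory: str) -> Dict[str, str]:
    """Map origin -> file name for every matching file in `directory`."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        origin = origin_for(path.name)
        if path.is_file() and origin is not None:
            result[origin] = path.name
    return result
```

`len(zones_in("zones-prepared"))` can be compared against the loaded-zone count above.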
2026-05-12 01:51:09 -06:00