32 Commits

Author SHA1 Message Date
8379e59f55 zones: repoint 24 records 108.61.229.209 → 108.61.23.129 (docker-1 migration)
Bulk swap of the old docker host IP to the new one across 13 zones.
docker-1.supported.systems intentionally preserved at the old IP — the
hostname stays tied to the old box until decommissioned.
2026-05-20 11:43:44 -06:00
8dacdc5d3b scripts: chmod +x notify-he.py 2026-05-20 11:39:25 -06:00
07e4813ad3 supportedsystems.net: add docker-1 A 108.61.23.129
New host replacing 108.61.229.209. Wildcard CNAME on the zone is
suppressed by RFC 4592 in favor of this explicit owner record.
2026-05-20 11:38:52 -06:00
1039838ff5 zones: retire 7 obsolete domains during docker-1 migration
cubeseptic.com, flonhoney.com, hydrushydroponics.com,
idahogreendreams.com, qube-construction.com, qube-septic.com,
qubeseptic.com — all were hosted on 108.61.229.209 (docker-1, old)
and are being decommissioned, not migrated to the replacement host.
2026-05-20 11:38:52 -06:00
890a4214d6 CLAUDE.md: project knowledge — architecture, NOTIFY, SSH deploy, HE quirks 2026-05-20 11:32:25 -06:00
fc2ea0f2fc homestar.ink: add photos.mock-reso.demo A 144.202.24.151 2026-05-20 00:25:08 -06:00
e46c05e3c8 scripts: add check-he.sh — parallel query across HE anycast NS for divergence detection 2026-05-20 00:14:20 -06:00
4dad8f899a homestar.ink: demo records (demo, app.demo, help.demo, mock-api.demo, *.demo) -> 144.202.24.151 2026-05-20 00:10:22 -06:00
48aa6184b6 homestar.ink: move all 108.61.229.209 records to 104.238.162.49 (homestar-1.kpgidaho.com) 2026-05-19 23:42:19 -06:00
66837afd56 supported.systems: route langfuse/grafana/siglip/*.siglip/staging.siglip to dell01 too 2026-05-18 21:42:34 -06:00
c597a21aad supported.systems: gpu/*.gpu -> 154.27.180.210, remove all AAAA 2026-05-18 21:40:56 -06:00
f8363e5ea7 zones: add explicit CNAME-to-apex for RFC 4592 empty-non-terminals
Wildcards in DNS only synthesize for names that don't already exist
in the zone tree. A `_acme-challenge.<sub>` TXT record makes <sub>
an "empty non-terminal" — exists in the tree (as a parent node) but
has no records of its own. Per RFC 4592 §2.2.3, wildcards skip these,
so RFC-compliant authoritative servers (HE, BIND) return NODATA for
<sub> even when the zone has `* CNAME @`.

Fix: for each <sub> that's an empty non-terminal in a zone with a
wildcard, add an explicit `<sub> CNAME @` so the resolution outcome
matches what the wildcard would have produced. Zero-knowledge — no
need to identify the specific service IP per name.

30 records added across 13 zones:
  acrazy.org (langfuse.dootie)
  context.bet (studio)
  copper-springs.online (docs.butler.dev)
  demostar.io (cw.cw, doom, meet)
  home-inspector.store (api, dashboard, mailpit)
  inspect.pics (admin)
  log.doctor (app, docs)
  malloys.us (cp, cp-sandbox, mary)
  nielsen-inspections.com (calendar, cw, files, v2-calendar)
  qubeseptic.com (api.dispatch, dispatch, leads, mail.dispatch,
                  rentcache.dispatch)
  ryanmalloy.com (c4ai)
  sidejob.pro (api)
  upc.llc (catalog, minio.or, or, s3)

CoreDNS (lenient) was returning the wildcard CNAME for these names
anyway; HE (strict RFC-compliant) was returning empty. After this
change, both behave identically.
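
A minimal sketch of how the empty non-terminals described above could be found, assuming the zone's owner names are already available as fully-qualified strings. The function name and the example.org names are hypothetical; this is not the tooling used for the commit.

```python
def empty_non_terminals(owner_names, origin):
    """Return names that exist only as ancestors of other owner names.

    owner_names: fully-qualified owner names that carry records.
    origin: the zone apex, e.g. "example.org" (hypothetical).
    """
    origin = origin.rstrip(".").lower()
    owners = {name.rstrip(".").lower() for name in owner_names}
    found = set()
    for name in owners:
        # Skip the apex itself and anything outside the zone.
        if not name.endswith("." + origin):
            continue
        labels = name.split(".")
        # Walk up the tree from the immediate parent towards the apex.
        for i in range(1, len(labels)):
            parent = ".".join(labels[i:])
            if parent == origin:
                break
            if parent not in owners:
                found.add(parent)
    return sorted(found)


if __name__ == "__main__":
    names = ["example.org", "*.example.org", "_acme-challenge.sub.example.org"]
    print(empty_non_terminals(names, "example.org"))
```

Each name returned is a candidate for an explicit `<sub> CNAME @` record in a zone that also has a wildcard.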
2026-05-18 18:34:51 -06:00
c19df5d0a5 homestar.ink: add auth, mcp, rentcache A 108.61.229.209 (fix empty-non-terminals) 2026-05-18 18:31:12 -06:00
fb3f4c5b31 coredns: tighten SOA timers to nudge HE's internal sync
Previously: refresh=3600 retry=1800 minimum=300 (RFC-conformant but
slow). With HE's free secondary service exhibiting puller→anycast
replication lag of up to ~1 hour, we want to give them every signal
to refresh faster.

New: refresh=300 retry=120 minimum=60.

  - refresh 300s: slaves poll our SOA every 5 minutes. ~91 zones polled
    by HE = ~1 query/sec to dell01:53, trivial load. If HE honors the
    master's refresh internally (some secondary providers do, some
    don't), this also nudges their puller→anycast sync.
  - retry 120s: kept < refresh per RFC 1912 §2.2.
  - minimum 60s: tightens NXDOMAIN negative-cache TTL on public
    resolvers from 5 min to 1 min. This is the dominant delay when a
    newly added name is briefly NX-cached on Cloudflare/Google/Quad9
    before they re-query HE.

expire stays at 604800 (1 week) — that's "how long HE keeps serving
stale data if we vanish," unrelated to fresh-data propagation.
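
The timer arithmetic above can be checked with a short sketch. The function names are hypothetical, and the `pollers` parameter is an assumption: the message does not say how many HE servers poll independently. The negative-cache rule is the one from RFC 2308 (the lesser of the SOA record's TTL and its MINIMUM field).

```python
def soa_poll_rate(zones, refresh_seconds, pollers=1):
    """Average SOA queries/sec if each poller checks each zone once per refresh."""
    return zones * pollers / refresh_seconds


def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """RFC 2308: negative answers are cached for min(SOA TTL, SOA MINIMUM)."""
    return min(soa_record_ttl, soa_minimum)


def retry_below_refresh(refresh, retry):
    """The retry interval should be shorter than refresh (RFC 1912 section 2.2)."""
    return retry < refresh


if __name__ == "__main__":
    print(round(soa_poll_rate(91, 300), 2))    # 0.3
    print(round(soa_poll_rate(91, 3600), 3))   # 0.025
    print(retry_below_refresh(300, 120))       # True
```

With a single poller, 91 zones at refresh=300 works out to roughly 0.3 queries per second, which stays in the "trivial load" range even if several secondaries poll separately.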
2026-05-18 18:25:16 -06:00
f6111c2cbd homestar.ink: explicit A for mock-api.demo (workaround for RFC 4592 empty-non-terminal) 2026-05-18 17:29:05 -06:00
d4a5ce9f82 coredns: script-based NOTIFY to ns1.he.net on every prep
Hurricane Electric requires asymmetric transfer config:
  - AXFR pull from 216.218.133.2 (slave.dns.he.net / ns4.he.net)
  - NOTIFY destination 216.218.130.2 (ns1.he.net)

CoreDNS's transfer plugin uses a single bidirectional `to` list for
both, which is fine in principle but triggers a reproducible bug: any
`to` with more than one specific IPv4 silently kills server-block
listener startup (no error, zones load, but :53 never binds).
Reproduced on 1.11.3 + 1.12.2 even with a minimal fresh `docker run`.

Workaround:
  - Corefile keeps `transfer { to * }` (open AXFR; firewall does the
    real source-IP filtering on TCP/53)
  - scripts/notify-he.py crafts and sends NOTIFY messages directly to
    216.218.130.2 (only). Pure-stdlib Python — no dependencies.
  - Makefile `prep` target runs notify-he.py after prepare-zones.sh
    so every zone-bump fires NOTIFY automatically.

Verified end-to-end: HE acks NOTIFY (rcode=0) for the 10 zones it
hosts as secondaries; remaining 81 return REFUSED (rcode=5) because
HE doesn't have them configured yet. Note: HE's free slave service
acks NOTIFY but only actually re-pulls AXFR on its hourly poll cycle
(observed behavior: they're poll-based by design). NOTIFY is still
useful long-term in case HE changes that behavior, and harmless either way.
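
For reference, a stdlib-only sketch of what crafting a NOTIFY looks like on the wire (RFC 1996: opcode 4, AA bit set, one SOA question). This is an illustration and not the contents of scripts/notify-he.py; all function names are hypothetical, and sending over UDP is an assumption.

```python
import secrets
import socket
import struct

OPCODE_NOTIFY = 4   # RFC 1996
FLAG_AA = 0x0400    # authoritative-answer bit, set in NOTIFY requests
TYPE_SOA = 6
CLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_notify(zone, msg_id):
    """Build a NOTIFY request: 12-byte header plus a single SOA question."""
    flags = (OPCODE_NOTIFY << 11) | FLAG_AA
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", TYPE_SOA, CLASS_IN)
    return header + question


def parse_rcode(response):
    """Extract the 4-bit RCODE from a DNS response header."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def send_notify(zone, server, port=53, timeout=3.0):
    """Send one NOTIFY over UDP and return the RCODE of the reply."""
    msg_id = secrets.randbelow(1 << 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_notify(zone, msg_id), (server, port))
        response, _ = sock.recvfrom(512)
    return parse_rcode(response)
```

A call such as `send_notify("acrazy.org", "216.218.130.2")` would be expected to return 0 (NOERROR) for a zone HE has configured and 5 (REFUSED) otherwise, matching the results reported above.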
2026-05-18 16:57:54 -06:00
e31f83b6ae homestar.ink: add *.demo wildcard A 108.61.229.209 2026-05-18 16:41:22 -06:00
b0dace3933 homestar.ink: add help, demo, app.demo, help.demo A records 2026-05-18 13:56:51 -06:00
6cd3087cd5 homestar.ink: add app A 108.61.229.209 2026-05-18 11:38:04 -06:00
c26ef5a5a0 homestar.ink: add api + mock-api A 108.61.229.209 2026-05-17 04:05:26 -06:00
5afdb05667 zones: replace all A 100.79.95.190 with CNAME rpm-bullet.mer.idahomuellers.net
27 records across 15 zones converted from direct A records pointing at
the Tailscale endpoint (100.79.95.190) to CNAMEs pointing at the
Tailscale-named alias. Now if the underlying Tailscale node's IP
changes, only the rpm-bullet record needs updating instead of
chasing 27 zones.

Affected zones (all *.l labels + a handful of dev / dev.mary names):
  acrazy.org      copper-springs.online   demostar.io      flonhoney.com
  homestar.ink    kg7q.cc                 malloys.us       ourjob.site
  qubeseptic.com  ryanmalloy.com          septic.report    sidejob.pro
  supported.systems  warehack.ing         zmesh.systems

No CNAME collisions: none of the converted names had other records
(MX/TXT/SRV/CAA/AAAA) at the same exact name. _acme-challenge.<sub>.l
records sit at distinct subdomains and continue to resolve independently
(verified: TXT lookups for known _acme-challenge.l.* names still return
the original values).

Also fixed prepare-zones.sh: added `|| true` after the serial-detection
grep so a zero-match (first run of a new day) doesn't trip `set -e`
and abort the whole prep.
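
The CNAME collision check described above can be sketched as follows, assuming records are available as (owner, type) pairs. The function name and the example.org names are hypothetical; the rule applied is the standard one that a CNAME may not share an owner name with other record types.

```python
from collections import defaultdict


def cname_conflicts(records):
    """Return owner names where a CNAME coexists with any other record type.

    records: iterable of (owner, rtype) tuples.
    """
    types_by_owner = defaultdict(set)
    for owner, rtype in records:
        types_by_owner[owner.rstrip(".").lower()].add(rtype.upper())
    return sorted(
        owner
        for owner, types in types_by_owner.items()
        if "CNAME" in types and len(types) > 1
    )


if __name__ == "__main__":
    sample = [
        ("dev.l.example.org", "CNAME"),
        ("_acme-challenge.dev.l.example.org", "TXT"),  # distinct owner: no conflict
    ]
    print(cname_conflicts(sample))
```

An empty result corresponds to the "no CNAME collisions" outcome: TXT records under `_acme-challenge.<sub>.l` live at a different owner name and are unaffected.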
2026-05-17 03:29:34 -06:00
ada5c872e3 homestar.ink: add photos.mock-reso A 108.61.229.209 2026-05-16 22:01:45 -06:00
87eaa27c4c coredns: auto-bump SOA serial (NN counter) on every make prep
Previously: `SERIAL=$(date +%Y%m%d)01` — same-day re-runs produced the
same serial. HE polled, saw no change, never pulled the update.

Now: scan zones-prepared/ for the highest `YYYYMMDDNN` matching today's
date and increment the NN counter. First run of the day starts at NN=01.
Caps at NN=99 with a clear error message (set SERIAL manually if you
genuinely need >99 changes per day).

`SERIAL=<value> make prep` still overrides the auto-detection, useful
for forcing a specific serial during recovery or for testing.

Verified end-to-end on dell01: prep bumped 2026051601 → 2026051602,
CoreDNS auto-reload picked it up within 30s, all queried zones serve
the new serial. HE will pull on its next refresh poll (SOA refresh
= 3600s, so worst case 1 hour).
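
The serial-bump rule above, sketched in Python for illustration (the real logic lives in the shell script and scans zones-prepared/). Here the existing serials are passed in as strings, and the function name is hypothetical.

```python
import re


def next_serial(existing_serials, today):
    """Return the next YYYYMMDDNN serial for `today` (a 'YYYYMMDD' string).

    Serials from other days are ignored, so the first run of a day yields NN=01.
    """
    if not re.fullmatch(r"\d{8}", today):
        raise ValueError("today must be YYYYMMDD")
    counters = [
        int(serial[8:])
        for serial in existing_serials
        if re.fullmatch(today + r"\d{2}", serial)
    ]
    counter = max(counters, default=0) + 1
    if counter > 99:
        raise ValueError("more than 99 serials today; set SERIAL manually")
    return f"{today}{counter:02d}"


if __name__ == "__main__":
    print(next_serial(["2026051601"], "20260516"))  # 2026051602
    print(next_serial(["2026051601"], "20260517"))  # 2026051701
```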
2026-05-16 16:25:53 -06:00
57c8366b7f coredns: document why HE-IP restriction lives at firewall, not CoreDNS
Goal was to restrict AXFR to Hurricane Electric's five secondary
nameserver IPs. Tried several CoreDNS Corefile syntaxes:

  transfer { to 216.218.130.2 ... 216.66.1.2 }       # space-separated
  transfer { to 216.218.130.2 \n to 216.218.131.2 }  # multi-line
  transfer { to 216.218.130.2 }                       # single IP
  transfer { to * 216.218.130.2 ... }                 # mixed

Every form with a specific IPv4 address silently breaks server-block
startup — the auto plugin still loads zones into memory but the
:53/:443/:853 listeners never bind. Reproducible on coredns/coredns
1.11.3 AND 1.12.2 with the (common) snippet + auto + forward shape.
Only `to *` results in healthy listener startup.

Even if we got CoreDNS-side filtering to work, Docker's default
userland-proxy rewrites source IPs to the bridge gateway, which would
break IP-based filtering anyway short of `network_mode: host`.

Decision: keep `to *` in CoreDNS, push HE-only filtering to the
FortiWiFi firewall (source-IP-restricted VIP/DNAT for WAN:53/tcp).
This is correctly layered defense: the perimeter does the IP filtering
before packets ever reach dell01.
2026-05-16 16:04:44 -06:00
1ab88a25f7 coredns: hidden-primary architecture with AXFR for HE secondaries
Goal: serve the public DNS face via Hurricane Electric's free
secondary-DNS service (dns.he.net), with CoreDNS on dell01 acting as
the hidden primary. We edit zones here; HE pulls them via AXFR.

Changes:
- scripts/prepare-zones.sh:
  * SOA mname: ns1.vultr.com -> ns1.he.net (so the apex SOA reflects
    HE as the primary in published RDATA)
  * Strip ns?.vultr.com NS records from each zone and inject the five
    HE nameservers (ns1..ns5.he.net) as the authoritative NS set
- Corefile (shared `common` snippet):
  * Add `transfer { to * }` to authorize AXFR. Tried specific IPs +
    `*` mixed on the same line but CoreDNS silently fails to bind
    server blocks with that syntax; bare `to *` is the only form that
    actually starts the listeners. Trade-off: NOTIFY targeting is lost
    (HE polls per SOA refresh=3600s instead of being pushed). For DNS
    data this is fine since each record is publicly queryable anyway.

Verified AXFR end-to-end: `dig @dell01 -p 5353 acrazy.org AXFR +tcp`
returns 41 records with the new HE NS set and HE-rooted SOA.

Still needed (operator action):
- Firewall NAT for TCP/53 -> 172.16.1.15:5353 (so HE can connect in)
- Add each of the 91 zones at dns.he.net as Secondary DNS pointing
  at 154.27.180.210
- Update each domain's registrar NS records from Vultr -> HE
2026-05-16 15:49:42 -06:00
daf48b373d coredns: rename endpoint dns.l.supported.systems -> dns.supported.systems 2026-05-16 15:24:27 -06:00
b78cfb0b45 coredns: fix silently-broken healthcheck (distroless image has no wget)
The original healthcheck `wget -qO- http://127.0.0.1:8080/health` has
been failing since day one because the CoreDNS image is distroless —
no shell, no HTTP client. The container has been running in
"(unhealthy)" status the whole time without anyone noticing because
nothing depends_on it.

Replace with `/coredns -version`, which is the thinnest honest check
the image can support. For deeper liveness/readiness, scrape
:8081/health from outside the container.
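
A sketch of the external probe suggested above, assuming the health endpoint answers a plain HTTP GET with status 200 when healthy. The function name and the hostname in the usage note are hypothetical; proxies are bypassed so the probe hits the host directly.

```python
import http.client
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200, else False."""
    # An empty ProxyHandler disables any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

Run from a monitoring host, `is_healthy("http://dell01:8081/health")` would report whether the endpoint responded. Unlike `/coredns -version`, this exercises the running server.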
2026-05-16 14:01:22 -06:00
3d47d67e89 coredns: production port defaults (5353 plain DNS, 8081 health)
Deployed to dell01.mer.idahomuellers.net with firewall NAT'ing
public requests in to host:5353/tcp+udp.

Port changes baked in as new defaults so future hosts inherit them:
- DNS_PORT: 1053 -> 5353 (dev was 1053 because avahi-daemon owns
  5353 on Arch desktops; production hosts typically don't run avahi
  and 5353 is the conventional non-privileged DNS port — mDNS uses
  multicast 224.0.0.251:5353 which never conflicts with a unicast bind)
- HEALTH_PORT: 8080 -> 8081 (8080 collided with a python3 service
  on dell01; 8081 is less commonly contested)
2026-05-16 13:59:33 -06:00
c1afe77b27 coredns: production Let's Encrypt cert via Caddy sidecar (DNS-01 + Vultr)
Replaces the self-signed dev cert flow with a real LE prod cert for
dns.l.supported.systems, issued and auto-renewed by a Caddy sidecar
using DNS-01 challenge against the Vultr API.

Components:
- caddy/Dockerfile builds Caddy 2.10.0 with caddy-dns/vultr plugin
  via xcaddy. GOTOOLCHAIN=auto so xcaddy can fetch newer Go on demand
  when plugin versions advance their minimum Go.
- caddy/Caddyfile uses DNS-01 with explicit public resolvers (1.1.1.1,
  9.9.9.9) for the propagation check. Without that, Docker's embedded
  DNS leaks the container into the host's split-horizon LAN DNS, which
  returns LAN IPs for ns1.vultr.com and the propagation check fails.
- docker-compose: caddy service shares ./caddy-data with coredns via a
  read-only subpath mount that excludes /acme (account private key).
- Healthcheck doubles as a symlinker: maintains stable cert.pem /
  key.pem names at /data/caddy/ and chmods cert files + their dirs to
  be readable by CoreDNS's nonroot user. Flips to "healthy" only once
  the symlinks dereference (i.e. cert exists), gating CoreDNS start
  via depends_on: service_healthy.
- Corefile unchanged — same /etc/coredns/certs/cert.pem path; only the
  bind-mount source switches from ./certs to ./caddy-data/caddy.
- New Makefile target: tls-up orchestrates the bring-up sequence.

Cert is valid until Aug 12 2026. Verified end-to-end:
  dig @127.0.0.1 -p 8853 +tls +tls-hostname=dns.l.supported.systems ...
  dig @127.0.0.1 -p 8443 +https +tls-hostname=dns.l.supported.systems ...
2026-05-14 01:34:57 -06:00
066ba1892a coredns: DoT (:853) + DoH (:443) listeners with self-signed cert
- New Corefile snippet (common) shared across plain DNS / DoT / DoH so
  zone-loading + forward + cache stay DRY across all three transports
- scripts/generate-certs.sh: openssl-only self-signed RSA cert with SANs
  for localhost / 127.0.0.1 / ::1 / coredns / dns.local. Idempotent —
  skips regeneration if cert is valid >24h ahead; FORCE=1 to rotate.
- Key chmod is 0644 so the CoreDNS container's nonroot user can read it
  via the bind mount. Acceptable for local dev; production should mount
  real certs with proper UID/GID.
- DOT_PORT=8853, DOH_PORT=8443 (avoids Caddy already-on-443 collision)
- Makefile: `make certs`, `make test-tls`
- All three transports verified end-to-end (dig +tls, dig +https,
  curl with raw RFC 8484 wire format)
2026-05-14 01:12:25 -06:00
1f11c314b9 track .env (no secrets — port config only) 2026-05-12 01:51:22 -06:00
10867ee319 coredns: docker compose stack with Vultr zone import
- Auto plugin loads zones-prepared/*.zone (regex zone-name extraction)
- scripts/prepare-zones.sh transforms raw Vultr exports:
  * synthesizes SOA (omitted by Vultr; CoreDNS requires it)
  * prepends @ to leading-TAB apex lines to disambiguate owner inheritance
  * dot-terminates NS/MX/CNAME rdata so $ORIGIN doesn't double-suffix
- DNS_PORT defaults to 1053 (5353=avahi, 53=libvirt dnsmasq on this host)
- Forwards non-authoritative queries to 1.1.1.1/1.0.0.1/9.9.9.9
- Makefile targets: prep, up, down, reload, test, logs
- 91 zones loaded
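
Two of the transforms listed above, sketched in Python for illustration (the actual implementation is the shell script). The assumptions here are not confirmed by this log: that the raw export writes NS/MX/CNAME targets as fully-qualified names without a trailing dot, and that apex records begin with a TAB. Both function names are hypothetical.

```python
def dot_terminate(rtype, rdata):
    """Append a trailing dot to the target name of NS/MX/CNAME rdata.

    Assumes the target is already fully qualified; "@" and names that
    already end in "." are left alone.
    """
    if rtype.upper() not in {"NS", "MX", "CNAME"}:
        return rdata
    parts = rdata.split()
    if not parts or parts[-1] == "@" or parts[-1].endswith("."):
        return rdata
    parts[-1] += "."
    return " ".join(parts)


def fix_apex_owner(line):
    """Prepend "@" to a line starting with TAB so it cannot inherit the previous owner."""
    return "@" + line if line.startswith("\t") else line


if __name__ == "__main__":
    print(dot_terminate("MX", "10 mail.example.org"))   # 10 mail.example.org.
    print(dot_terminate("A", "192.0.2.1"))              # 192.0.2.1
```

Without the trailing dot, a zone-file parser treats the target as relative and appends $ORIGIN, which is the double-suffix problem the transform avoids.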
2026-05-12 01:51:09 -06:00