This commit addresses all critical architectural issues identified in the
Matt Holt code review, transforming the module from using anti-patterns
to following Caddy best practices.
### 🔴 CRITICAL FIXES:
**1. Global Registry → Caddy App System**
- Created SIPGuardianApp implementing caddy.App interface (app.go)
- Eliminates memory/goroutine leaks on config reload
- Before: guardians accumulated in global map, never cleaned up
- After: Caddy calls Stop() on old app before loading new config
- Impact: Prevents OOM in production with frequent config reloads
**2. Feature Flags → Instance Fields**
- Moved enableMetrics/Webhooks/Storage from globals to *bool struct fields
- Allows per-instance configuration (not shared across all guardians)
- Helper methods default to true if not set
- Impact: Thread-safe, configurable per guardian instance
**3. Prometheus Panic Prevention**
- Replaced MustRegister() with Register() + AlreadyRegisteredError handling
- Makes RegisterMetrics() idempotent and safe for multiple calls
- Before: panics on second call (e.g., config reload)
- After: silently ignores already-registered collectors
- Impact: No more crashes on config reload
### 🟠 HIGH PRIORITY FIXES:
**4. Storage Worker Pool**
- Fixed pool of 4 workers + 1000-entry buffered channel
- Replaces unbounded go func() spawns (3 locations)
- Before: 100k goroutines during DDoS → memory exhaustion
- After: bounded resources, drops writes when full (fail-fast)
- Impact: Survives attacks without resource exhaustion
**5. Config Immutability**
- MaxFailures/FindTime/BanTime no longer modified on running instance
- Prevents race with RecordFailure() reading values without lock
- Changed mutations to warning logs
- Additive changes still allowed (whitelists, webhooks)
- Impact: No more race conditions, predictable ban behavior
### Modified Files:
- app.go (NEW): SIPGuardianApp with proper lifecycle management
- sipguardian.go: Removed module registration, added worker pool, feature flags
- l4handler.go: Use ctx.App() instead of global registry
- metrics.go: Use ctx.App() instead of global registry
- registry.go: Config immutability warnings instead of mutations
### Test Results:
All tests pass (1.228s) ✅
### Breaking Changes:
None - backwards compatible, but requires apps {} block in Caddyfile
for proper lifecycle management
### Estimated Impact:
- Memory leak fix: Prevents unbounded growth over time
- Resource usage: 100k goroutines → 4 workers during attack
- Stability: No more panics on config reload
- Performance: O(n log n) sorting (addressed in quick wins)
- Add Cleanup() method (caddy.CleanerUpper) to stop goroutines on config
reload, preventing goroutine leaks
- Add Validate() method (caddy.Validator) for early config validation with
reasonable bounds checking
- Add public BanIP() method for admin handler, replacing direct internal
state manipulation
- Add bounds checking for failure tracker and ban maps to prevent memory
exhaustion under DDoS (100k/50k limits)
- Add eviction functions to proactively clean oldest entries when at capacity
Support for whitelisting SIP trunks and providers by hostname or SRV
record with automatic IP resolution and periodic refresh.
Features:
- Hostname resolution via A/AAAA records
- SRV record resolution (e.g., _sip._udp.provider.com)
- Configurable refresh interval (default 5m)
- Stale entry handling when DNS fails
- Admin API endpoints for DNS whitelist management
- Caddyfile directives: whitelist_hosts, whitelist_srv, dns_refresh
This allows whitelisting by provider name rather than tracking
constantly-changing IP addresses.
Documents all new features:
- Extension enumeration detection with config examples
- SIP message validation rules and modes
- Topology hiding (B2BUA-lite) with request/response flow diagrams
- Complete Caddyfile configuration reference
- Prometheus metrics reference
- Admin API endpoints
- Integration examples for FreePBX, Kamailio, and HA setups
- Security considerations
Architecture diagram updated to show full processing pipeline.
- Fix SetEnumerationConfig to create detector if not exists
Previously, the config would be silently discarded if called before
the detector was lazily initialized by GetEnumerationDetector
- Add test_enumeration.py script for sandbox testing
Includes fire-and-forget mode (--no-wait) for proper scanner simulation
The layer4 matchers and handlers must implement caddyfile.Unmarshaler
to be usable in Caddyfile syntax. This enables proper parsing of:
- @sip sip { methods ... } matchers
- sip_guardian { ... } handlers