# Caddy SIP Guardian - Matt Holt Code Review

**Date:** 2025-12-24

**Reviewer:** Matt Holt (simulated)

**Module Version:** caddy-sip-guardian v0.1.0

---

## Executive Summary

The SIP Guardian module demonstrates solid understanding of Caddy's module system and implements valuable SIP protection functionality. However, there are several **critical architectural issues** that violate Caddy best practices and could cause problems in production, especially around:

1. **Global mutable state** (registry, metrics, feature flags)
2. **Resource leaks** during config reloads
3. **Prometheus integration** that doesn't follow Caddy patterns
4. **Performance anti-patterns** (O(n²) sorting, inefficient string operations)

### Severity Levels

- 🔴 **CRITICAL** - Will cause panics, leaks, or data corruption
- 🟠 **HIGH** - Violates Caddy best practices, will cause issues at scale
- 🟡 **MEDIUM** - Suboptimal but functional, should be improved
- 🔵 **LOW** - Nice-to-have improvements

---

## 1. Module Architecture & State Management

### 🔴 CRITICAL: Global Registry Without Cleanup Mechanism

**File:** `registry.go:13-16`

```go
var (
    guardianRegistry = make(map[string]*SIPGuardian)
    registryMu       sync.RWMutex
)
```

**Problem:**

- Global package-level map that grows unbounded across config reloads
- When Caddy reloads config, new guardians are created but old ones stay in memory
- Old guardians have running goroutines (cleanupLoop) that never stop
- After 10 config reloads, you have 10 guardians with 10 cleanup goroutines

**Impact:** Memory leak, goroutine leak, eventual OOM in production

**Fix:**

Caddy modules should be **self-contained** and not rely on global registries. Instead:

```go
// Option 1: Use Caddy's app system
type SIPGuardianApp struct {
    guardians map[string]*SIPGuardian
    mu        sync.RWMutex
}

func (app *SIPGuardianApp) CaddyModule() caddy.ModuleInfo {
    return caddy.ModuleInfo{
        ID:  "sip_guardian",
        New: func() caddy.Module { return &SIPGuardianApp{guardians: make(map[string]*SIPGuardian)} },
    }
}

// Option 2: Fetch the app via ctx.App() so state lives in Caddy's lifecycle
func (h *SIPHandler) Provision(ctx caddy.Context) error {
    // Get the guardian from Caddy's app context, not a global var.
    // ctx.App takes the app's module ID and returns (any, error).
    appIface, err := ctx.App("sip_guardian")
    if err != nil {
        return err
    }
    app := appIface.(*SIPGuardianApp)
    h.guardian = app.GetOrCreateGuardian("default", &h.SIPGuardian)
    return nil
}
```

**Why this matters:** Caddy can reload config every 5 minutes in some setups. This leak accumulates fast.

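To make option 1 concrete, here's a minimal sketch of the app side, assuming guardians expose the existing `Cleanup()` method; `GetOrCreateGuardian` is a hypothetical accessor, and the `Start`/`Stop` pair is what makes this a `caddy.App` - Caddy stops the old app on reload, which is exactly where the leaked goroutines get torn down:

```go
var _ caddy.App = (*SIPGuardianApp)(nil) // interface guard

// GetOrCreateGuardian (hypothetical) returns an existing guardian by name
// or adopts the supplied, already-configured one.
func (app *SIPGuardianApp) GetOrCreateGuardian(name string, cfg *SIPGuardian) *SIPGuardian {
    app.mu.Lock()
    defer app.mu.Unlock()
    if g, ok := app.guardians[name]; ok {
        return g
    }
    app.guardians[name] = cfg
    return cfg
}

func (app *SIPGuardianApp) Start() error { return nil }

// Stop runs on config reload/shutdown: tear down every guardian so the
// goroutines from the previous config don't accumulate.
func (app *SIPGuardianApp) Stop() error {
    app.mu.Lock()
    defer app.mu.Unlock()
    for _, g := range app.guardians {
        _ = g.Cleanup() // best effort; keep stopping the rest
    }
    app.guardians = make(map[string]*SIPGuardian)
    return nil
}
```
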
---

### 🔴 CRITICAL: Feature Flags as Global Mutable Variables

**File:** `sipguardian.go:17-21`

```go
var (
    enableMetrics  = true
    enableWebhooks = true
    enableStorage  = true
)
```

**Problem:**

- Mutable package-level globals shared across ALL guardian instances
- Not thread-safe (no mutex protection)
- Can't have different settings per guardian
- Violates Caddy's philosophy of "config is everything"

**Fix:**

Move to struct fields with proper configuration:

```go
type SIPGuardian struct {
    // ... existing fields ...

    // Feature toggles (configurable per instance)
    EnableMetrics  bool `json:"enable_metrics,omitempty"`
    EnableWebhooks bool `json:"enable_webhooks,omitempty"`
    EnableStorage  bool `json:"enable_storage,omitempty"`
}
```

Or use build tags if these are compile-time options:

```go
//go:build !nometrics

func (g *SIPGuardian) recordMetric() {
    // metrics code here
}
```

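One pitfall with the struct-field option, worth flagging because the current globals default to `true`: a plain `bool` zero-values to `false`, silently disabling the features for configs that never mention them. A common workaround is a pointer field whose nil means "unset" - a sketch (one field shown for brevity):

```go
type SIPGuardian struct {
    // nil means "not set in config"; the default is applied in Provision.
    EnableMetrics *bool `json:"enable_metrics,omitempty"`
}

func (g *SIPGuardian) Provision(ctx caddy.Context) error {
    if g.EnableMetrics == nil {
        on := true // preserve today's default-on behavior
        g.EnableMetrics = &on
    }
    return nil
}
```
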
---

### 🟠 HIGH: Prometheus Metrics Use MustRegister

**File:** `metrics.go:158-181`

```go
func RegisterMetrics() {
    if metricsRegistered {
        return
    }
    metricsRegistered = true

    prometheus.MustRegister(...) // WILL PANIC on second call
}
```

**Problem:**

- `MustRegister` panics if the metrics are already registered (e.g., during test runs, or if multiple modules register)
- The `metricsRegistered` guard helps but is package-level state that doesn't reset
- Not idempotent across tests or different Caddy instances in the same process

**Fix:**

Use Caddy's metrics integration or a custom registry:

```go
// Option 1: Use Caddy's admin metrics (recommended)
// https://caddyserver.com/docs/json/admin/config/metrics/

// Option 2: Custom registry per module instance
type SIPGuardian struct {
    metricsRegistry *prometheus.Registry
    // ... other fields ...
}

func (g *SIPGuardian) Provision(ctx caddy.Context) error {
    g.metricsRegistry = prometheus.NewRegistry()
    g.metricsRegistry.MustRegister(
        // your metrics
    )
    return nil
}
```

**Reference:** See how Caddy's own metrics module integrates: https://github.com/caddyserver/caddy/blob/master/modules/metrics/metrics.go

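A per-instance registry only helps if something scrapes it; the usual way to expose it is `promhttp.HandlerFor`, which serves an arbitrary `prometheus.Gatherer` instead of the global default. A sketch (where this handler gets mounted is up to the module):

```go
import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// metricsHandler serves this instance's registry only.
func (g *SIPGuardian) metricsHandler() http.Handler {
    return promhttp.HandlerFor(g.metricsRegistry, promhttp.HandlerOpts{})
}
```
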
---

### 🟠 HIGH: mergeGuardianConfig Modifies Running Instance

**File:** `registry.go:68-202`

```go
func mergeGuardianConfig(ctx caddy.Context, g *SIPGuardian, config *SIPGuardian) {
    g.mu.Lock()
    defer g.mu.Unlock()

    // Modifies MaxFailures, BanTime, whitelists while guardian is actively processing traffic
    if config.MaxFailures > 0 && config.MaxFailures != g.MaxFailures {
        g.MaxFailures = config.MaxFailures // Race with RecordFailure()
    }
}
```

**Problem:**

- Changes runtime behavior of an actively-processing guardian
- `RecordFailure()` reads `g.MaxFailures` without lock: `if tracker.count >= g.MaxFailures` (sipguardian.go:408)
- Could miss bans or ban too aggressively during a config merge

**Fix:**

Caddy modules should be immutable once provisioned. Config changes should create NEW instances:

```go
func GetOrCreateGuardianWithConfig(...) (*SIPGuardian, error) {
    registryMu.Lock()
    defer registryMu.Unlock()

    if g, exists := guardianRegistry[name]; exists {
        // Don't merge - existing instance is immutable
        // Log warning if configs differ
        if !configsMatch(g, config) {
            log.Warn("Cannot change guardian config after provision - config reload required")
        }
        return g, nil
    }

    // Create fresh instance
    g := createGuardian(config)
    guardianRegistry[name] = g
    return g, nil
}
```

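`configsMatch` is referenced above but never defined; a minimal sketch, assuming the fields worth comparing are the user-facing tunables (extend the list to match the real struct):

```go
// configsMatch compares only user-facing tunables; runtime state
// (maps, mutexes, loggers) is deliberately excluded.
func configsMatch(a, b *SIPGuardian) bool {
    return a.MaxFailures == b.MaxFailures &&
        a.FindTime == b.FindTime &&
        a.BanTime == b.BanTime
}
```
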
---

## 2. Thread Safety & Concurrency

### 🟡 MEDIUM: RecordFailure Reads MaxFailures Without Lock

**File:** `sipguardian.go:408`

```go
func (g *SIPGuardian) RecordFailure(ip, reason string) bool {
    // ...
    g.mu.Lock()
    defer g.mu.Unlock()
    // ...

    // Check if we should ban
    if tracker.count >= g.MaxFailures { // RACE: g.MaxFailures can change during mergeGuardianConfig
        g.banIP(ip, reason)
        return true
    }
}
```

**Problem:**

- `g.MaxFailures` is read without lock protection
- `mergeGuardianConfig` can modify it concurrently (registry.go:98-100)

**Fix:**

Either:

1. Make config immutable (recommended - see above)
2. Or protect it with an RWMutex:

```go
type SIPGuardian struct {
    // Separate locks for config vs runtime state
    configMu sync.RWMutex
    stateMu  sync.RWMutex

    // Config fields (protected by configMu)
    maxFailures int
    // Runtime fields (protected by stateMu)
    bannedIPs map[string]*BanEntry
}

func (g *SIPGuardian) RecordFailure(...) {
    g.configMu.RLock()
    maxFails := g.maxFailures
    g.configMu.RUnlock()

    g.stateMu.Lock()
    // ... use maxFails ...
    g.stateMu.Unlock()
}
```

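If only a scalar or two needs to stay mutable, `sync/atomic` (Go 1.19+ typed atomics) is a lighter alternative to a second mutex - a sketch:

```go
import "sync/atomic"

type SIPGuardian struct {
    // Read on the hot path, written during config merge; atomic access
    // makes both sides race-free without extra locking.
    maxFailures atomic.Int64
}

func (g *SIPGuardian) setMaxFailures(n int) { g.maxFailures.Store(int64(n)) }

func (g *SIPGuardian) shouldBan(count int) bool {
    return int64(count) >= g.maxFailures.Load()
}
```
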
---

### 🟡 MEDIUM: Storage Writes in Unbounded Goroutines

**File:** `sipguardian.go:397-399, 464-468, 495-499`

```go
if g.storage != nil {
    go func() {
        g.storage.RecordFailure(ip, reason, nil) // Fire and forget
    }()
}
```

**Problem:**

- No limit on concurrent goroutines during an attack (could spawn 100k goroutines)
- No error handling if storage is closed
- If storage writes are slow, goroutines pile up

**Fix:**

Use a worker pool pattern:

```go
type SIPGuardian struct {
    storageQueue chan storageTask
    storageWg    sync.WaitGroup
}

func (g *SIPGuardian) Provision(ctx caddy.Context) error {
    if g.storage != nil {
        g.storageQueue = make(chan storageTask, 1000)

        // Fixed pool of storage workers
        for i := 0; i < 4; i++ {
            g.storageWg.Add(1)
            go g.storageWorker()
        }
    }
    return nil
}

func (g *SIPGuardian) storageWorker() {
    defer g.storageWg.Done()
    for task := range g.storageQueue {
        task.execute(g.storage)
    }
}

func (g *SIPGuardian) Cleanup() error {
    close(g.storageQueue) // Stop accepting tasks
    g.storageWg.Wait()    // Wait for workers to finish
    // ... rest of cleanup ...
    return nil
}
```

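`storageTask` is left undefined above; a minimal sketch, plus a non-blocking enqueue so a slow disk sheds writes instead of stalling the SIP hot path (dropping a persistence write under attack is usually the right trade). The `*Storage` type name is an assumption:

```go
// storageTask is one unit of work for the storage workers.
type storageTask struct {
    execute func(s *Storage)
}

// enqueueStorage drops the task if the queue is full rather than
// blocking the caller behind slow storage.
func (g *SIPGuardian) enqueueStorage(t storageTask) {
    select {
    case g.storageQueue <- t:
    default:
        g.logger.Warn("storage queue full, dropping write")
    }
}
```

Call sites then become `g.enqueueStorage(storageTask{execute: func(s *Storage) { s.RecordFailure(ip, reason, nil) }})` instead of `go func(){...}()`.
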
---

## 3. Performance Issues

### 🟠 HIGH: Bubble Sort for Eviction (O(n²))

**File:** `sipguardian.go:635-641, 689-695`

```go
// Sort by time (oldest first)
for i := 0; i < len(entries)-1; i++ {
    for j := i + 1; j < len(entries); j++ {
        if entries[j].time.Before(entries[i].time) {
            entries[i], entries[j] = entries[j], entries[i] // Bubble sort!
        }
    }
}
```

**Problem:**

- O(n²) time complexity
- For 100,000 entries (maxTrackedIPs), that's roughly 5 billion comparisons (n(n-1)/2)
- Will lock the mutex for seconds during high load

**Fix:**

Use the stdlib `sort.Slice()`:

```go
import "sort"

func (g *SIPGuardian) evictOldestTrackers(count int) {
    type ipTime struct {
        ip   string
        time time.Time
    }

    entries := make([]ipTime, 0, len(g.failureCounts))
    for ip, tracker := range g.failureCounts {
        entries = append(entries, ipTime{ip: ip, time: tracker.firstSeen})
    }

    // O(n log n) sort
    sort.Slice(entries, func(i, j int) bool {
        return entries[i].time.Before(entries[j].time)
    })

    // Evict oldest
    for i := 0; i < count && i < len(entries); i++ {
        delete(g.failureCounts, entries[i].ip)
    }
}
```

**Impact:** 100x-1000x faster for large maps

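Since `go.mod` already targets Go 1.25, the generic `slices` package (Go 1.21+) is the more idiomatic spelling and avoids `sort.Slice`'s reflection overhead:

```go
import (
    "slices"
    "time"
)

type ipTime struct {
    ip   string
    time time.Time
}

func sortOldestFirst(entries []ipTime) {
    slices.SortFunc(entries, func(a, b ipTime) int {
        return a.time.Compare(b.time) // time.Time.Compare is Go 1.20+
    })
}
```
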
---

### 🟡 MEDIUM: strings.ToLower() Allocates on Hot Path

**File:** `l4handler.go:351`

```go
func detectSuspiciousPattern(data []byte) string {
    lower := strings.ToLower(string(data)) // Allocates new string + ToLower allocates again

    for _, def := range suspiciousPatternDefs {
        if strings.Contains(lower, def.pattern) {
            return def.name
        }
    }
    return ""
}
```

**Problem:**

- Called on EVERY SIP message
- `string(data)` allocates (copies 4KB)
- `ToLower()` allocates another 4KB
- Under load (10k msg/sec), this is 80MB/sec of garbage

**Fix:**

Use the `bytes` package, hoisting the single `ToLower` out of the loop:

```go
func detectSuspiciousPattern(data []byte) string {
    lower := bytes.ToLower(data) // one allocation instead of two
    for _, def := range suspiciousPatternDefs {
        // []byte(def.pattern) still allocates per check - fixed below
        if bytes.Contains(lower, []byte(def.pattern)) {
            return def.name
        }
    }
    return ""
}

// Or for better performance, pre-compile lowercase patterns:
var suspiciousPatternDefs = []struct {
    name    string
    pattern []byte // lowercase pattern
}{
    {"sipvicious", []byte("sipvicious")},
    // ...
}

func detectSuspiciousPattern(data []byte) string {
    lower := bytes.ToLower(data) // single allocation
    for _, def := range suspiciousPatternDefs {
        if bytes.Contains(lower, def.pattern) {
            return def.name
        }
    }
    return ""
}
```

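`bytes.ToLower` still copies the whole message once per call; if that shows up in profiles, an in-place ASCII case-folding scan removes the allocation entirely (ASCII-only folding is fine for SIP tokens and the patterns above; the helper name is ours):

```go
// asciiContainsFold reports whether data contains pattern, ignoring ASCII
// case. pattern must already be lowercase. Zero allocations.
func asciiContainsFold(data, pattern []byte) bool {
    if len(pattern) == 0 {
        return true
    }
    for i := 0; i+len(pattern) <= len(data); i++ {
        j := 0
        for ; j < len(pattern); j++ {
            c := data[i+j]
            if 'A' <= c && c <= 'Z' {
                c += 'a' - 'A' // fold uppercase ASCII to lowercase
            }
            if c != pattern[j] {
                break
            }
        }
        if j == len(pattern) {
            return true
        }
    }
    return false
}
```
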
---

### 🔵 LOW: min() Function Defined Locally

**File:** `l4handler.go:367-372`

```go
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}
```

**Problem:**

Go 1.21+ has a builtin `min()` function. Your `go.mod` says `go 1.25`.

**Fix:**

```go
// Delete the function, use the builtin
sample := buf[:min(64, len(buf))]
```

---

## 4. Resource Management & Lifecycle

### 🟠 HIGH: Cleanup() Doesn't Respect Context

**File:** `sipguardian.go:891-932`

```go
func (g *SIPGuardian) Cleanup() error {
    // ...
    select {
    case <-done:
        g.logger.Debug("All goroutines stopped cleanly")
    case <-time.After(5 * time.Second): // Hardcoded timeout
        g.logger.Warn("Timeout waiting for goroutines to stop")
    }
}
```

**Problem:**

- Caddy's shutdown might have a different timeout
- Should respect the `caddy.Context` shutdown signal

**Fix:**

```go
func (g *SIPGuardian) Cleanup() error {
    // Use the context from Provision or a configurable timeout
    ctx, cancel := context.WithTimeout(context.Background(), g.shutdownTimeout())
    defer cancel()

    // Signal stop
    close(g.stopCh)

    // Wait with context
    done := make(chan struct{})
    go func() {
        g.wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        g.logger.Debug("Cleanup completed")
    case <-ctx.Done():
        g.logger.Warn("Cleanup timed out", zap.Duration("timeout", g.shutdownTimeout()))
        return ctx.Err()
    }

    // ... rest of cleanup ...
    return nil
}

func (g *SIPGuardian) shutdownTimeout() time.Duration {
    if g.CleanupTimeout > 0 {
        return time.Duration(g.CleanupTimeout)
    }
    return 10 * time.Second // reasonable default
}
```

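`g.CleanupTimeout` implies a new config field; a sketch using `caddy.Duration`, which parses duration strings like "30s" from the JSON config (the field name is ours):

```go
type SIPGuardian struct {
    // CleanupTimeout bounds how long Cleanup waits for goroutines to exit.
    CleanupTimeout caddy.Duration `json:"cleanup_timeout,omitempty"`
}
```
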
---

### 🟡 MEDIUM: DNS Whitelist Doesn't Stop on Cleanup

**File:** `sipguardian.go:900-902`

```go
func (g *SIPGuardian) Cleanup() error {
    // Stop DNS whitelist background refresh
    if g.dnsWhitelist != nil {
        g.dnsWhitelist.Stop() // Is this method implemented? Does it block?
    }
}
```

**Question:** Does `dnsWhitelist.Stop()` wait for its goroutines to finish? It should be added to the guardian's WaitGroup.

**Fix:**

```go
// In Provision
if g.dnsWhitelist != nil {
    g.wg.Add(1)
    go func() {
        defer g.wg.Done()
        g.dnsWhitelist.Run(g.stopCh) // Respect stopCh
    }()
}

// Cleanup will automatically wait via g.wg.Wait()
```

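The fix assumes a stoppable `Run` method on the whitelist; a minimal sketch of that loop (`refreshInterval` and `refresh` are assumed names):

```go
// Run re-resolves the whitelist on a ticker until stopCh closes.
func (w *DNSWhitelist) Run(stopCh <-chan struct{}) {
    ticker := time.NewTicker(w.refreshInterval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            w.refresh() // re-resolve the whitelisted hostnames
        case <-stopCh:
            return
        }
    }
}
```
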
---

## 5. Configuration & Validation

### 🟡 MEDIUM: Validate() Ranges Too Restrictive

**File:** `sipguardian.go:936-966`

```go
func (g *SIPGuardian) Validate() error {
    if g.MaxFailures > 1000 {
        return fmt.Errorf("max_failures exceeds reasonable limit (1000), got %d", g.MaxFailures)
    }

    if time.Duration(g.FindTime) > 24*time.Hour {
        return fmt.Errorf("find_time exceeds reasonable limit (24h), got %v", time.Duration(g.FindTime))
    }
}
```

**Question:** Why are these "reasonable limits" hardcoded?

- Some users might want `max_failures = 2` for high-security environments
- Some might want `find_time = 7d` for long-term tracking

**Fix:**

Either document WHY these limits exist, or make them configurable:

```go
const (
    maxFailuresLimit = 10000               // prevent integer overflow in counter logic
    maxFindTimeLimit = 30 * 24 * time.Hour // 30 days max for memory reasons
)

func (g *SIPGuardian) Validate() error {
    if g.MaxFailures < 1 || g.MaxFailures > maxFailuresLimit {
        return fmt.Errorf("max_failures must be 1-%d, got %d", maxFailuresLimit, g.MaxFailures)
    }
    // ... etc
}
```

---

### 🔵 LOW: UnmarshalCaddyfile Doesn't Validate Inline

**File:** `sipguardian.go:746-889`

**Observation:**

Parsing and validation are separate (Validate is called after Provision). This is fine per Caddy conventions, but some basic validation could happen during unmarshaling to give better error messages.

Example:

```go
case "max_failures":
    // ... parse ...
    if val < 1 {
        return d.Errf("max_failures must be positive, got %d", val)
    }
    g.MaxFailures = val
```

---

## 6. Error Handling & Robustness

### 🟡 MEDIUM: Inconsistent Error Handling in Provision

**File:** `sipguardian.go:103-241`

```go
func (g *SIPGuardian) Provision(ctx caddy.Context) error {
    // Parse whitelist CIDRs
    for _, cidr := range g.WhitelistCIDR {
        _, network, err := net.ParseCIDR(cidr)
        if err != nil {
            return fmt.Errorf("invalid whitelist CIDR %s: %v", cidr, err) // FATAL
        }
        g.whitelistNets = append(g.whitelistNets, network)
    }

    // Initialize storage
    if enableStorage && g.StoragePath != "" {
        storage, err := InitStorage(g.logger, StorageConfig{Path: g.StoragePath})
        if err != nil {
            g.logger.Warn("Failed to initialize storage, continuing without persistence",
                zap.Error(err), // WARNING - continues
            )
        }
    }
}
```

**Problem:**

Why is an invalid CIDR fatal but a storage failure just a warning?

**Fix:**

Be consistent. Either:

1. Make optional features non-fatal (recommended for flexibility)
2. Or make all failures fatal (strict validation)

Document the behavior in the config:

```caddyfile
sip_guardian {
    whitelist 10.0.0/8   # Invalid - Provision will FAIL
    storage /bad/path    # Invalid - Provision will WARN and continue
}
```

---

### 🟡 MEDIUM: BanIP Creates Fake Failure Tracker

**File:** `sipguardian.go:968-991`

```go
func (g *SIPGuardian) BanIP(ip, reason string) {
    // ...
    if _, exists := g.failureCounts[ip]; !exists {
        g.failureCounts[ip] = &failureTracker{
            count:     g.MaxFailures, // Fake failures
            firstSeen: time.Now(),
            lastSeen:  time.Now(),
        }
    }
    g.banIP(ip, reason)
}
```

**Problem:**

Why does manually banning an IP create failure-tracking data? This pollutes metrics and storage.

**Fix:**

Separate manual bans from automatic bans:

```go
type BanEntry struct {
    IP        string
    Reason    string
    Source    string // "automatic", "manual", "api"
    BannedAt  time.Time
    ExpiresAt time.Time
    HitCount  int // 0 for manual bans
}

func (g *SIPGuardian) BanIP(ip, reason string) {
    g.mu.Lock() // assuming g.mu guards bannedIPs, as in the excerpts above
    defer g.mu.Unlock()

    // Don't create a fake failure tracker
    entry := &BanEntry{
        IP:        ip,
        Source:    "manual",
        Reason:    reason,
        BannedAt:  time.Now(),
        ExpiresAt: time.Now().Add(time.Duration(g.BanTime)),
        HitCount:  0, // No failures recorded
    }
    g.bannedIPs[ip] = entry
}
```

---

## 7. API Design

### 🟡 MEDIUM: Admin Handler URL Routing is Fragile

**File:** `admin.go:56-70`

```go
func (h *AdminHandler) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
    path := r.URL.Path

    switch {
    case strings.HasSuffix(path, "/bans"): // Matches "/foo/bans" too
        return h.handleBans(w, r)
    case strings.Contains(path, "/unban/"): // Matches "/foo/unban/bar/baz"
        return h.handleUnban(w, r, path)
    }
}
```

**Problem:**

- `strings.HasSuffix` matches unintended paths
- Path extraction is manual and error-prone

**Fix:**

Use a proper HTTP router or at least exact matching:

```go
// Option 1: Use http.ServeMux patterns (Go 1.22+)
mux := http.NewServeMux()
mux.HandleFunc("GET /api/sip-guardian/bans", h.handleBans)
mux.HandleFunc("POST /api/sip-guardian/ban/{ip}", h.handleBan)
mux.HandleFunc("DELETE /api/sip-guardian/unban/{ip}", h.handleUnban)

// Option 2: Exact prefix matching
switch {
case r.URL.Path == "/api/sip-guardian/bans":
    return h.handleBans(w, r)
case strings.HasPrefix(r.URL.Path, "/api/sip-guardian/ban/"):
    ip := strings.TrimPrefix(r.URL.Path, "/api/sip-guardian/ban/")
    ip = strings.TrimSuffix(ip, "/")
    return h.handleBan(w, r, ip)
}
```

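With the Go 1.22+ patterns in option 1, the wildcard comes back via `r.PathValue`. Note the handlers also need the standard `http.HandlerFunc` signature rather than the error-returning one shown earlier - a sketch:

```go
mux.HandleFunc("POST /api/sip-guardian/ban/{ip}", func(w http.ResponseWriter, r *http.Request) {
    ip := r.PathValue("ip") // filled in by the Go 1.22+ pattern router
    if net.ParseIP(ip) == nil {
        http.Error(w, "Invalid IP address", http.StatusBadRequest)
        return
    }
    // The real handler reads the reason from the request body.
    h.guardian.BanIP(ip, "manual ban via admin API")
    w.WriteHeader(http.StatusNoContent)
})
```
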
---

### 🔵 LOW: No IP Validation in Admin Endpoints

**File:** `admin.go:196, 210`

```go
ip := strings.TrimSuffix(parts[1], "/")

// Use public BanIP method for proper encapsulation
h.guardian.BanIP(ip, body.Reason) // What if ip = "not-an-ip"?
```

**Fix:**

```go
ip := strings.TrimSuffix(parts[1], "/")
if net.ParseIP(ip) == nil {
    http.Error(w, "Invalid IP address", http.StatusBadRequest)
    return nil
}
h.guardian.BanIP(ip, body.Reason)
```

---

## 8. Testing Observations

### 🟡 MEDIUM: Tests Use Global State

**Observation:** Since metrics and registry are global, tests might interfere with each other.

**Recommendation:**

Add test helpers to reset state:

```go
// In sipguardian_test.go
func resetGlobalState() {
    registryMu.Lock()
    guardianRegistry = make(map[string]*SIPGuardian)
    registryMu.Unlock()

    metricsRegistered = false
    // ... reset other globals ...
}

func TestSomething(t *testing.T) {
    t.Cleanup(resetGlobalState)
    // ... test code ...
}
```

---

## Summary of Recommendations

### Must Fix (🔴 Critical)

1. **Eliminate global registry** - use Caddy app system or context storage
2. **Remove global feature flags** - make them config options
3. **Fix Prometheus integration** - use custom registry or Caddy's metrics

### Should Fix (🟠 High)

4. **Make configs immutable** after provisioning
5. **Fix bubble sort** - use sort.Slice()
6. **Bound storage goroutines** - use worker pool

### Nice to Have (🟡 Medium)

7. Add context to Cleanup timeout
8. Validate IPs in admin API
9. Fix string allocation in hot path
10. Separate manual vs automatic bans

### Polish (🔵 Low)

11. Remove custom min() function
12. Improve URL routing in admin handler
13. Add inline validation in UnmarshalCaddyfile

---

## Positive Observations 👍

1. **Solid Cleanup() implementation** - proper WaitGroup usage and timeout handling (see section 4 for the timeout caveat)
2. **Good Validate() implementation** - validates config before use
3. **Interface guards** - ensure compile-time type checking
4. **Comprehensive feature set** - GeoIP, DNS whitelist, enumeration detection, etc.
5. **Good logging** - structured logging with zap
6. **Test coverage** - multiple test files covering features

---

## Final Verdict

This is **solid work** that shows a good understanding of Caddy modules. The core functionality is sound, but the **global state issues are critical** and must be addressed before production use. Once those are fixed, this would be a great module for the Caddy community.

The main pattern to change: **think per-instance, not global**. Every piece of state should live on the struct, not in package-level vars.

---

**Estimated effort to fix critical issues:** 4-6 hours

**Estimated effort for all recommendations:** 8-12 hours

Would you like me to prioritize any specific issues or help implement fixes?