Architecture Overview

This document provides a comprehensive overview of RentCache's system architecture, design decisions, and technical implementation details.

🏗️ System Architecture

High-Level Architecture

graph TB
    Client[Client Applications] --> LB[Load Balancer/Caddy]
    LB --> App[RentCache FastAPI]
    
    App --> Auth[Authentication Layer]
    Auth --> Rate[Rate Limiting]
    Rate --> Cache[Cache Manager]
    
    Cache --> L1[L1 Cache<br/>Redis]
    Cache --> L2[L2 Cache<br/>SQLite/PostgreSQL]
    
    Cache --> API[Rentcast API]
    
    App --> Analytics[Usage Analytics]
    Analytics --> DB[(Database)]
    
    App --> Monitor[Health Monitoring]
    App --> Metrics[Metrics Collection]
    
    subgraph "Data Layer"
        L1
        L2
        DB
    end
    
    subgraph "External Services"
        API
    end

Component Responsibilities

FastAPI Application Server

  • Primary Role: HTTP request handling and API routing
  • Key Features:
    • Async/await architecture for high concurrency
    • OpenAPI documentation generation
    • Request/response validation with Pydantic
    • Middleware stack for cross-cutting concerns

Authentication & Authorization

  • Method: Bearer token authentication using SHA-256 hashed API keys
  • Storage: Secure key storage with expiration and usage limits
  • Features: Per-key rate limiting and usage tracking
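
A minimal sketch of this validation step as a FastAPI dependency; `lookup_key_by_hash` is a hypothetical database helper, not part of the actual codebase:

import hashlib

from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()

async def authenticate(credentials: HTTPAuthorizationCredentials = Depends(bearer)):
    # Hash the presented token and compare against stored hashes only
    key_hash = hashlib.sha256(credentials.credentials.encode()).hexdigest()
    api_key = await lookup_key_by_hash(key_hash)  # hypothetical DB lookup
    if api_key is None or not api_key.is_active:
        raise HTTPException(status_code=401, detail="Unauthorized")
    return api_key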

Multi-Level Caching System

  • L1 Cache (Redis): In-memory cache for ultra-fast access
  • L2 Cache (Database): Persistent cache with analytics
  • Strategy: Write-through with intelligent TTL management
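
To illustrate the write-through strategy, here is a minimal sketch of the L1 → L2 → upstream lookup path; `redis_client`, `db_cache`, and `fetch_upstream` are hypothetical helpers:

import json

async def get_cached(cache_key: str, ttl: int) -> dict:
    # L1: Redis (hot data)
    raw = await redis_client.get(cache_key)
    if raw is not None:
        return json.loads(raw)

    # L2: database (persistent cache)
    entry = await db_cache.get(cache_key)
    if entry is not None and not entry.is_expired:
        await redis_client.set(cache_key, json.dumps(entry.data), ex=ttl)  # repopulate L1
        return entry.data

    # Miss on both levels: call upstream, then write through both caches
    data = await fetch_upstream(cache_key)
    await db_cache.set(cache_key, data, ttl=ttl)
    await redis_client.set(cache_key, json.dumps(data), ex=ttl)
    return data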

Rate Limiting Engine

  • Implementation: Token bucket algorithm with sliding windows
  • Granularity: Global and per-endpoint limits
  • Backend: Redis-based distributed rate limiting

Usage Analytics

  • Tracking: Request patterns, costs, and performance metrics
  • Storage: Time-series data in relational database
  • Reporting: Real-time dashboards and historical analysis

🔄 Request Flow Architecture

1. Request Processing Pipeline

sequenceDiagram
    participant C as Client
    participant A as Auth Layer
    participant R as Rate Limiter
    participant CM as Cache Manager
    participant RC as Redis Cache
    participant DB as Database
    participant RA as Rentcast API
    
    C->>A: HTTP Request + API Key
    A->>A: Validate & Hash Key
    
    alt Valid API Key
        A->>R: Check Rate Limits
        alt Within Limits
            R->>CM: Cache Lookup
            CM->>RC: Check L1 Cache
            
            alt Cache Hit (L1)
                RC-->>CM: Return Cached Data
                CM-->>C: Response + Cache Headers
            else Cache Miss (L1)
                CM->>DB: Check L2 Cache
                alt Cache Hit (L2)
                    DB-->>CM: Return Cached Data
                    CM->>RC: Populate L1
                    CM-->>C: Response + Cache Headers
                else Cache Miss (L2)
                    CM->>RA: Upstream API Call
                    RA-->>CM: API Response
                    CM->>DB: Store in L2
                    CM->>RC: Store in L1
                    CM-->>C: Response + Cost Headers
                end
            end
        else Rate Limited
            R-->>C: 429 Rate Limit Exceeded
        end
    else Invalid API Key
        A-->>C: 401 Unauthorized
    end

2. Cache Key Generation

Cache Key Strategy: MD5 hash of the canonical request signature

import hashlib
import json

cache_key = hashlib.md5(json.dumps({
    "endpoint": "properties",
    "method": "GET",
    "path_params": {"property_id": "123"},
    "query_params": {"city": "Austin", "state": "TX"},
    "body": {}
}, sort_keys=True).encode()).hexdigest()

Benefits:

  • Deterministic cache keys
  • Negligible collision risk for cache keying (MD5 is not cryptographically collision-resistant, but that is not required here)
  • Parameter order independence
  • Efficient storage and lookup

💾 Caching Strategy

Multi-Level Cache Architecture

Level 1: Redis Cache (Hot Data)

  • Purpose: Ultra-fast access to frequently requested data
  • TTL: 30 minutes to 2 hours
  • Eviction: LRU (Least Recently Used)
  • Size: Memory-limited, optimized for speed

# L1 Cache Configuration
REDIS_CONFIG = {
    "maxmemory": "512mb",
    "maxmemory_policy": "allkeys-lru",
    "save": ["900 1", "300 10", "60 10000"],  # Persistence snapshots
    "appendonly": True,  # AOF for durability
    "appendfsync": "everysec"
}

Level 2: Database Cache (Persistent)

  • Purpose: Persistent cache with analytics and soft deletion
  • TTL: 1 hour to 48 hours based on endpoint volatility
  • Storage: Full response data + metadata
  • Features: Soft deletion, usage tracking, cost analytics

-- Cache Entry Schema
CREATE TABLE cache_entries (
    id SERIAL PRIMARY KEY,
    cache_key VARCHAR(64) UNIQUE NOT NULL,
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    params_hash VARCHAR(64) NOT NULL,
    response_data JSONB NOT NULL,
    status_code INTEGER NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    is_valid BOOLEAN DEFAULT TRUE,
    hit_count INTEGER DEFAULT 0,
    last_accessed TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Cache TTL Strategy

| Endpoint Type     | Data Volatility | Default TTL | Rationale                              |
|-------------------|-----------------|-------------|----------------------------------------|
| Property Records  | Very Low        | 24 hours    | Property characteristics rarely change |
| Value Estimates   | Medium          | 1 hour      | Market fluctuations affect valuations  |
| Rent Estimates    | Medium          | 1 hour      | Rental markets change regularly        |
| Listings          | High            | 30 minutes  | Active market with frequent updates    |
| Market Statistics | Low             | 2 hours     | Aggregated data changes slowly         |
| Comparables       | Medium          | 1 hour      | Market-dependent analysis              |
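
In code, these defaults reduce to a simple endpoint-to-TTL map (illustrative values in seconds, mirroring the table above; not the project's actual constants):

# Default TTLs per endpoint, in seconds
DEFAULT_TTLS = {
    "properties": 24 * 3600,     # property records: very low volatility
    "value_estimate": 3600,      # value estimates: medium volatility
    "rent_estimate": 3600,       # rent estimates: medium volatility
    "listings": 30 * 60,         # listings: high volatility
    "market_stats": 2 * 3600,    # market statistics: low volatility
    "comparables": 3600,         # comparables: medium volatility
}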

Stale-While-Revalidate Pattern

import asyncio

async def get_with_stale_while_revalidate(cache_key: str, ttl: int):
    """
    Serve stale data immediately while refreshing in the background.
    `cache`, `refresh_cache_entry`, and `fetch_and_cache` are the cache
    manager's internals; this is a sketch of the pattern.
    """
    cached_data = await cache.get(cache_key)

    if cached_data:
        if not cached_data.is_expired:
            return cached_data  # Fresh data
        else:
            # Serve stale data, trigger background refresh
            asyncio.create_task(refresh_cache_entry(cache_key))
            return cached_data  # Stale but usable

    # Cache miss - fetch fresh data
    return await fetch_and_cache(cache_key, ttl)

Benefits:

  • Improved user experience (no waiting for fresh data)
  • Reduced upstream API calls during traffic spikes
  • Graceful handling of upstream service issues

🚦 Rate Limiting Implementation

Token Bucket Algorithm

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()
    
    async def consume(self, tokens: int = 1) -> bool:
        await self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
    
    async def _refill(self):
        now = time.time()
        tokens_to_add = (now - self.last_refill) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now

Multi-Tier Rate Limiting

Global Limits

  • Purpose: Prevent overall API abuse
  • Scope: Per API key across all endpoints
  • Implementation: Redis-based distributed counters
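
A minimal sketch of such a counter using a fixed-window approximation (the sliding-window behavior described above is more involved); assumes redis-py's asyncio client:

import time

import redis.asyncio as aioredis

redis_client = aioredis.from_url("redis://localhost:6379")

async def within_global_limit(api_key_id: int, limit: int, window_s: int = 60) -> bool:
    # One counter per key per time window; atomic INCR across all app instances
    window = int(time.time()) // window_s
    key = f"rl:{api_key_id}:{window}"
    count = await redis_client.incr(key)
    if count == 1:
        await redis_client.expire(key, window_s)  # let the window expire on its own
    return count <= limit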

Per-Endpoint Limits

  • Purpose: Protect expensive operations
  • Scope: Specific endpoints (e.g., value estimates)
  • Implementation: Endpoint-specific token buckets

Dynamic Rate Limiting

RATE_LIMITS = {
    "properties": "60/minute",          # Standard property searches
    "value_estimate": "30/minute",      # Expensive AI/ML operations
    "rent_estimate": "30/minute",       # Expensive AI/ML operations
    "market_stats": "20/minute",        # Computationally intensive
    "listings_sale": "100/minute",      # Less expensive, higher volume
    "listings_rental": "100/minute",    # Less expensive, higher volume
    "comparables": "40/minute"          # Moderate complexity
}
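
A small helper can translate these strings into token-bucket parameters (a sketch; assumes the TokenBucket class shown earlier):

# "60/minute" -> TokenBucket(capacity=60, refill_rate=1.0 tokens/second)
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

def bucket_from_limit(limit: str) -> TokenBucket:
    count, period = limit.split("/")
    return TokenBucket(capacity=int(count), refill_rate=int(count) / PERIOD_SECONDS[period])

buckets = {endpoint: bucket_from_limit(spec) for endpoint, spec in RATE_LIMITS.items()}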

📊 Database Schema Design

Core Tables

API Keys Management

CREATE TABLE api_keys (
    id SERIAL PRIMARY KEY,
    key_name VARCHAR(100) UNIQUE NOT NULL,
    key_hash VARCHAR(64) UNIQUE NOT NULL,  -- SHA-256 hash
    is_active BOOLEAN DEFAULT TRUE,
    daily_limit INTEGER DEFAULT 1000,
    monthly_limit INTEGER DEFAULT 30000,
    daily_usage INTEGER DEFAULT 0,
    monthly_usage INTEGER DEFAULT 0,
    last_daily_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_monthly_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE,
    last_used TIMESTAMP WITH TIME ZONE
);

Usage Analytics

CREATE TABLE usage_stats (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    status_code INTEGER NOT NULL,
    response_time_ms DECIMAL(10,2) NOT NULL,
    cache_hit BOOLEAN NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    user_agent TEXT,
    ip_address INET,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Rate Limiting State

CREATE TABLE rate_limits (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    current_tokens INTEGER DEFAULT 0,
    last_refill TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(api_key_id, endpoint)
);

Indexing Strategy

-- Performance indexes
-- (cache_key already has an implicit index from its UNIQUE constraint; shown for completeness)
CREATE INDEX idx_cache_entries_key ON cache_entries(cache_key);
CREATE INDEX idx_cache_entries_endpoint_expires ON cache_entries(endpoint, expires_at);
CREATE INDEX idx_cache_entries_created_at ON cache_entries(created_at);

CREATE INDEX idx_usage_stats_api_key_created ON usage_stats(api_key_id, created_at);
CREATE INDEX idx_usage_stats_endpoint_created ON usage_stats(endpoint, created_at);
CREATE INDEX idx_usage_stats_cache_hit ON usage_stats(cache_hit, created_at);

CREATE INDEX idx_api_keys_hash ON api_keys(key_hash);
CREATE INDEX idx_api_keys_active ON api_keys(is_active);

🔒 Security Architecture

Authentication Flow

graph LR
    Client --> |Bearer Token| Auth[Auth Middleware]
    Auth --> Hash[SHA-256 Hash]
    Hash --> DB[(Database Lookup)]
    DB --> Validate[Validate Expiry & Status]
    Validate --> |Valid| Allow[Allow Request]
    Validate --> |Invalid| Deny[401 Unauthorized]

Security Measures

API Key Protection

  • Storage: Only SHA-256 hashes stored, never plaintext
  • Transmission: HTTPS only, bearer token format
  • Rotation: Configurable expiration dates
  • Revocation: Instant deactivation capability
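
A sketch of key issuance under this model; the plaintext key is returned exactly once and only its hash is persisted (`store_key_record` is a hypothetical helper):

import hashlib
import secrets

async def create_api_key(key_name: str) -> str:
    plaintext = secrets.token_urlsafe(32)  # shown to the caller once
    key_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    await store_key_record(key_name=key_name, key_hash=key_hash)  # hypothetical DB write
    return plaintext  # never stored server-side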

Network Security

  • HTTPS Enforcement: Automatic SSL with Caddy
  • CORS Configuration: Configurable origin restrictions
  • Rate Limiting: DDoS and abuse protection
  • Request Validation: Comprehensive input sanitization

Container Security

  • Non-root User: Containers run as unprivileged user
  • Minimal Images: Alpine Linux base images
  • Secret Management: Environment variable injection
  • Network Isolation: Docker network segregation

📈 Performance Optimizations

Application Level

Async Architecture

import asyncio

# Concurrent request handling: fan requests out and await them together
async def handle_multiple_requests(requests: list):
    # process_request is the per-request pipeline (defined elsewhere)
    tasks = [process_request(req) for req in requests]
    return await asyncio.gather(*tasks)

Connection Pooling

import httpx
from sqlalchemy.ext.asyncio import create_async_engine

# HTTP client configuration
http_client = httpx.AsyncClient(
    timeout=30.0,
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20
    )
)

# Database connection pooling
engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=30,
    pool_pre_ping=True,
    pool_recycle=3600
)

Response Optimization

  • GZip Compression: Automatic response compression
  • JSON Streaming: Large response streaming
  • Conditional Requests: ETag and If-Modified-Since support
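
Compression, for example, is a one-liner with FastAPI's built-in middleware (ETag handling would be layered in similarly):

from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)  # compress bodies >= 1 KB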

Database Level

Query Optimization

-- Efficient cache lookup
EXPLAIN ANALYZE
SELECT response_data, expires_at, is_valid
FROM cache_entries
WHERE cache_key = $1 
  AND expires_at > NOW() 
  AND is_valid = TRUE;

Connection Management

  • Prepared Statements: Reduced parsing overhead
  • Connection Pooling: Shared connection resources
  • Read Replicas: Separate analytics queries

Caching Level

Cache Warming Strategies

async def warm_cache():
    """Pre-populate the cache with commonly requested queries"""
    common_requests = [
        {"endpoint": "properties", "city": "Austin", "state": "TX"},
        {"endpoint": "properties", "city": "Dallas", "state": "TX"},
        {"endpoint": "market_stats", "zipCode": "78701"}
    ]
    
    for request in common_requests:
        # fetch_and_cache derives the cache key and TTL from the request params
        await fetch_and_cache(request)

Memory Management

  • TTL Optimization: Balanced freshness vs. efficiency
  • Compression: Response data compression
  • Eviction Policies: Smart cache replacement
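
As an example of the compression point, cached payloads can be packed with the standard library before they hit Redis (a sketch):

import json
import zlib

def pack(data: dict) -> bytes:
    # Compress the serialized response to stretch the Redis memory budget
    return zlib.compress(json.dumps(data).encode())

def unpack(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob))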

📊 Monitoring and Observability

Metrics Collection

Business Metrics

  • Cache hit ratios by endpoint
  • API cost savings
  • Request volume trends
  • Error rates and patterns

System Metrics

  • Response time percentiles
  • Database query performance
  • Memory and CPU utilization
  • Connection pool statistics

Custom Metrics

# Prometheus-style metrics
from prometheus_client import Counter, Gauge, Histogram
cache_hit_ratio = Gauge('cache_hit_ratio', 'Cache hit ratio by endpoint', ['endpoint'])
api_request_duration = Histogram('api_request_duration_seconds', 'API request duration')
upstream_calls = Counter('upstream_api_calls_total', 'Total upstream API calls')
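
These are then updated in the request path, for example:

# Recording metrics inside the request pipeline
cache_hit_ratio.labels(endpoint="properties").set(0.85)  # from rolling hit/miss counts
api_request_duration.observe(0.123)                      # seconds for this request
upstream_calls.inc()                                     # only on cache misses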

Health Checks

Application Health

async def health_check():
    checks = {
        "database": await check_database_connection(),
        "cache": await check_cache_availability(),
        "upstream": await check_upstream_api(),
        "disk_space": await check_disk_usage()
    }
    
    overall_status = "healthy" if all(checks.values()) else "unhealthy"
    return {"status": overall_status, "checks": checks}

Dependency Health

  • Database connectivity and performance
  • Redis availability and memory usage
  • Upstream API response times
  • Disk space and system resources
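
Each check reduces to a cheap probe with a hard timeout; for example, the database check might look like this (a sketch assuming the async engine configured earlier and Python 3.11+ for asyncio.timeout):

import asyncio

from sqlalchemy import text

async def check_database_connection() -> bool:
    try:
        async with asyncio.timeout(2):  # don't let a slow DB hang the health endpoint
            async with engine.connect() as conn:
                await conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False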

🔧 Configuration Management

Environment-Based Configuration

from typing import Optional

from pydantic import BaseSettings  # pydantic v1; in v2 this lives in pydantic-settings

class Settings(BaseSettings):
    # Server
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    
    # Database
    database_url: str
    database_echo: bool = False
    
    # Cache
    redis_url: Optional[str] = None
    redis_enabled: bool = False
    default_cache_ttl: int = 3600
    
    # Rate Limiting
    enable_rate_limiting: bool = True
    global_rate_limit: str = "1000/hour"
    
    class Config:
        env_file = ".env"
        case_sensitive = False

Feature Flags

import os

class FeatureFlags:
    ENABLE_REDIS_CACHE = os.getenv("ENABLE_REDIS_CACHE", "true").lower() == "true"
    ENABLE_ANALYTICS = os.getenv("ENABLE_ANALYTICS", "true").lower() == "true"
    ENABLE_CACHE_WARMING = os.getenv("ENABLE_CACHE_WARMING", "false").lower() == "true"
    STRICT_RATE_LIMITING = os.getenv("STRICT_RATE_LIMITING", "false").lower() == "true"

🚀 Scalability Considerations

Horizontal Scaling

Stateless Design

  • No server-side sessions
  • Shared state in Redis/Database
  • Load balancer friendly

Container Orchestration

# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rentcache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rentcache
  template:
    metadata:
      labels:
        app: rentcache
    spec:
      containers:
      - name: rentcache
        image: rentcache:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"

Vertical Scaling

Resource Optimization

  • Memory: Cache size tuning
  • CPU: Async I/O optimization
  • Storage: Database indexing and partitioning
  • Network: Connection pooling and keep-alive

Data Partitioning

Database Sharding

-- Partition by date for analytics
CREATE TABLE usage_stats (
    -- columns
) PARTITION BY RANGE (created_at);

CREATE TABLE usage_stats_2024_01 PARTITION OF usage_stats
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

Cache Distribution

  • Redis Cluster for distributed caching
  • Consistent hashing for cache key distribution
  • Regional cache replication
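
A minimal sketch of consistent key placement across cache nodes (a basic hash ring with virtual nodes; a real deployment would rely on Redis Cluster's own slot mapping):

import bisect
import hashlib

NODES = ["cache-1", "cache-2", "cache-3"]  # hypothetical node names

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# 100 virtual points per node smooth out the key distribution
RING = sorted((_hash(f"{node}:{i}"), node) for node in NODES for i in range(100))
POINTS = [point for point, _ in RING]

def node_for(cache_key: str) -> str:
    idx = bisect.bisect(POINTS, _hash(cache_key)) % len(RING)
    return RING[idx][1]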

🔄 Disaster Recovery

Backup Strategy

Database Backups

# Automated daily backups
pg_dump rentcache | gzip > backup-$(date +%Y%m%d).sql.gz

# Point-in-time recovery
pg_basebackup -D /backup/base -Ft -z -P

Configuration Backups

  • Environment variables
  • Docker Compose files
  • SSL certificates
  • Application configuration

Recovery Procedures

Database Recovery

# Restore from backup
gunzip -c backup-20240115.sql.gz | psql rentcache

# Point-in-time recovery
pg_ctl stop -D /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data/*
pg_basebackup -D /var/lib/postgresql/data -R

Cache Recovery

  • Redis persistence (RDB + AOF)
  • Cache warming from database
  • Graceful degradation to upstream API
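
Degradation can be as simple as treating any cache-layer failure as a miss (a sketch reusing the hypothetical helpers from the lookup example earlier):

async def get_with_degradation(cache_key: str, ttl: int) -> dict:
    try:
        return await get_cached(cache_key, ttl)
    except Exception:  # cache layer (Redis/DB) unavailable
        return await fetch_upstream(cache_key)  # serve directly from Rentcast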

This architecture is designed for high availability, performance, and cost optimization while maintaining security and operational simplicity. For implementation details, see the Deployment Guide and Usage Guide.