Architecture Overview

This document provides a comprehensive overview of RentCache's system architecture, design decisions, and technical implementation details.

🏗️ System Architecture

High-Level Architecture

graph TB
    Client[Client Applications] --> LB[Load Balancer/Caddy]
    LB --> App[RentCache FastAPI]
    
    App --> Auth[Authentication Layer]
    Auth --> Rate[Rate Limiting]
    Rate --> Cache[Cache Manager]
    
    Cache --> L1[L1 Cache<br/>Redis]
    Cache --> L2[L2 Cache<br/>SQLite/PostgreSQL]
    
    Cache --> API[Rentcast API]
    
    App --> Analytics[Usage Analytics]
    Analytics --> DB[(Database)]
    
    App --> Monitor[Health Monitoring]
    App --> Metrics[Metrics Collection]
    
    subgraph "Data Layer"
        L1
        L2
        DB
    end
    
    subgraph "External Services"
        API
    end

Component Responsibilities

FastAPI Application Server

  • Primary Role: HTTP request handling and API routing
  • Key Features:
    • Async/await architecture for high concurrency
    • OpenAPI documentation generation
    • Request/response validation with Pydantic
    • Middleware stack for cross-cutting concerns

Authentication & Authorization

  • Method: Bearer token authentication using SHA-256 hashed API keys
  • Storage: Secure key storage with expiration and usage limits
  • Features: Per-key rate limiting and usage tracking
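
A minimal sketch of this validation step as a FastAPI dependency; `lookup_key_by_hash` is a hypothetical database helper, not part of the actual codebase:

import hashlib

from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()

async def authenticate(credentials: HTTPAuthorizationCredentials = Depends(bearer)):
    # Hash the presented token and compare against stored hashes only
    key_hash = hashlib.sha256(credentials.credentials.encode()).hexdigest()
    api_key = await lookup_key_by_hash(key_hash)  # hypothetical DB lookup
    if api_key is None or not api_key.is_active:
        raise HTTPException(status_code=401, detail="Unauthorized")
    return api_key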

Multi-Level Caching System

  • L1 Cache (Redis): In-memory cache for ultra-fast access
  • L2 Cache (Database): Persistent cache with analytics
  • Strategy: Write-through with intelligent TTL management
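
To illustrate the write-through strategy, here is a minimal sketch of the L1 → L2 → upstream lookup path; `redis_client`, `db_cache`, and `fetch_upstream` are hypothetical helpers:

import json

async def get_cached(cache_key: str, ttl: int) -> dict:
    # L1: Redis (hot data)
    raw = await redis_client.get(cache_key)
    if raw is not None:
        return json.loads(raw)

    # L2: database (persistent cache)
    entry = await db_cache.get(cache_key)
    if entry is not None and not entry.is_expired:
        await redis_client.set(cache_key, json.dumps(entry.data), ex=ttl)  # repopulate L1
        return entry.data

    # Miss on both levels: call upstream, then write through both caches
    data = await fetch_upstream(cache_key)
    await db_cache.set(cache_key, data, ttl=ttl)
    await redis_client.set(cache_key, json.dumps(data), ex=ttl)
    return data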

Rate Limiting Engine

  • Implementation: Token bucket algorithm with sliding windows
  • Granularity: Global and per-endpoint limits
  • Backend: Redis-based distributed rate limiting

Usage Analytics

  • Tracking: Request patterns, costs, and performance metrics
  • Storage: Time-series data in relational database
  • Reporting: Real-time dashboards and historical analysis

🔄 Request Flow Architecture

1. Request Processing Pipeline

sequenceDiagram
    participant C as Client
    participant A as Auth Layer
    participant R as Rate Limiter
    participant CM as Cache Manager
    participant RC as Redis Cache
    participant DB as Database
    participant RA as Rentcast API
    
    C->>A: HTTP Request + API Key
    A->>A: Validate & Hash Key
    
    alt Valid API Key
        A->>R: Check Rate Limits
        alt Within Limits
            R->>CM: Cache Lookup
            CM->>RC: Check L1 Cache
            
            alt Cache Hit (L1)
                RC-->>CM: Return Cached Data
                CM-->>C: Response + Cache Headers
            else Cache Miss (L1)
                CM->>DB: Check L2 Cache
                alt Cache Hit (L2)
                    DB-->>CM: Return Cached Data
                    CM->>RC: Populate L1
                    CM-->>C: Response + Cache Headers
                else Cache Miss (L2)
                    CM->>RA: Upstream API Call
                    RA-->>CM: API Response
                    CM->>DB: Store in L2
                    CM->>RC: Store in L1
                    CM-->>C: Response + Cost Headers
                end
            end
        else Rate Limited
            R-->>C: 429 Rate Limit Exceeded
        end
    else Invalid API Key
        A-->>C: 401 Unauthorized
    end

2. Cache Key Generation

Cache Key Strategy: MD5 hash of the canonical request signature

import hashlib
import json

cache_key = hashlib.md5(json.dumps({
    "endpoint": "properties",
    "method": "GET",
    "path_params": {"property_id": "123"},
    "query_params": {"city": "Austin", "state": "TX"},
    "body": {}
}, sort_keys=True).encode()).hexdigest()

Benefits:

  • Deterministic cache keys
  • Negligible collision risk for cache keying (MD5 is not cryptographically collision-resistant, but that is not required here)
  • Parameter order independence
  • Efficient storage and lookup

💾 Caching Strategy

Multi-Level Cache Architecture

Level 1: Redis Cache (Hot Data)

  • Purpose: Ultra-fast access to frequently requested data
  • TTL: 30 minutes to 2 hours
  • Eviction: LRU (Least Recently Used)
  • Size: Memory-limited, optimized for speed

# L1 Cache Configuration
REDIS_CONFIG = {
    "maxmemory": "512mb",
    "maxmemory_policy": "allkeys-lru",
    "save": ["900 1", "300 10", "60 10000"],  # Persistence snapshots
    "appendonly": True,  # AOF for durability
    "appendfsync": "everysec"
}

Level 2: Database Cache (Persistent)

  • Purpose: Persistent cache with analytics and soft deletion
  • TTL: 1 hour to 48 hours based on endpoint volatility
  • Storage: Full response data + metadata
  • Features: Soft deletion, usage tracking, cost analytics

-- Cache Entry Schema
CREATE TABLE cache_entries (
    id SERIAL PRIMARY KEY,
    cache_key VARCHAR(64) UNIQUE NOT NULL,
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    params_hash VARCHAR(64) NOT NULL,
    response_data JSONB NOT NULL,
    status_code INTEGER NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    is_valid BOOLEAN DEFAULT TRUE,
    hit_count INTEGER DEFAULT 0,
    last_accessed TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Cache TTL Strategy

| Endpoint Type     | Data Volatility | Default TTL | Rationale                              |
|-------------------|-----------------|-------------|----------------------------------------|
| Property Records  | Very Low        | 24 hours    | Property characteristics rarely change |
| Value Estimates   | Medium          | 1 hour      | Market fluctuations affect valuations  |
| Rent Estimates    | Medium          | 1 hour      | Rental markets change regularly        |
| Listings          | High            | 30 minutes  | Active market with frequent updates    |
| Market Statistics | Low             | 2 hours     | Aggregated data changes slowly         |
| Comparables       | Medium          | 1 hour      | Market-dependent analysis              |
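
In code, these defaults reduce to a simple endpoint-to-TTL map (illustrative values in seconds, mirroring the table above; not the project's actual constants):

# Default TTLs per endpoint, in seconds
DEFAULT_TTLS = {
    "properties": 24 * 3600,     # property records: very low volatility
    "value_estimate": 3600,      # value estimates: medium volatility
    "rent_estimate": 3600,       # rent estimates: medium volatility
    "listings": 30 * 60,         # listings: high volatility
    "market_stats": 2 * 3600,    # market statistics: low volatility
    "comparables": 3600,         # comparables: medium volatility
}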

Stale-While-Revalidate Pattern

import asyncio

async def get_with_stale_while_revalidate(cache_key: str, ttl: int):
    """
    Serve stale data immediately while refreshing in the background.
    `cache`, `refresh_cache_entry`, and `fetch_and_cache` are the cache
    manager's internals; this is a sketch of the pattern.
    """
    cached_data = await cache.get(cache_key)

    if cached_data:
        if not cached_data.is_expired:
            return cached_data  # Fresh data
        else:
            # Serve stale data, trigger background refresh
            asyncio.create_task(refresh_cache_entry(cache_key))
            return cached_data  # Stale but usable

    # Cache miss - fetch fresh data
    return await fetch_and_cache(cache_key, ttl)

Benefits:

  • Improved user experience (no waiting for fresh data)
  • Reduced upstream API calls during traffic spikes
  • Graceful handling of upstream service issues

🚦 Rate Limiting Implementation

Token Bucket Algorithm

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()
    
    async def consume(self, tokens: int = 1) -> bool:
        await self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
    
    async def _refill(self):
        now = time.time()
        tokens_to_add = (now - self.last_refill) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now

Multi-Tier Rate Limiting

Global Limits

  • Purpose: Prevent overall API abuse
  • Scope: Per API key across all endpoints
  • Implementation: Redis-based distributed counters
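
A minimal sketch of such a counter using a fixed-window approximation (the sliding-window behavior described above is more involved); assumes redis-py's asyncio client:

import time

import redis.asyncio as aioredis

redis_client = aioredis.from_url("redis://localhost:6379")

async def within_global_limit(api_key_id: int, limit: int, window_s: int = 60) -> bool:
    # One counter per key per time window; atomic INCR across all app instances
    window = int(time.time()) // window_s
    key = f"rl:{api_key_id}:{window}"
    count = await redis_client.incr(key)
    if count == 1:
        await redis_client.expire(key, window_s)  # let the window expire on its own
    return count <= limit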

Per-Endpoint Limits

  • Purpose: Protect expensive operations
  • Scope: Specific endpoints (e.g., value estimates)
  • Implementation: Endpoint-specific token buckets

Dynamic Rate Limiting

RATE_LIMITS = {
    "properties": "60/minute",          # Standard property searches
    "value_estimate": "30/minute",      # Expensive AI/ML operations
    "rent_estimate": "30/minute",       # Expensive AI/ML operations
    "market_stats": "20/minute",        # Computationally intensive
    "listings_sale": "100/minute",      # Less expensive, higher volume
    "listings_rental": "100/minute",    # Less expensive, higher volume
    "comparables": "40/minute"          # Moderate complexity
}
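
A small helper can translate these strings into token-bucket parameters (a sketch; assumes the TokenBucket class shown earlier):

# "60/minute" -> TokenBucket(capacity=60, refill_rate=1.0 tokens/second)
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

def bucket_from_limit(limit: str) -> TokenBucket:
    count, period = limit.split("/")
    return TokenBucket(capacity=int(count), refill_rate=int(count) / PERIOD_SECONDS[period])

buckets = {endpoint: bucket_from_limit(spec) for endpoint, spec in RATE_LIMITS.items()}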

📊 Database Schema Design

Core Tables

API Keys Management

CREATE TABLE api_keys (
    id SERIAL PRIMARY KEY,
    key_name VARCHAR(100) UNIQUE NOT NULL,
    key_hash VARCHAR(64) UNIQUE NOT NULL,  -- SHA-256 hash
    is_active BOOLEAN DEFAULT TRUE,
    daily_limit INTEGER DEFAULT 1000,
    monthly_limit INTEGER DEFAULT 30000,
    daily_usage INTEGER DEFAULT 0,
    monthly_usage INTEGER DEFAULT 0,
    last_daily_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_monthly_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE,
    last_used TIMESTAMP WITH TIME ZONE
);

Usage Analytics

CREATE TABLE usage_stats (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    status_code INTEGER NOT NULL,
    response_time_ms DECIMAL(10,2) NOT NULL,
    cache_hit BOOLEAN NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    user_agent TEXT,
    ip_address INET,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Rate Limiting State

CREATE TABLE rate_limits (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    current_tokens INTEGER DEFAULT 0,
    last_refill TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(api_key_id, endpoint)
);

Indexing Strategy

-- Performance indexes
-- (cache_key already has an implicit index from its UNIQUE constraint; shown for completeness)
CREATE INDEX idx_cache_entries_key ON cache_entries(cache_key);
CREATE INDEX idx_cache_entries_endpoint_expires ON cache_entries(endpoint, expires_at);
CREATE INDEX idx_cache_entries_created_at ON cache_entries(created_at);

CREATE INDEX idx_usage_stats_api_key_created ON usage_stats(api_key_id, created_at);
CREATE INDEX idx_usage_stats_endpoint_created ON usage_stats(endpoint, created_at);
CREATE INDEX idx_usage_stats_cache_hit ON usage_stats(cache_hit, created_at);

CREATE INDEX idx_api_keys_hash ON api_keys(key_hash);
CREATE INDEX idx_api_keys_active ON api_keys(is_active);

🔒 Security Architecture

Authentication Flow

graph LR
    Client --> |Bearer Token| Auth[Auth Middleware]
    Auth --> Hash[SHA-256 Hash]
    Hash --> DB[(Database Lookup)]
    DB --> Validate[Validate Expiry & Status]
    Validate --> |Valid| Allow[Allow Request]
    Validate --> |Invalid| Deny[401 Unauthorized]

Security Measures

API Key Protection

  • Storage: Only SHA-256 hashes stored, never plaintext
  • Transmission: HTTPS only, bearer token format
  • Rotation: Configurable expiration dates
  • Revocation: Instant deactivation capability
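
A sketch of key issuance under this model; the plaintext key is returned exactly once and only its hash is persisted (`store_key_record` is a hypothetical helper):

import hashlib
import secrets

async def create_api_key(key_name: str) -> str:
    plaintext = secrets.token_urlsafe(32)  # shown to the caller once
    key_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    await store_key_record(key_name=key_name, key_hash=key_hash)  # hypothetical DB write
    return plaintext  # never stored server-side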

Network Security

  • HTTPS Enforcement: Automatic SSL with Caddy
  • CORS Configuration: Configurable origin restrictions
  • Rate Limiting: DDoS and abuse protection
  • Request Validation: Comprehensive input sanitization

Container Security

  • Non-root User: Containers run as unprivileged user
  • Minimal Images: Alpine Linux base images
  • Secret Management: Environment variable injection
  • Network Isolation: Docker network segregation

📈 Performance Optimizations

Application Level

Async Architecture

import asyncio

# Concurrent request handling: fan requests out and await them together
async def handle_multiple_requests(requests: list):
    # process_request is the per-request pipeline (defined elsewhere)
    tasks = [process_request(req) for req in requests]
    return await asyncio.gather(*tasks)

Connection Pooling

import httpx
from sqlalchemy.ext.asyncio import create_async_engine

# HTTP client configuration
http_client = httpx.AsyncClient(
    timeout=30.0,
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20
    )
)

# Database connection pooling
engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=30,
    pool_pre_ping=True,
    pool_recycle=3600
)

Response Optimization

  • GZip Compression: Automatic response compression
  • JSON Streaming: Large response streaming
  • Conditional Requests: ETag and If-Modified-Since support
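
Compression, for example, is a one-liner with FastAPI's built-in middleware (ETag handling would be layered in similarly):

from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)  # compress bodies >= 1 KB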

Database Level

Query Optimization

-- Efficient cache lookup
EXPLAIN ANALYZE
SELECT response_data, expires_at, is_valid
FROM cache_entries
WHERE cache_key = $1 
  AND expires_at > NOW() 
  AND is_valid = TRUE;

Connection Management

  • Prepared Statements: Reduced parsing overhead
  • Connection Pooling: Shared connection resources
  • Read Replicas: Separate analytics queries

Caching Level

Cache Warming Strategies

async def warm_cache():
    """Pre-populate the cache with commonly requested queries"""
    common_requests = [
        {"endpoint": "properties", "city": "Austin", "state": "TX"},
        {"endpoint": "properties", "city": "Dallas", "state": "TX"},
        {"endpoint": "market_stats", "zipCode": "78701"}
    ]
    
    for request in common_requests:
        # fetch_and_cache derives the cache key and TTL from the request params
        await fetch_and_cache(request)

Memory Management

  • TTL Optimization: Balanced freshness vs. efficiency
  • Compression: Response data compression
  • Eviction Policies: Smart cache replacement
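
As an example of the compression point, cached payloads can be packed with the standard library before they hit Redis (a sketch):

import json
import zlib

def pack(data: dict) -> bytes:
    # Compress the serialized response to stretch the Redis memory budget
    return zlib.compress(json.dumps(data).encode())

def unpack(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob))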

📊 Monitoring and Observability

Metrics Collection

Business Metrics

  • Cache hit ratios by endpoint
  • API cost savings
  • Request volume trends
  • Error rates and patterns

System Metrics

  • Response time percentiles
  • Database query performance
  • Memory and CPU utilization
  • Connection pool statistics

Custom Metrics

# Prometheus-style metrics
from prometheus_client import Counter, Gauge, Histogram
cache_hit_ratio = Gauge('cache_hit_ratio', 'Cache hit ratio by endpoint', ['endpoint'])
api_request_duration = Histogram('api_request_duration_seconds', 'API request duration')
upstream_calls = Counter('upstream_api_calls_total', 'Total upstream API calls')
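
These are then updated in the request path, for example:

# Recording metrics inside the request pipeline
cache_hit_ratio.labels(endpoint="properties").set(0.85)  # from rolling hit/miss counts
api_request_duration.observe(0.123)                      # seconds for this request
upstream_calls.inc()                                     # only on cache misses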

Health Checks

Application Health

async def health_check():
    checks = {
        "database": await check_database_connection(),
        "cache": await check_cache_availability(),
        "upstream": await check_upstream_api(),
        "disk_space": await check_disk_usage()
    }
    
    overall_status = "healthy" if all(checks.values()) else "unhealthy"
    return {"status": overall_status, "checks": checks}

Dependency Health

  • Database connectivity and performance
  • Redis availability and memory usage
  • Upstream API response times
  • Disk space and system resources
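
Each check reduces to a cheap probe with a hard timeout; for example, the database check might look like this (a sketch assuming the async engine configured earlier and Python 3.11+ for asyncio.timeout):

import asyncio

from sqlalchemy import text

async def check_database_connection() -> bool:
    try:
        async with asyncio.timeout(2):  # don't let a slow DB hang the health endpoint
            async with engine.connect() as conn:
                await conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False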

🔧 Configuration Management

Environment-Based Configuration

from typing import Optional

from pydantic import BaseSettings  # pydantic v1; in v2 this lives in pydantic-settings

class Settings(BaseSettings):
    # Server
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    
    # Database
    database_url: str
    database_echo: bool = False
    
    # Cache
    redis_url: Optional[str] = None
    redis_enabled: bool = False
    default_cache_ttl: int = 3600
    
    # Rate Limiting
    enable_rate_limiting: bool = True
    global_rate_limit: str = "1000/hour"
    
    class Config:
        env_file = ".env"
        case_sensitive = False

Feature Flags

import os

class FeatureFlags:
    ENABLE_REDIS_CACHE = os.getenv("ENABLE_REDIS_CACHE", "true").lower() == "true"
    ENABLE_ANALYTICS = os.getenv("ENABLE_ANALYTICS", "true").lower() == "true"
    ENABLE_CACHE_WARMING = os.getenv("ENABLE_CACHE_WARMING", "false").lower() == "true"
    STRICT_RATE_LIMITING = os.getenv("STRICT_RATE_LIMITING", "false").lower() == "true"

🚀 Scalability Considerations

Horizontal Scaling

Stateless Design

  • No server-side sessions
  • Shared state in Redis/Database
  • Load balancer friendly

Container Orchestration

# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rentcache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rentcache
  template:
    metadata:
      labels:
        app: rentcache
    spec:
      containers:
      - name: rentcache
        image: rentcache:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"

Vertical Scaling

Resource Optimization

  • Memory: Cache size tuning
  • CPU: Async I/O optimization
  • Storage: Database indexing and partitioning
  • Network: Connection pooling and keep-alive

Data Partitioning

Database Sharding

-- Partition by date for analytics
CREATE TABLE usage_stats (
    -- columns
) PARTITION BY RANGE (created_at);

CREATE TABLE usage_stats_2024_01 PARTITION OF usage_stats
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

Cache Distribution

  • Redis Cluster for distributed caching
  • Consistent hashing for cache key distribution
  • Regional cache replication
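
A minimal sketch of consistent key placement across cache nodes (a basic hash ring with virtual nodes; a real deployment would rely on Redis Cluster's own slot mapping):

import bisect
import hashlib

NODES = ["cache-1", "cache-2", "cache-3"]  # hypothetical node names

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# 100 virtual points per node smooth out the key distribution
RING = sorted((_hash(f"{node}:{i}"), node) for node in NODES for i in range(100))
POINTS = [point for point, _ in RING]

def node_for(cache_key: str) -> str:
    idx = bisect.bisect(POINTS, _hash(cache_key)) % len(RING)
    return RING[idx][1]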

🔄 Disaster Recovery

Backup Strategy

Database Backups

# Automated daily backups
pg_dump rentcache | gzip > backup-$(date +%Y%m%d).sql.gz

# Point-in-time recovery
pg_basebackup -D /backup/base -Ft -z -P

Configuration Backups

  • Environment variables
  • Docker Compose files
  • SSL certificates
  • Application configuration

Recovery Procedures

Database Recovery

# Restore from backup
gunzip -c backup-20240115.sql.gz | psql rentcache

# Point-in-time recovery
pg_ctl stop -D /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data/*
pg_basebackup -D /var/lib/postgresql/data -R

Cache Recovery

  • Redis persistence (RDB + AOF)
  • Cache warming from database
  • Graceful degradation to upstream API
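
Degradation can be as simple as treating any cache-layer failure as a miss (a sketch reusing the hypothetical helpers from the lookup example earlier):

async def get_with_degradation(cache_key: str, ttl: int) -> dict:
    try:
        return await get_cached(cache_key, ttl)
    except Exception:  # cache layer (Redis/DB) unavailable
        return await fetch_upstream(cache_key)  # serve directly from Rentcast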

This architecture is designed for high availability, performance, and cost optimization while maintaining security and operational simplicity. For implementation details, see the Deployment Guide and Usage Guide.