# Architecture Overview
This document provides a comprehensive overview of RentCache's system architecture, design decisions, and technical implementation details.

## 🏗️ System Architecture

### High-Level Architecture

```mermaid
graph TB
    Client[Client Applications] --> LB[Load Balancer/Caddy]
    LB --> App[RentCache FastAPI]

    App --> Auth[Authentication Layer]
    Auth --> Rate[Rate Limiting]
    Rate --> Cache[Cache Manager]

    Cache --> L1[L1 Cache<br/>Redis]
    Cache --> L2[L2 Cache<br/>SQLite/PostgreSQL]

    Cache --> API[Rentcast API]

    App --> Analytics[Usage Analytics]
    Analytics --> DB[(Database)]

    App --> Monitor[Health Monitoring]
    App --> Metrics[Metrics Collection]

    subgraph "Data Layer"
        L1
        L2
        DB
    end

    subgraph "External Services"
        API
    end
```
### Component Responsibilities

#### **FastAPI Application Server**
- **Primary Role**: HTTP request handling and API routing
- **Key Features**:
  - Async/await architecture for high concurrency
  - OpenAPI documentation generation
  - Request/response validation with Pydantic
  - Middleware stack for cross-cutting concerns

#### **Authentication & Authorization**
- **Method**: Bearer token authentication using SHA-256 hashed API keys
- **Storage**: Secure key storage with expiration and usage limits
- **Features**: Per-key rate limiting and usage tracking

#### **Multi-Level Caching System**
- **L1 Cache (Redis)**: In-memory cache for ultra-fast access
- **L2 Cache (Database)**: Persistent cache with analytics
- **Strategy**: Write-through with intelligent TTL management

#### **Rate Limiting Engine**
- **Implementation**: Token bucket algorithm with sliding windows
- **Granularity**: Global and per-endpoint limits
- **Backend**: Redis-based distributed rate limiting

#### **Usage Analytics**
- **Tracking**: Request patterns, costs, and performance metrics
- **Storage**: Time-series data in relational database
- **Reporting**: Real-time dashboards and historical analysis
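The write-through, two-level lookup can be sketched with plain dictionaries standing in for Redis and the database (all names here are illustrative, not the actual RentCache implementation):

```python
# Minimal write-through sketch: dicts stand in for Redis (L1) and the DB (L2).
l1_cache: dict = {}
l2_cache: dict = {}

def fetch_upstream(key: str) -> str:
    """Stand-in for a Rentcast API call."""
    return f"response-for-{key}"

def get(key: str) -> str:
    if key in l1_cache:              # L1 hit: fastest path
        return l1_cache[key]
    if key in l2_cache:              # L2 hit: repopulate L1 on the way out
        l1_cache[key] = l2_cache[key]
        return l2_cache[key]
    value = fetch_upstream(key)      # miss at both levels: one upstream call
    l2_cache[key] = value            # write through both levels
    l1_cache[key] = value
    return value
```

Because writes go through both levels, a Redis restart only costs speed, not correctness: L1 refills from L2 on the next lookup.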
## 🔄 Request Flow Architecture
### 1. Request Processing Pipeline

```mermaid
sequenceDiagram
    participant C as Client
    participant A as Auth Layer
    participant R as Rate Limiter
    participant CM as Cache Manager
    participant RC as Redis Cache
    participant DB as Database
    participant RA as Rentcast API

    C->>A: HTTP Request + API Key
    A->>A: Validate & Hash Key

    alt Valid API Key
        A->>R: Check Rate Limits
        alt Within Limits
            R->>CM: Cache Lookup
            CM->>RC: Check L1 Cache

            alt Cache Hit (L1)
                RC-->>CM: Return Cached Data
                CM-->>C: Response + Cache Headers
            else Cache Miss (L1)
                CM->>DB: Check L2 Cache
                alt Cache Hit (L2)
                    DB-->>CM: Return Cached Data
                    CM->>RC: Populate L1
                    CM-->>C: Response + Cache Headers
                else Cache Miss (L2)
                    CM->>RA: Upstream API Call
                    RA-->>CM: API Response
                    CM->>DB: Store in L2
                    CM->>RC: Store in L1
                    CM-->>C: Response + Cost Headers
                end
            end
        else Rate Limited
            R-->>C: 429 Rate Limit Exceeded
        end
    else Invalid API Key
        A-->>C: 401 Unauthorized
    end
```
### 2. Cache Key Generation

**Cache Key Strategy**: MD5 hash of the canonicalized request signature
```python
import hashlib
import json

cache_key = hashlib.md5(
    json.dumps({
        "endpoint": "properties",
        "method": "GET",
        "path_params": {"property_id": "123"},
        "query_params": {"city": "Austin", "state": "TX"},
        "body": {}
    }, sort_keys=True).encode()  # hash the canonical JSON bytes
).hexdigest()
```
**Benefits**:
- Deterministic cache keys
- Negligible collision risk for cache lookups (MD5 is unsuitable for security, but fine for key derivation)
- Parameter order independence (`sort_keys=True` canonicalizes the JSON)
- Efficient storage and lookup
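Parameter order independence follows directly from `sort_keys=True`: the same logical request always serializes to the same JSON string, so it always hashes to the same key. A quick demonstration (the helper name is illustrative):

```python
import hashlib
import json

def cache_key(signature: dict) -> str:
    """Derive a deterministic cache key from a request signature."""
    canonical = json.dumps(signature, sort_keys=True)  # sorts nested keys too
    return hashlib.md5(canonical.encode()).hexdigest()

# The same parameters in a different order produce an identical key.
k1 = cache_key({"endpoint": "properties", "query_params": {"city": "Austin", "state": "TX"}})
k2 = cache_key({"query_params": {"state": "TX", "city": "Austin"}, "endpoint": "properties"})
assert k1 == k2
```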
## 💾 Caching Strategy
### Multi-Level Cache Architecture

#### **Level 1: Redis Cache (Hot Data)**
- **Purpose**: Ultra-fast access to frequently requested data
- **TTL**: 30 minutes to 2 hours
- **Eviction**: LRU (Least Recently Used)
- **Size**: Memory-limited, optimized for speed

```python
# L1 cache configuration (mirrors the corresponding redis.conf directives)
REDIS_CONFIG = {
    "maxmemory": "512mb",
    "maxmemory_policy": "allkeys-lru",
    "save": ["900 1", "300 10", "60 10000"],  # RDB persistence snapshots
    "appendonly": True,                       # AOF for durability
    "appendfsync": "everysec"
}
```
#### **Level 2: Database Cache (Persistent)**
- **Purpose**: Persistent cache with analytics and soft deletion
- **TTL**: 1 hour to 48 hours based on endpoint volatility
- **Storage**: Full response data + metadata
- **Features**: Soft deletion, usage tracking, cost analytics

```sql
-- Cache entry schema (PostgreSQL syntax)
CREATE TABLE cache_entries (
    id SERIAL PRIMARY KEY,
    cache_key VARCHAR(64) UNIQUE NOT NULL,
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    params_hash VARCHAR(64) NOT NULL,
    response_data JSONB NOT NULL,
    status_code INTEGER NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    is_valid BOOLEAN DEFAULT TRUE,
    hit_count INTEGER DEFAULT 0,
    last_accessed TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
### Cache TTL Strategy

| Endpoint Type | Data Volatility | Default TTL | Rationale |
|---------------|-----------------|-------------|-----------|
| **Property Records** | Very Low | 24 hours | Property characteristics rarely change |
| **Value Estimates** | Medium | 1 hour | Market fluctuations affect valuations |
| **Rent Estimates** | Medium | 1 hour | Rental markets change regularly |
| **Listings** | High | 30 minutes | Active market with frequent updates |
| **Market Statistics** | Low | 2 hours | Aggregated data changes slowly |
| **Comparables** | Medium | 1 hour | Market-dependent analysis |
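In code, these tiers reduce to a lookup table with a fallback default (a sketch; the endpoint names follow the rate-limit configuration later in this document):

```python
# TTLs in seconds, mirroring the table above; names are illustrative.
DEFAULT_TTL = 3600  # 1-hour fallback

ENDPOINT_TTLS = {
    "properties": 24 * 3600,   # property records: very low volatility
    "value_estimate": 3600,
    "rent_estimate": 3600,
    "listings_sale": 30 * 60,
    "listings_rental": 30 * 60,
    "market_stats": 2 * 3600,
    "comparables": 3600,
}

def ttl_for(endpoint: str) -> int:
    """Resolve the cache TTL for an endpoint, falling back to the default."""
    return ENDPOINT_TTLS.get(endpoint, DEFAULT_TTL)
```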
### Stale-While-Revalidate Pattern
```python
async def get_with_stale_while_revalidate(cache_key: str, ttl: int):
    """Serve stale data immediately while refreshing in the background."""
    cached_data = await cache.get(cache_key)

    if cached_data:
        if not cached_data.is_expired:
            return cached_data  # Fresh data
        else:
            # Serve stale data, trigger background refresh
            asyncio.create_task(refresh_cache_entry(cache_key))
            return cached_data  # Stale but usable

    # Cache miss - fetch fresh data
    return await fetch_and_cache(cache_key, ttl)
```
**Benefits**:
- Improved user experience (no waiting for fresh data)
- Reduced upstream API calls during traffic spikes
- Graceful handling of upstream service issues
## 🚦 Rate Limiting Implementation
### Token Bucket Algorithm

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()

    async def consume(self, tokens: int = 1) -> bool:
        await self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    async def _refill(self):
        now = time.time()
        tokens_to_add = (now - self.last_refill) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now
```
### Multi-Tier Rate Limiting

#### **Global Limits**
- **Purpose**: Prevent overall API abuse
- **Scope**: Per API key across all endpoints
- **Implementation**: Redis-based distributed counters

#### **Per-Endpoint Limits**
- **Purpose**: Protect expensive operations
- **Scope**: Specific endpoints (e.g., value estimates)
- **Implementation**: Endpoint-specific token buckets

#### **Dynamic Rate Limiting**
```python
RATE_LIMITS = {
    "properties": "60/minute",       # Standard property searches
    "value_estimate": "30/minute",   # Expensive AI/ML operations
    "rent_estimate": "30/minute",    # Expensive AI/ML operations
    "market_stats": "20/minute",     # Computationally intensive
    "listings_sale": "100/minute",   # Less expensive, higher volume
    "listings_rental": "100/minute", # Less expensive, higher volume
    "comparables": "40/minute"       # Moderate complexity
}
```
## 📊 Database Schema Design

### Core Tables

#### **API Keys Management**
```sql
CREATE TABLE api_keys (
    id SERIAL PRIMARY KEY,
    key_name VARCHAR(100) UNIQUE NOT NULL,
    key_hash VARCHAR(64) UNIQUE NOT NULL, -- SHA-256 hash
    is_active BOOLEAN DEFAULT TRUE,
    daily_limit INTEGER DEFAULT 1000,
    monthly_limit INTEGER DEFAULT 30000,
    daily_usage INTEGER DEFAULT 0,
    monthly_usage INTEGER DEFAULT 0,
    last_daily_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_monthly_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE,
    last_used TIMESTAMP WITH TIME ZONE
);
```
#### **Usage Analytics**
```sql
CREATE TABLE usage_stats (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    status_code INTEGER NOT NULL,
    response_time_ms DECIMAL(10,2) NOT NULL,
    cache_hit BOOLEAN NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    user_agent TEXT,
    ip_address INET,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
#### **Rate Limiting State**
```sql
CREATE TABLE rate_limits (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    current_tokens INTEGER DEFAULT 0,
    last_refill TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(api_key_id, endpoint)
);
```
### Indexing Strategy

```sql
-- Performance indexes
-- (cache_key and key_hash already get implicit unique indexes from their
-- UNIQUE constraints; the explicit indexes below are listed for completeness)
CREATE INDEX idx_cache_entries_key ON cache_entries(cache_key);
CREATE INDEX idx_cache_entries_endpoint_expires ON cache_entries(endpoint, expires_at);
CREATE INDEX idx_cache_entries_created_at ON cache_entries(created_at);

CREATE INDEX idx_usage_stats_api_key_created ON usage_stats(api_key_id, created_at);
CREATE INDEX idx_usage_stats_endpoint_created ON usage_stats(endpoint, created_at);
CREATE INDEX idx_usage_stats_cache_hit ON usage_stats(cache_hit, created_at);

CREATE INDEX idx_api_keys_hash ON api_keys(key_hash);
CREATE INDEX idx_api_keys_active ON api_keys(is_active);
```
## 🔒 Security Architecture

### Authentication Flow

```mermaid
graph LR
    Client --> |Bearer Token| Auth[Auth Middleware]
    Auth --> Hash[SHA-256 Hash]
    Hash --> DB[(Database Lookup)]
    DB --> Validate[Validate Expiry & Status]
    Validate --> |Valid| Allow[Allow Request]
    Validate --> |Invalid| Deny[401 Unauthorized]
```
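The hash-then-lookup step amounts to a few lines. This sketch uses a dict in place of the `api_keys` table, and the function names are illustrative:

```python
import hashlib

# Stand-in for the api_keys table: maps key_hash -> is_active flag.
api_keys: dict = {}

def register_key(plaintext_key: str) -> None:
    """Store only the SHA-256 hash, never the plaintext."""
    api_keys[hashlib.sha256(plaintext_key.encode()).hexdigest()] = True

def is_authorized(bearer_token: str) -> bool:
    """Hash the presented token and look it up; unknown hashes are rejected."""
    key_hash = hashlib.sha256(bearer_token.encode()).hexdigest()
    return api_keys.get(key_hash, False)
```

A real implementation would additionally check `expires_at`, `is_active`, and usage limits before allowing the request.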
### Security Measures

#### **API Key Protection**
- **Storage**: Only SHA-256 hashes stored, never plaintext
- **Transmission**: HTTPS only, bearer token format
- **Rotation**: Configurable expiration dates
- **Revocation**: Instant deactivation capability

#### **Network Security**
- **HTTPS Enforcement**: Automatic SSL with Caddy
- **CORS Configuration**: Configurable origin restrictions
- **Rate Limiting**: DDoS and abuse protection
- **Request Validation**: Comprehensive input sanitization

#### **Container Security**
- **Non-root User**: Containers run as unprivileged user
- **Minimal Images**: Alpine Linux base images
- **Secret Management**: Environment variable injection
- **Network Isolation**: Docker network segregation
## 📈 Performance Optimizations

### Application Level

#### **Async Architecture**
```python
# Concurrent request handling (sketch: process_request and req1..req3
# stand in for the real handler and incoming requests)
async def handle_multiple_requests():
    tasks = [
        process_request(req1),
        process_request(req2),
        process_request(req3)
    ]
    results = await asyncio.gather(*tasks)
    return results
```
#### **Connection Pooling**
```python
import httpx
from sqlalchemy.ext.asyncio import create_async_engine

# HTTP client configuration
http_client = httpx.AsyncClient(
    timeout=30.0,
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20
    )
)

# Database connection pooling
engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=30,
    pool_pre_ping=True,
    pool_recycle=3600
)
```
#### **Response Optimization**
- **GZip Compression**: Automatic response compression
- **JSON Streaming**: Streaming for large responses
- **Conditional Requests**: ETag and If-Modified-Since support
### Database Level

#### **Query Optimization**
```sql
-- Efficient cache lookup
EXPLAIN ANALYZE
SELECT response_data, expires_at, is_valid
FROM cache_entries
WHERE cache_key = $1
  AND expires_at > NOW()
  AND is_valid = TRUE;
```

#### **Connection Management**
- **Prepared Statements**: Reduced parsing overhead
- **Connection Pooling**: Shared connection resources
- **Read Replicas**: Separate analytics queries
### Caching Level

#### **Cache Warming Strategies**
```python
async def warm_cache():
    """Pre-populate cache with common requests"""
    common_requests = [
        {"endpoint": "properties", "city": "Austin", "state": "TX"},
        {"endpoint": "properties", "city": "Dallas", "state": "TX"},
        {"endpoint": "market_stats", "zipCode": "78701"}
    ]

    for request in common_requests:
        await fetch_and_cache(request)
```

#### **Memory Management**
- **TTL Optimization**: Balanced freshness vs. efficiency
- **Compression**: Response data compression
- **Eviction Policies**: Smart cache replacement
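Response compression is cheap to add with the standard library. A sketch of compressing cached JSON before storage (the helper names and zlib level are illustrative choices):

```python
import json
import zlib

def compress_response(data: dict) -> bytes:
    """Serialize and compress a response before caching it."""
    return zlib.compress(json.dumps(data).encode(), level=6)

def decompress_response(blob: bytes) -> dict:
    """Inverse of compress_response: decompress and parse the cached blob."""
    return json.loads(zlib.decompress(blob).decode())
```

JSON API responses are highly repetitive, so this typically cuts cache memory use substantially at negligible CPU cost.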
## 📊 Monitoring and Observability

### Metrics Collection

#### **Business Metrics**
- Cache hit ratios by endpoint
- API cost savings
- Request volume trends
- Error rates and patterns

#### **System Metrics**
- Response time percentiles
- Database query performance
- Memory and CPU utilization
- Connection pool statistics

#### **Custom Metrics**
```python
from prometheus_client import Counter, Gauge, Histogram

# Prometheus-style metrics
cache_hit_ratio = Gauge('cache_hit_ratio', 'Cache hit ratio by endpoint', ['endpoint'])
api_request_duration = Histogram('api_request_duration_seconds', 'API request duration')
upstream_calls = Counter('upstream_api_calls_total', 'Total upstream API calls')
```
### Health Checks

#### **Application Health**
```python
async def health_check():
    checks = {
        "database": await check_database_connection(),
        "cache": await check_cache_availability(),
        "upstream": await check_upstream_api(),
        "disk_space": await check_disk_usage()
    }

    overall_status = "healthy" if all(checks.values()) else "unhealthy"
    return {"status": overall_status, "checks": checks}
```
#### **Dependency Health**
- Database connectivity and performance
- Redis availability and memory usage
- Upstream API response times
- Disk space and system resources
## 🔧 Configuration Management

### Environment-Based Configuration

```python
from typing import Optional

from pydantic import BaseSettings  # pydantic v1; in v2, use pydantic-settings

class Settings(BaseSettings):
    # Server
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False

    # Database
    database_url: str
    database_echo: bool = False

    # Cache
    redis_url: Optional[str] = None
    redis_enabled: bool = False
    default_cache_ttl: int = 3600

    # Rate Limiting
    enable_rate_limiting: bool = True
    global_rate_limit: str = "1000/hour"

    class Config:
        env_file = ".env"
        case_sensitive = False
```
### Feature Flags

```python
import os

class FeatureFlags:
    ENABLE_REDIS_CACHE = os.getenv("ENABLE_REDIS_CACHE", "true").lower() == "true"
    ENABLE_ANALYTICS = os.getenv("ENABLE_ANALYTICS", "true").lower() == "true"
    ENABLE_CACHE_WARMING = os.getenv("ENABLE_CACHE_WARMING", "false").lower() == "true"
    STRICT_RATE_LIMITING = os.getenv("STRICT_RATE_LIMITING", "false").lower() == "true"
```
## 🚀 Scalability Considerations

### Horizontal Scaling

#### **Stateless Design**
- No server-side sessions
- Shared state in Redis/Database
- Load balancer friendly

#### **Container Orchestration**
```yaml
# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rentcache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rentcache
  template:
    metadata:
      labels:
        app: rentcache  # must match the selector above
    spec:
      containers:
        - name: rentcache
          image: rentcache:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
```
### Vertical Scaling

#### **Resource Optimization**
- Memory: Cache size tuning
- CPU: Async I/O optimization
- Storage: Database indexing and partitioning
- Network: Connection pooling and keep-alive

### Data Partitioning

#### **Database Partitioning**
```sql
-- Partition by date for analytics
CREATE TABLE usage_stats (
    -- columns
) PARTITION BY RANGE (created_at);

CREATE TABLE usage_stats_2024_01 PARTITION OF usage_stats
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

#### **Cache Distribution**
- Redis Cluster for distributed caching
- Consistent hashing for cache key distribution
- Regional cache replication
## 🔄 Disaster Recovery

### Backup Strategy

#### **Database Backups**
```bash
# Automated daily backups
pg_dump rentcache | gzip > backup-$(date +%Y%m%d).sql.gz

# Base backup for point-in-time recovery
pg_basebackup -D /backup/base -Ft -z -P
```
#### **Configuration Backups**
- Environment variables
- Docker Compose files
- SSL certificates
- Application configuration

### Recovery Procedures

#### **Database Recovery**
```bash
# Restore from a logical backup
gunzip -c backup-20240115.sql.gz | psql rentcache

# Rebuild the data directory from a base backup taken on the primary
# (a full point-in-time recovery additionally needs WAL replay)
pg_ctl stop -D /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data/*
pg_basebackup -D /var/lib/postgresql/data -R
```
#### **Cache Recovery**
- Redis persistence (RDB + AOF)
- Cache warming from database
- Graceful degradation to upstream API

---

This architecture is designed for high availability, performance, and cost optimization while maintaining security and operational simplicity. For implementation details, see the [Deployment Guide](DEPLOYMENT.md) and [Usage Guide](USAGE.md).