# Architecture Overview
This document provides a comprehensive overview of RentCache's system architecture, design decisions, and technical implementation details.
## 🏗️ System Architecture

### High-Level Architecture
```mermaid
graph TB
    Client[Client Applications] --> LB[Load Balancer/Caddy]
    LB --> App[RentCache FastAPI]
    App --> Auth[Authentication Layer]
    Auth --> Rate[Rate Limiting]
    Rate --> Cache[Cache Manager]
    Cache --> L1[L1 Cache<br/>Redis]
    Cache --> L2[L2 Cache<br/>SQLite/PostgreSQL]
    Cache --> API[Rentcast API]
    App --> Analytics[Usage Analytics]
    Analytics --> DB[(Database)]
    App --> Monitor[Health Monitoring]
    App --> Metrics[Metrics Collection]

    subgraph "Data Layer"
        L1
        L2
        DB
    end

    subgraph "External Services"
        API
    end
```
### Component Responsibilities

#### FastAPI Application Server
- Primary Role: HTTP request handling and API routing
- Key Features:
  - Async/await architecture for high concurrency
  - OpenAPI documentation generation
  - Request/response validation with Pydantic
  - Middleware stack for cross-cutting concerns
#### Authentication & Authorization

- Method: Bearer token authentication using SHA-256 hashed API keys
- Storage: Secure key storage with expiration and usage limits
- Features: Per-key rate limiting and usage tracking (see the verification sketch below)
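A minimal verification sketch, assuming an async database handle with a `fetch_one` method (`verify_api_key` and the query wiring are hypothetical; the actual middleware is not shown in this document):

```python
import hashlib
from datetime import datetime, timezone

async def verify_api_key(bearer_token: str, db) -> bool:
    """Hash the presented token and look it up; plaintext keys are never stored."""
    key_hash = hashlib.sha256(bearer_token.encode()).hexdigest()
    record = await db.fetch_one(
        "SELECT is_active, expires_at FROM api_keys WHERE key_hash = :hash",
        {"hash": key_hash},
    )
    if record is None or not record["is_active"]:
        return False  # Unknown or deactivated key
    if record["expires_at"] is not None and record["expires_at"] < datetime.now(timezone.utc):
        return False  # Expired key
    return True
```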
#### Multi-Level Caching System

- L1 Cache (Redis): In-memory cache for ultra-fast access
- L2 Cache (Database): Persistent cache with analytics
- Strategy: Write-through with endpoint-specific TTL management
#### Rate Limiting Engine
- Implementation: Token bucket algorithm with sliding windows
- Granularity: Global and per-endpoint limits
- Backend: Redis-based distributed rate limiting
#### Usage Analytics
- Tracking: Request patterns, costs, and performance metrics
- Storage: Time-series data in relational database
- Reporting: Real-time dashboards and historical analysis
## 🔄 Request Flow Architecture

### 1. Request Processing Pipeline
```mermaid
sequenceDiagram
    participant C as Client
    participant A as Auth Layer
    participant R as Rate Limiter
    participant CM as Cache Manager
    participant RC as Redis Cache
    participant DB as Database
    participant RA as Rentcast API

    C->>A: HTTP Request + API Key
    A->>A: Validate & Hash Key
    alt Valid API Key
        A->>R: Check Rate Limits
        alt Within Limits
            R->>CM: Cache Lookup
            CM->>RC: Check L1 Cache
            alt Cache Hit (L1)
                RC-->>CM: Return Cached Data
                CM-->>C: Response + Cache Headers
            else Cache Miss (L1)
                CM->>DB: Check L2 Cache
                alt Cache Hit (L2)
                    DB-->>CM: Return Cached Data
                    CM->>RC: Populate L1
                    CM-->>C: Response + Cache Headers
                else Cache Miss (L2)
                    CM->>RA: Upstream API Call
                    RA-->>CM: API Response
                    CM->>DB: Store in L2
                    CM->>RC: Store in L1
                    CM-->>C: Response + Cost Headers
                end
            end
        else Rate Limited
            R-->>C: 429 Rate Limit Exceeded
        end
    else Invalid API Key
        A-->>C: 401 Unauthorized
    end
```
### 2. Cache Key Generation

Cache key strategy: MD5 hash of the request signature.
```python
import hashlib
import json

cache_key = hashlib.md5(json.dumps({
    "endpoint": "properties",
    "method": "GET",
    "path_params": {"property_id": "123"},
    "query_params": {"city": "Austin", "state": "TX"},
    "body": {}
}, sort_keys=True).encode()).hexdigest()
```
Benefits:
- Deterministic cache keys
- Low collision probability (MD5 is not cryptographically collision-resistant, which is acceptable for cache keys)
- Parameter order independence
- Efficient storage and lookup
## 💾 Caching Strategy

### Multi-Level Cache Architecture

#### Level 1: Redis Cache (Hot Data)
- Purpose: Ultra-fast access to frequently requested data
- TTL: 30 minutes to 2 hours
- Eviction: LRU (Least Recently Used)
- Size: Memory-limited, optimized for speed
```python
# L1 Cache Configuration
REDIS_CONFIG = {
    "maxmemory": "512mb",
    "maxmemory_policy": "allkeys-lru",
    "save": ["900 1", "300 10", "60 10000"],  # Persistence snapshots
    "appendonly": True,  # AOF for durability
    "appendfsync": "everysec"
}
```
#### Level 2: Database Cache (Persistent)
- Purpose: Persistent cache with analytics and soft deletion
- TTL: 1 hour to 48 hours based on endpoint volatility
- Storage: Full response data + metadata
- Features: Soft deletion, usage tracking, cost analytics
```sql
-- Cache Entry Schema
CREATE TABLE cache_entries (
    id SERIAL PRIMARY KEY,
    cache_key VARCHAR(64) UNIQUE NOT NULL,
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    params_hash VARCHAR(64) NOT NULL,
    response_data JSONB NOT NULL,
    status_code INTEGER NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    is_valid BOOLEAN DEFAULT TRUE,
    hit_count INTEGER DEFAULT 0,
    last_accessed TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
### Cache TTL Strategy

| Endpoint Type | Data Volatility | Default TTL | Rationale |
|---|---|---|---|
| Property Records | Very Low | 24 hours | Property characteristics rarely change |
| Value Estimates | Medium | 1 hour | Market fluctuations affect valuations |
| Rent Estimates | Medium | 1 hour | Rental markets change regularly |
| Listings | High | 30 minutes | Active market with frequent updates |
| Market Statistics | Low | 2 hours | Aggregated data changes slowly |
| Comparables | Medium | 1 hour | Market-dependent analysis |
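In code, this policy can be expressed as a simple mapping (a sketch; the endpoint names follow the `RATE_LIMITS` mapping later in this document, and the values are illustrative defaults from the table above):

```python
# Default TTLs in seconds, keyed by endpoint
ENDPOINT_TTLS = {
    "properties": 24 * 3600,   # Property records: very low volatility
    "value_estimate": 3600,    # Valuations track market fluctuations
    "rent_estimate": 3600,     # Rental markets change regularly
    "listings_sale": 30 * 60,  # Active listings update frequently
    "listings_rental": 30 * 60,
    "market_stats": 2 * 3600,  # Aggregates change slowly
    "comparables": 3600,
}

def ttl_for(endpoint: str, default: int = 3600) -> int:
    """Look up the cache TTL for an endpoint, falling back to one hour."""
    return ENDPOINT_TTLS.get(endpoint, default)
```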
### Stale-While-Revalidate Pattern
```python
import asyncio

async def get_with_stale_while_revalidate(cache_key: str, ttl: int):
    """Serve stale data immediately while refreshing in the background."""
    cached_data = await cache.get(cache_key)
    if cached_data:
        if not cached_data.is_expired:
            return cached_data  # Fresh data
        # Serve stale data, trigger background refresh
        asyncio.create_task(refresh_cache_entry(cache_key))
        return cached_data  # Stale but usable

    # Cache miss - fetch fresh data
    return await fetch_and_cache(cache_key, ttl)
```
Benefits:
- Improved user experience (no waiting for fresh data)
- Reduced upstream API calls during traffic spikes
- Graceful handling of upstream service issues
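The `fetch_and_cache` helper used above is referenced but never defined in this document; a write-through sketch consistent with the request flow diagram might look like this (hypothetical `call_upstream`, `db_cache`, and `cache` interfaces):

```python
async def fetch_and_cache(cache_key: str, ttl: int):
    """Write-through: fetch from the upstream API, then populate L2 and L1."""
    response = await call_upstream(cache_key)           # Upstream Rentcast call
    await db_cache.store(cache_key, response, ttl=ttl)  # L2: persistent cache
    await cache.set(cache_key, response, ttl=ttl)       # L1: Redis hot cache
    return response
```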
## 🚦 Rate Limiting Implementation

### Token Bucket Algorithm
```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()

    async def consume(self, tokens: int = 1) -> bool:
        await self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    async def _refill(self):
        now = time.time()
        tokens_to_add = (now - self.last_refill) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now
```
### Multi-Tier Rate Limiting

#### Global Limits
- Purpose: Prevent overall API abuse
- Scope: Per API key across all endpoints
- Implementation: Redis-based distributed counters
#### Per-Endpoint Limits
- Purpose: Protect expensive operations
- Scope: Specific endpoints (e.g., value estimates)
- Implementation: Endpoint-specific token buckets
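Both tiers can share the same Redis primitive. A minimal fixed-window counter sketch using `redis.asyncio` (a simpler variant than the token bucket above, shown only to illustrate distributed counting across replicas):

```python
import redis.asyncio as redis

r = redis.Redis()

async def within_limit(api_key_id: int, endpoint: str, limit: int, window_s: int = 60) -> bool:
    """Shared counter in Redis so every app replica sees the same usage."""
    key = f"ratelimit:{api_key_id}:{endpoint}"
    count = await r.incr(key)
    if count == 1:
        # First hit in this window: start the expiry clock
        await r.expire(key, window_s)
    return count <= limit
```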
#### Dynamic Rate Limiting
```python
RATE_LIMITS = {
    "properties": "60/minute",        # Standard property searches
    "value_estimate": "30/minute",    # Expensive AI/ML operations
    "rent_estimate": "30/minute",     # Expensive AI/ML operations
    "market_stats": "20/minute",      # Computationally intensive
    "listings_sale": "100/minute",    # Less expensive, higher volume
    "listings_rental": "100/minute",  # Less expensive, higher volume
    "comparables": "40/minute"        # Moderate complexity
}
```
## 📊 Database Schema Design

### Core Tables

#### API Keys Management
```sql
CREATE TABLE api_keys (
    id SERIAL PRIMARY KEY,
    key_name VARCHAR(100) UNIQUE NOT NULL,
    key_hash VARCHAR(64) UNIQUE NOT NULL,  -- SHA-256 hash
    is_active BOOLEAN DEFAULT TRUE,
    daily_limit INTEGER DEFAULT 1000,
    monthly_limit INTEGER DEFAULT 30000,
    daily_usage INTEGER DEFAULT 0,
    monthly_usage INTEGER DEFAULT 0,
    last_daily_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_monthly_reset TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE,
    last_used TIMESTAMP WITH TIME ZONE
);
```
#### Usage Analytics
```sql
CREATE TABLE usage_stats (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    method VARCHAR(10) NOT NULL,
    status_code INTEGER NOT NULL,
    response_time_ms DECIMAL(10,2) NOT NULL,
    cache_hit BOOLEAN NOT NULL,
    estimated_cost DECIMAL(10,2) DEFAULT 0.0,
    user_agent TEXT,
    ip_address INET,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
#### Rate Limiting State
```sql
CREATE TABLE rate_limits (
    id SERIAL PRIMARY KEY,
    api_key_id INTEGER REFERENCES api_keys(id),
    endpoint VARCHAR(50) NOT NULL,
    current_tokens INTEGER DEFAULT 0,
    last_refill TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(api_key_id, endpoint)
);
```
### Indexing Strategy
```sql
-- Performance indexes
CREATE INDEX idx_cache_entries_key ON cache_entries(cache_key);
CREATE INDEX idx_cache_entries_endpoint_expires ON cache_entries(endpoint, expires_at);
CREATE INDEX idx_cache_entries_created_at ON cache_entries(created_at);

CREATE INDEX idx_usage_stats_api_key_created ON usage_stats(api_key_id, created_at);
CREATE INDEX idx_usage_stats_endpoint_created ON usage_stats(endpoint, created_at);
CREATE INDEX idx_usage_stats_cache_hit ON usage_stats(cache_hit, created_at);

CREATE INDEX idx_api_keys_hash ON api_keys(key_hash);
CREATE INDEX idx_api_keys_active ON api_keys(is_active);
```
## 🔒 Security Architecture

### Authentication Flow
```mermaid
graph LR
    Client --> |Bearer Token| Auth[Auth Middleware]
    Auth --> Hash[SHA-256 Hash]
    Hash --> DB[(Database Lookup)]
    DB --> Validate[Validate Expiry & Status]
    Validate --> |Valid| Allow[Allow Request]
    Validate --> |Invalid| Deny[401 Unauthorized]
```
### Security Measures

#### API Key Protection

- Storage: Only SHA-256 hashes stored, never plaintext (see the issuance sketch below)
- Transmission: HTTPS only, bearer token format
- Rotation: Configurable expiration dates
- Revocation: Instant deactivation capability
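A sketch of key issuance consistent with these rules (`issue_api_key` is a hypothetical helper; the actual admin endpoint or CLI may differ):

```python
import hashlib
import secrets

def issue_api_key() -> tuple[str, str]:
    """Generate a key plus the hash to persist; the plaintext is shown once, never stored."""
    plaintext = secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, key_hash
```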
#### Network Security
- HTTPS Enforcement: Automatic SSL with Caddy
- CORS Configuration: Configurable origin restrictions
- Rate Limiting: DDoS and abuse protection
- Request Validation: Comprehensive input sanitization
#### Container Security
- Non-root User: Containers run as unprivileged user
- Minimal Images: Alpine Linux base images
- Secret Management: Environment variable injection
- Network Isolation: Docker network segregation
## 📈 Performance Optimizations

### Application Level

#### Async Architecture
```python
import asyncio

# Concurrent request handling
async def handle_multiple_requests(requests):
    tasks = [process_request(req) for req in requests]
    return await asyncio.gather(*tasks)
```
#### Connection Pooling

```python
import httpx
from sqlalchemy.ext.asyncio import create_async_engine

# HTTP client configuration
http_client = httpx.AsyncClient(
    timeout=30.0,
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20
    )
)

# Database connection pooling
engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=30,
    pool_pre_ping=True,
    pool_recycle=3600
)
```
#### Response Optimization

- GZip Compression: Automatic response compression (example below)
- JSON Streaming: Large response streaming
- Conditional Requests: ETag and If-Modified-Since support
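For example, compression can be enabled with FastAPI's stock middleware in one line (ETag handling requires custom middleware and is not shown here):

```python
from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
# Compress responses larger than ~1 KB
app.add_middleware(GZipMiddleware, minimum_size=1000)
```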
### Database Level

#### Query Optimization
```sql
-- Efficient cache lookup
EXPLAIN ANALYZE
SELECT response_data, expires_at, is_valid
FROM cache_entries
WHERE cache_key = $1
  AND expires_at > NOW()
  AND is_valid = TRUE;
```
#### Connection Management
- Prepared Statements: Reduced parsing overhead
- Connection Pooling: Shared connection resources
- Read Replicas: Separate analytics queries
### Caching Level

#### Cache Warming Strategies
```python
async def warm_cache():
    """Pre-populate cache with common requests"""
    common_requests = [
        {"endpoint": "properties", "city": "Austin", "state": "TX"},
        {"endpoint": "properties", "city": "Dallas", "state": "TX"},
        {"endpoint": "market_stats", "zipCode": "78701"}
    ]
    for request in common_requests:
        await fetch_and_cache(request)
```
#### Memory Management

- TTL Optimization: Balanced freshness vs. efficiency
- Compression: Response data compression (sketch below)
- Eviction Policies: Smart cache replacement
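As one illustration of the compression point, cached JSON bodies can be deflated before storage (a standard-library sketch; the actual on-disk/in-Redis format is an implementation detail):

```python
import json
import zlib

def pack(response: dict) -> bytes:
    """Compress a JSON response before writing it to the cache."""
    return zlib.compress(json.dumps(response).encode())

def unpack(blob: bytes) -> dict:
    """Inverse of pack(): decompress and parse a cached response."""
    return json.loads(zlib.decompress(blob).decode())
```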
## 📊 Monitoring and Observability

### Metrics Collection

#### Business Metrics
- Cache hit ratios by endpoint
- API cost savings
- Request volume trends
- Error rates and patterns
#### System Metrics
- Response time percentiles
- Database query performance
- Memory and CPU utilization
- Connection pool statistics
#### Custom Metrics

```python
from prometheus_client import Counter, Gauge, Histogram

# Prometheus-style metrics
cache_hit_ratio = Gauge('cache_hit_ratio', 'Cache hit ratio by endpoint', ['endpoint'])
api_request_duration = Histogram('api_request_duration_seconds', 'API request duration')
upstream_calls = Counter('upstream_api_calls_total', 'Total upstream API calls')
```
### Health Checks

#### Application Health
```python
async def health_check():
    checks = {
        "database": await check_database_connection(),
        "cache": await check_cache_availability(),
        "upstream": await check_upstream_api(),
        "disk_space": await check_disk_usage()
    }
    overall_status = "healthy" if all(checks.values()) else "unhealthy"
    return {"status": overall_status, "checks": checks}
```
#### Dependency Health
- Database connectivity and performance
- Redis availability and memory usage
- Upstream API response times
- Disk space and system resources
## 🔧 Configuration Management

### Environment-Based Configuration
```python
from typing import Optional

from pydantic import BaseSettings

class Settings(BaseSettings):
    # Server
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False

    # Database
    database_url: str
    database_echo: bool = False

    # Cache
    redis_url: Optional[str] = None
    redis_enabled: bool = False
    default_cache_ttl: int = 3600

    # Rate Limiting
    enable_rate_limiting: bool = True
    global_rate_limit: str = "1000/hour"

    class Config:
        env_file = ".env"
        case_sensitive = False
```
### Feature Flags
```python
import os

class FeatureFlags:
    ENABLE_REDIS_CACHE = os.getenv("ENABLE_REDIS_CACHE", "true").lower() == "true"
    ENABLE_ANALYTICS = os.getenv("ENABLE_ANALYTICS", "true").lower() == "true"
    ENABLE_CACHE_WARMING = os.getenv("ENABLE_CACHE_WARMING", "false").lower() == "true"
    STRICT_RATE_LIMITING = os.getenv("STRICT_RATE_LIMITING", "false").lower() == "true"
```
## 🚀 Scalability Considerations

### Horizontal Scaling

#### Stateless Design
- No server-side sessions
- Shared state in Redis/Database
- Load balancer friendly
#### Container Orchestration
```yaml
# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rentcache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rentcache
  template:
    metadata:
      labels:
        app: rentcache
    spec:
      containers:
      - name: rentcache
        image: rentcache:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
```
### Vertical Scaling

#### Resource Optimization
- Memory: Cache size tuning
- CPU: Async I/O optimization
- Storage: Database indexing and partitioning
- Network: Connection pooling and keep-alive
### Data Partitioning

#### Database Partitioning
```sql
-- Partition by date for analytics
CREATE TABLE usage_stats (
    -- columns
) PARTITION BY RANGE (created_at);

CREATE TABLE usage_stats_2024_01 PARTITION OF usage_stats
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```
#### Cache Distribution

- Redis Cluster for distributed caching
- Consistent hashing for cache key distribution (sketch below)
- Regional cache replication
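A toy illustration of consistent hashing for key placement (note Redis Cluster itself uses CRC16 hash slots internally; this sketch only shows the idea of stable key-to-node assignment):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map cache keys to nodes so adding a node moves only ~1/N of the keys."""

    def __init__(self, nodes, replicas: int = 100):
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)  # Virtual nodes smooth the distribution
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["redis-1", "redis-2", "redis-3"])
print(ring.node_for("cache:properties:austin"))  # Stable assignment per key
```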
## 🔄 Disaster Recovery

### Backup Strategy

#### Database Backups
```bash
# Automated daily backups
pg_dump rentcache | gzip > backup-$(date +%Y%m%d).sql.gz

# Point-in-time recovery
pg_basebackup -D /backup/base -Ft -z -P
```
#### Configuration Backups
- Environment variables
- Docker Compose files
- SSL certificates
- Application configuration
### Recovery Procedures

#### Database Recovery
```bash
# Restore from backup
gunzip -c backup-20240115.sql.gz | psql rentcache

# Point-in-time recovery
pg_ctl stop -D /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data/*
pg_basebackup -D /var/lib/postgresql/data -R
```
#### Cache Recovery

- Redis persistence (RDB + AOF)
- Cache warming from database
- Graceful degradation to upstream API (fallback sketch below)
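A sketch of that degradation order, assuming the cache interfaces used earlier in this document (if Redis is down, requests still succeed, just more slowly and at upstream cost):

```python
async def get_with_fallback(cache_key: str, ttl: int):
    """Try L1, then L2, then the upstream API; a cache outage must not fail the request."""
    try:
        cached = await cache.get(cache_key)  # L1: Redis
        if cached:
            return cached
    except Exception:
        pass  # Redis unavailable: fall through to the persistent cache
    cached = await db_cache.get(cache_key)   # L2: database cache
    if cached:
        return cached
    return await fetch_and_cache(cache_key, ttl)  # Upstream as last resort
```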
This architecture is designed for high availability, performance, and cost optimization while maintaining security and operational simplicity. For implementation details, see the Deployment Guide and Usage Guide.