Add comprehensive production deployment documentation
- Add Docker Volume Drivers section with NFS, CIFS/SMB, ZFS, Block Storage configs
- Add Docker Swarm multi-node deployment with HA and cloud provider examples
- Include enterprise storage integrations and monitoring/scaling configurations
- Complete production-ready infrastructure documentation
parent f704dc5975
commit 1b5b71acad
293 README.md
@@ -228,6 +228,299 @@ go test ./tests/integration/ -v -workers=10 -jobs=50
2. **Local + Sync**: Job files copied to local storage before rendering
3. **Shaman Storage**: Content-addressable system with automatic deduplication

### 🗄️ Docker Volume Drivers for Production Storage

Flamenco's render farm requires shared storage accessible by all workers. Here's how to configure different Docker volume drivers for production deployments:

#### NFS (Network File System)
```yaml
# docker-compose.yml
volumes:
  shared_storage:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.1.100,rw,nolock,hard,intr
      device: ":/export/flamenco-shared"

services:
  flamenco-manager:
    volumes:
      - shared_storage:/shared-storage
  flamenco-worker:
    volumes:
      - shared_storage:/shared-storage
```

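On hosts that do not use Compose, the same NFS volume can be created with the Docker CLI. The sketch below only prints the command so it can be reviewed before running it. The helper name `nfs_volume_cmd` is illustrative, and the server address and export path are the example values from the Compose file above.

```bash
#!/usr/bin/env bash
# Print the `docker volume create` command that mirrors the Compose NFS volume.
# nfs_volume_cmd is a hypothetical helper name; it prints and never runs docker.
nfs_volume_cmd() {
  local name="$1" server="$2" export_path="$3"
  printf 'docker volume create --driver local --opt type=nfs --opt o=addr=%s,rw,nolock,hard,intr --opt device=:%s %s\n' \
    "$server" "$export_path" "$name"
}

# Example values taken from the Compose file above.
nfs_volume_cmd shared_storage 192.168.1.100 /export/flamenco-shared
```

A volume created this way has to exist on every node that mounts it, because the `local` driver keeps its volume definitions per host.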
#### CIFS/SMB (Windows File Shares)
```yaml
# docker-compose.yml
volumes:
  shared_storage:
    driver: local
    driver_opts:
      type: cifs
      o: username=flamenco,password=secure123,uid=1000,gid=1000,iocharset=utf8
      device: "//192.168.1.100/flamenco-share"

services:
  flamenco-manager:
    volumes:
      - shared_storage:/shared-storage
```

#### ZFS (High-Performance Storage)
```yaml
# docker-compose.yml
# The local driver does not recognize a `zfs-name` option, so this sketch bind-mounts
# the dataset's host mountpoint instead. It assumes tank/flamenco-shared is mounted at
# /tank/flamenco-shared (the ZFS default) on the node that runs the container.
volumes:
  shared_storage:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /tank/flamenco-shared

services:
  flamenco-manager:
    volumes:
      - shared_storage:/shared-storage
```

#### Block Storage (Cloud/SAN)
```yaml
# For AWS EFS, GCP Filestore, or SAN storage exported over NFSv4
# (EFS and Filestore are managed network file systems, mounted here as nfs4)
volumes:
  shared_storage:
    driver: local
    driver_opts:
      type: nfs4
      o: addr=fs-12345678.efs.us-west-2.amazonaws.com,rsize=1048576,wsize=1048576
      device: ":/"
```

#### Third-Party Volume Plugins
```yaml
# NetApp, Pure Storage, etc.
volumes:
  shared_storage:
    driver: netapp:latest
    driver_opts:
      size: "500GB"
      performance: "high"
      snapshot_policy: "hourly"
```

#### Performance Considerations
- **NFS**: Best for mixed OS environments, excellent performance with NFSv4
- **CIFS/SMB**: Suited to Windows-heavy environments that already serve files over SMB
- **ZFS**: Superior performance and data integrity on Linux hosts; a dataset is local to one host, so export it (for example over NFS) when workers run on other nodes
- **Block Storage**: Cloud-native scaling, excellent for multi-region deployments
- **Enterprise Storage**: Hardware-accelerated performance for high-throughput rendering

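Whichever driver is chosen, it helps to confirm from inside a container that `/shared-storage` is really backed by the expected filesystem and not by an empty local directory. The following is a minimal sketch that reads a `/proc/mounts`-style file; the helper name `fstype_of` is an assumption, and paths containing spaces (which `/proc/mounts` escapes) are not handled.

```bash
#!/usr/bin/env bash
# Print the filesystem type mounted at a given path, read from a /proc/mounts-style file.
# fstype_of is a hypothetical helper; the second argument defaults to /proc/mounts.
fstype_of() {
  local mount_point="$1" mounts_file="${2:-/proc/mounts}"
  # Fields per line: device, mount point, filesystem type, options, dump, pass.
  # The last matching entry wins, because later mounts shadow earlier ones.
  awk -v mp="$mount_point" '
    $2 == mp { fstype = $3 }
    END { if (fstype == "") exit 1; print fstype }
  ' "$mounts_file"
}

# Example: inside a worker container this should report the shared filesystem type.
fstype_of /shared-storage || echo "/shared-storage is not a mount point" >&2
```

The function exits non-zero when the path is not a mount point, so it can also serve as a container health check.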
### 🐝 Docker Swarm Multi-Node Deployment

Scale Flamenco across multiple servers, datacenters, and cloud regions with Docker Swarm orchestration:

#### Swarm Initialization
```bash
# Initialize Swarm on manager node
docker swarm init --advertise-addr 192.168.1.10

# Add worker nodes (run on each worker server)
docker swarm join --token SWMTKN-1-xxx 192.168.1.10:2377

# Label nodes for specific roles
docker node update --label-add role=manager node-1
docker node update --label-add role=worker node-2
docker node update --label-add gpu=nvidia node-3
```

#### Production Swarm Stack
```yaml
# flamenco-swarm.yml
version: '3.8'
services:
  flamenco-manager:
    image: flamenco:production
    ports:
      - "8080:8080"
    networks:
      - flamenco-net
    volumes:
      - shared_storage:/shared-storage
      - manager_data:/app/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.role == manager
      restart_policy:
        condition: on-failure
        max_attempts: 3
    environment:
      - FLAMENCO_LISTEN=:8080
      - FLAMENCO_DATABASE_URL=sqlite:///app/data/flamenco.db

  flamenco-worker:
    image: flamenco:production
    networks:
      - flamenco-net
    volumes:
      - shared_storage:/shared-storage
      - worker_data:/app/data
    deploy:
      mode: global # One worker per node
      placement:
        constraints:
          - node.labels.role == worker
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G
      restart_policy:
        condition: on-failure
        max_attempts: 3
    environment:
      - MANAGER_URL=http://flamenco-manager:8080
      - WORKER_SLEEP_SCHEDULE=22:00-06:00

  # GPU-enabled workers for CUDA/OpenCL rendering
  flamenco-gpu-worker:
    image: flamenco:production-gpu
    networks:
      - flamenco-net
    volumes:
      - shared_storage:/shared-storage
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.labels.gpu == nvidia
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'NVIDIA-GPU'
                value: 1
    environment:
      - MANAGER_URL=http://flamenco-manager:8080
      # Read by the NVIDIA container runtime. CUDA_VISIBLE_DEVICES takes device
      # indices or UUIDs, so "all" is not a valid value for it.
      - NVIDIA_VISIBLE_DEVICES=all

networks:
  flamenco-net:
    driver: overlay
    attachable: true

volumes:
  shared_storage:
    driver: local
    driver_opts:
      type: nfs4
      o: addr=storage.company.com,rsize=1048576,wsize=1048576
      device: ":/flamenco-shared"
  manager_data:
    driver: local
  worker_data:
    driver: local
```

#### Multi-Datacenter Deployment
```bash
# Deploy the stack
docker stack deploy -c flamenco-swarm.yml flamenco

# Scale workers across regions (replicated services only; a `mode: global`
# service runs one task per matching node and rejects `scale`)
docker service scale flamenco_flamenco-worker=20

# Roll out a new manager image (a single replica restarts briefly during the update)
docker service update --image flamenco:v3.8 flamenco_flamenco-manager

# Monitor cluster health
docker service ls
docker service ps flamenco_flamenco-worker --no-trunc
```

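After a deploy or a scale operation, a script usually needs to know when the service has settled. The sketch below checks a `running/desired` string of the kind shown in the REPLICAS column of `docker service ls`. The helper name `replicas_converged` is hypothetical, and the example uses a literal value so that it runs without a Swarm.

```bash
#!/usr/bin/env bash
# Succeed when a "running/desired" replica string shows every desired task running.
# replicas_converged is a hypothetical helper name.
replicas_converged() {
  local replicas="${1%% *}"        # drop any suffix such as " (max 1 per node)"
  local running="${replicas%%/*}"  # text before the slash
  local desired="${replicas##*/}"  # text after the slash
  [ -n "$desired" ] && [ "$running" = "$desired" ]
}

# Example with a literal value. In practice the string could come from:
#   docker service ls --filter name=flamenco_flamenco-worker --format '{{.Replicas}}'
if replicas_converged "3/5"; then
  echo "converged"
else
  echo "still starting"
fi
```

Polling this in a loop with a timeout gives a simple deployment gate before submitting render jobs.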
#### Advanced Swarm Configuration
```yaml
# High-availability manager setup
flamenco-manager:
  deploy:
    replicas: 3 # HA setup
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.labels.role == manager
    update_config:
      parallelism: 1
      delay: 30s
      order: stop-first
    rollback_config:
      parallelism: 1
      delay: 30s

# Load balancer for multi-manager setup
flamenco-lb:
  image: nginx:alpine
  ports:
    - "80:80"
    - "443:443"
  deploy:
    replicas: 2
    placement:
      constraints:
        - node.labels.role == edge
```

#### Cloud Provider Integration
```yaml
# Workers pinned to Swarm nodes running in AWS
# (Swarm placement constraints on node labels, not ECS/Fargate scheduling)
flamenco-worker:
  deploy:
    placement:
      constraints:
        - node.labels.cloud == aws
        - node.labels.zone == us-west-2a
  environment:
    - AWS_REGION=us-west-2
    - S3_BUCKET=company-flamenco-assets

# Multi-cloud deployment
flamenco-worker-aws:
  deploy:
    placement:
      constraints:
        - node.labels.provider == aws

flamenco-worker-gcp:
  deploy:
    placement:
      constraints:
        - node.labels.provider == gcp
```

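The placement constraints above only match nodes that carry the corresponding labels, and Swarm does not set `cloud`, `zone`, or `provider` labels on its own. One way to keep the labels reviewable is to generate the `docker node update` commands from a small list. This is a sketch: `label_cmds` is a hypothetical helper, and the node names are placeholders.

```bash
#!/usr/bin/env bash
# Turn "node key=value" lines into `docker node update --label-add` commands.
# label_cmds is a hypothetical helper; it prints the commands instead of running them.
label_cmds() {
  local node label
  while read -r node label; do
    # Skip blank lines and comment lines.
    case "$node" in
      ''|'#'*) continue ;;
    esac
    printf 'docker node update --label-add %s %s\n' "$label" "$node"
  done
}

# Node names are placeholders; the label keys match the constraints above.
label_cmds <<'EOF'
node-4 provider=aws
node-4 zone=us-west-2a
node-5 provider=gcp
EOF
```

Piping the output to `sh` on a Swarm manager node would apply the labels.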
#### Monitoring & Scaling
```bash
# Set the worker count manually (Swarm has no built-in autoscaler; an external
# script could run this based on queue depth)
docker service update --replicas 50 flamenco_flamenco-worker

# Health monitoring
docker service logs -f flamenco_flamenco-manager

# Resource usage of containers on the current node (docker stats is per-node)
docker stats $(docker ps -q)

# Swap the shared storage volume on all worker tasks (remove the old mount first)
docker service update --mount-rm /shared-storage --mount-add type=volume,src=new_storage,dst=/shared-storage flamenco_flamenco-worker
```

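Scaling on queue depth needs some logic outside Swarm to turn a queue length into a replica count. A minimal sketch of that calculation follows. The function `desired_replicas` and its parameters are hypothetical, and how the number of queued tasks is read from Flamenco is deliberately left open.

```bash
#!/usr/bin/env bash
# Compute a worker count from the number of queued tasks:
# ceil(queued / tasks_per_worker), clamped to the range [min, max].
# desired_replicas and its parameters are hypothetical names.
desired_replicas() {
  local queued="$1" per_worker="$2" min="$3" max="$4"
  local want=$(( (queued + per_worker - 1) / per_worker ))  # integer ceiling
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

# Example: 120 queued tasks, 4 tasks per worker, between 2 and 50 workers.
# The command is printed, not executed.
echo "docker service scale flamenco_flamenco-worker=$(desired_replicas 120 4 2 50)"
```

The printed command applies only to a replicated service; a service deployed with `mode: global` always runs one task per matching node.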
#### Benefits of Swarm Deployment
- **Geographic Distribution**: Workers across multiple datacenters/clouds
- **High Availability**: Manager failover and redundancy
- **Dynamic Scaling**: Change the worker count with one command, run manually or by an external script that watches the render queue
- **Zero-Downtime Updates**: Rolling updates across the fleet
- **Resource Management**: CPU/memory/GPU resource allocation
- **Service Discovery**: Automatic load balancing and networking

### Docker Production Deployment
```yaml
version: '3.8'