# 🚀 LLM Fusion MCP - Production Deployment Guide
This guide covers deploying **LLM Fusion MCP** in production environments with Docker, cloud platforms, and enterprise setups.
---
## 📋 **Quick Start**
### **1. Prerequisites**
- Docker & Docker Compose
- At least 2GB RAM
- Internet connection for AI provider APIs
- One or more LLM provider API keys
### **2. One-Command Deployment**
```bash
# Clone and deploy
git clone <repository-url>
cd llm-fusion-mcp
# Configure environment
cp .env.production .env
# Edit .env with your API keys
# Deploy with Docker
./deploy.sh production
```
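The `.env` file only needs the keys for the providers you actually use; a minimal sketch (variable names from the configuration table below — start from `.env.production` for the full list):
```bash
# .env — minimal sketch; see .env.production for all options
GOOGLE_API_KEY=your_gemini_key        # required
OPENAI_API_KEY=your_openai_key        # optional
ANTHROPIC_API_KEY=your_anthropic_key  # optional
XAI_API_KEY=your_grok_key             # optional
SERVER_MODE=production
LOG_LEVEL=INFO
```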
---
## 🐳 **Docker Deployment**
### **Method 1: Docker Compose (Recommended)**
```bash
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
```
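The repository ships its own `docker-compose.yml`; as a rough sketch of what the service definition involves (image name and log mount taken from the standalone example below, everything else illustrative):
```yaml
# Illustrative compose service — not the shipped docker-compose.yml
version: '3.8'
services:
  llm-fusion-mcp:
    build: .
    image: llm-fusion-mcp:latest
    env_file: .env
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
```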
### **Method 2: Standalone Docker**
```bash
# Build image
docker build -t llm-fusion-mcp:latest .
# Run container
docker run -d \
  --name llm-fusion-mcp \
  --restart unless-stopped \
  -e GOOGLE_API_KEY="your_key" \
  -e OPENAI_API_KEY="your_key" \
  -v ./logs:/app/logs \
  llm-fusion-mcp:latest
```
### **Method 3: Pre-built Images**
```bash
# Pull from GitHub Container Registry
docker pull ghcr.io/username/llm-fusion-mcp:latest
# Run with your environment
docker run -d \
  --name llm-fusion-mcp \
  --env-file .env \
  ghcr.io/username/llm-fusion-mcp:latest
```
---
## ☁️ **Cloud Platform Deployment**
### **🔵 AWS Deployment**
#### **AWS ECS with Fargate**
```json
// ecs-task-definition.json
{
  "family": "llm-fusion-mcp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "llm-fusion-mcp",
      "image": "ghcr.io/username/llm-fusion-mcp:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/llm-fusion-mcp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "environment": [
        {"name": "GOOGLE_API_KEY", "value": "your_key"},
        {"name": "SERVER_MODE", "value": "production"}
      ]
    }
  ]
}
```
#### **AWS Lambda (Serverless)**
```bash
# Package for Lambda
zip -r llm-fusion-mcp-lambda.zip src/ requirements.txt
# Deploy with AWS CLI
aws lambda create-function \
  --function-name llm-fusion-mcp \
  --runtime python3.12 \
  --role arn:aws:iam::account:role/lambda-execution-role \
  --handler src.llm_fusion_mcp.lambda_handler \
  --zip-file fileb://llm-fusion-mcp-lambda.zip \
  --timeout 300 \
  --memory-size 1024
```
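Once created, you can smoke-test the function from the CLI. The payload shape below is hypothetical — adjust it to whatever `lambda_handler` actually expects:
```bash
# Invoke with a test payload (AWS CLI v2 needs the binary-format flag for raw JSON)
aws lambda invoke \
  --function-name llm-fusion-mcp \
  --cli-binary-format raw-in-base64-out \
  --payload '{"action": "health"}' \
  response.json
cat response.json
```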
### **🔷 Azure Deployment**
#### **Azure Container Instances**
```bash
# Deploy to Azure
az container create \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --cpu 2 --memory 4 \
  --restart-policy Always \
  --environment-variables \
    GOOGLE_API_KEY="your_key" \
    SERVER_MODE="production"
```
#### **Azure App Service**
```bash
# Deploy as Web App
az webapp create \
  --resource-group myResourceGroup \
  --plan myAppServicePlan \
  --name llm-fusion-mcp \
  --deployment-container-image-name ghcr.io/username/llm-fusion-mcp:latest

# Configure environment
az webapp config appsettings set \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --settings \
    GOOGLE_API_KEY="your_key" \
    SERVER_MODE="production"
```
### **🟢 Google Cloud Deployment**
#### **Cloud Run**
```bash
# Deploy to Cloud Run
gcloud run deploy llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GOOGLE_API_KEY="your_key",SERVER_MODE="production" \
  --memory 2Gi \
  --cpu 2
```
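To verify the deployment, you can resolve the service URL with `gcloud` and hit the health endpoint (endpoint path from the health-check section below):
```bash
# Fetch the service URL and check health
URL=$(gcloud run services describe llm-fusion-mcp \
  --region us-central1 --format 'value(status.url)')
curl -fsS "$URL/health"
```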
#### **GKE (Kubernetes)**
```yaml
# kubernetes-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-fusion-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-fusion-mcp
  template:
    metadata:
      labels:
        app: llm-fusion-mcp
    spec:
      containers:
        - name: llm-fusion-mcp
          image: ghcr.io/username/llm-fusion-mcp:latest
          ports:
            - containerPort: 8000
          env:
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-fusion-secrets
                  key: google-api-key
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: llm-fusion-mcp-service
spec:
  selector:
    app: llm-fusion-mcp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
```
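The Deployment reads the Gemini key from the `llm-fusion-secrets` Secret, so create that first (see Security Hardening below), then apply the manifest. Assuming it is saved as `kubernetes-deployment.yml`:
```bash
# Create the secret the Deployment references, then roll out
kubectl create secret generic llm-fusion-secrets \
  --from-literal=google-api-key="$GOOGLE_API_KEY"
kubectl apply -f kubernetes-deployment.yml
kubectl rollout status deployment/llm-fusion-mcp
```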
---
## 🏢 **Enterprise Deployment**
### **🔐 Security Hardening**
#### **1. API Key Security**
```bash
# Use encrypted secrets
kubectl create secret generic llm-fusion-secrets \
  --from-literal=google-api-key="$GOOGLE_API_KEY" \
  --from-literal=openai-api-key="$OPENAI_API_KEY"
# Enable key rotation
export ENABLE_KEY_ROTATION=true
export KEY_ROTATION_INTERVAL=24
```
#### **2. Network Security**
```bash
# Firewall rules (example for AWS)
aws ec2 create-security-group \
  --group-name llm-fusion-mcp-sg \
  --description "LLM Fusion MCP Security Group"

# Allow only necessary ports
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxx \
  --protocol tcp \
  --port 8000 \
  --source-group sg-frontend
```
#### **3. Resource Limits**
```yaml
# Docker Compose with limits
version: '3.8'
services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    restart: unless-stopped
```
### **📊 Monitoring & Observability**
#### **1. Health Checks**
```bash
# Built-in health endpoint
curl http://localhost:8000/health
# Docker health check
docker run --health-cmd="curl -f http://localhost:8000/health" \
  --health-interval=30s \
  --health-retries=3 \
  --health-start-period=40s \
  --health-timeout=10s \
  llm-fusion-mcp:latest
```
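The same check expressed in Compose syntax (assuming `curl` is available inside the image):
```yaml
# docker-compose.yml healthcheck equivalent to the flags above
services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```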
#### **2. Prometheus Metrics**
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llm-fusion-mcp'
    static_configs:
      - targets: ['llm-fusion-mcp:8000']
    metrics_path: /metrics
    scrape_interval: 15s
```
#### **3. Centralized Logging**
```bash
# ELK Stack integration
docker run -d \
  --name llm-fusion-mcp \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="docker.llm-fusion-mcp" \
  llm-fusion-mcp:latest
```
### **🔄 High Availability Setup**
#### **1. Load Balancing**
```nginx
# nginx.conf
upstream llm_fusion_backend {
    server llm-fusion-mcp-1:8000;
    server llm-fusion-mcp-2:8000;
    server llm-fusion-mcp-3:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://llm_fusion_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
#### **2. Auto-scaling**
```yaml
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-fusion-mcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-fusion-mcp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
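Note that the CPU utilization metric requires `metrics-server` to be running in the cluster. Assuming the manifest is saved as `llm-fusion-mcp-hpa.yml`:
```bash
# Apply the autoscaler and watch it react to load
kubectl apply -f llm-fusion-mcp-hpa.yml
kubectl get hpa llm-fusion-mcp-hpa --watch
```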
---
## 🔧 **Configuration Management**
### **Environment Variables**
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_API_KEY` | ✅ | - | Google Gemini API key |
| `OPENAI_API_KEY` | ❌ | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | ❌ | - | Anthropic API key |
| `XAI_API_KEY` | ❌ | - | xAI Grok API key |
| `SERVER_MODE` | ❌ | `production` | Server mode |
| `LOG_LEVEL` | ❌ | `INFO` | Logging level |
| `MAX_FILE_SIZE_MB` | ❌ | `50` | Max file size for analysis |
| `REQUEST_TIMEOUT` | ❌ | `300` | Request timeout in seconds |
### **Volume Mounts**
```bash
# Data persistence
-v ./data:/app/data # Persistent data
-v ./logs:/app/logs # Log files
-v ./config:/app/config # Configuration files
-v ./cache:/app/cache # Model cache
```
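Putting the mounts together with the standalone Docker example from earlier:
```bash
# Standalone container with all four mounts
docker run -d \
  --name llm-fusion-mcp \
  --env-file .env \
  -v ./data:/app/data \
  -v ./logs:/app/logs \
  -v ./config:/app/config \
  -v ./cache:/app/cache \
  llm-fusion-mcp:latest
```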
---
## 🚨 **Troubleshooting**
### **Common Issues**
#### **Container Won't Start**
```bash
# Check logs
docker-compose logs llm-fusion-mcp
# Common fixes
# 1. API key not configured
# 2. Port already in use
# 3. Insufficient memory
# Debug mode
docker-compose run --rm llm-fusion-mcp bash
```
#### **API Connection Issues**
```bash
# Test API connectivity (the Gemini API authenticates with an API key header,
# not a Bearer token)
curl -H "x-goog-api-key: $GOOGLE_API_KEY" \
  https://generativelanguage.googleapis.com/v1beta/models
# Check firewall/network
telnet api.openai.com 443
```
#### **Performance Issues**
```bash
# Monitor resource usage
docker stats llm-fusion-mcp
# Scale horizontally
docker-compose up --scale llm-fusion-mcp=3
```
### **Health Checks**
```bash
# Built-in health check
curl http://localhost:8000/health
# Provider status
curl http://localhost:8000/health/providers
# System metrics
curl http://localhost:8000/metrics
```
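The three endpoints above make a convenient smoke test; a minimal sketch:
```bash
#!/usr/bin/env bash
# Smoke-test the monitoring endpoints; exit non-zero on first failure
set -u
BASE="${1:-http://localhost:8000}"   # pass a different base URL as the first argument
for path in /health /health/providers /metrics; do
  if curl -fsS "$BASE$path" > /dev/null; then
    echo "OK   $path"
  else
    echo "FAIL $path"
    exit 1
  fi
done
```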
---
## 📞 **Support**
### **Getting Help**
- 📖 **Documentation**: Check README.md and INTEGRATION.md
- 🧪 **Testing**: Run health checks and test suite
- 🔍 **Debugging**: Enable DEBUG log level
- 📊 **Monitoring**: Check metrics and logs
### **Performance Tuning**
- **Memory**: Increase container memory for large file processing
- **CPU**: Scale horizontally for high throughput
- **Cache**: Tune model cache timeout for your usage patterns
- **Network**: Use CDN for static assets, optimize API endpoints
---
<div align="center">

## 🎉 **Ready for Production!**

**Your LLM Fusion MCP server is now deployed and ready to handle production workloads!**

*Built with ❤️ for enterprise-grade AI integration*

</div>