# 🚀 LLM Fusion MCP - Production Deployment Guide

This guide covers deploying **LLM Fusion MCP** in production environments with Docker, cloud platforms, and enterprise setups.

---

## 📋 **Quick Start**

### **1. Prerequisites**
- Docker & Docker Compose
- At least 2GB RAM
- Internet connection for AI provider APIs
- One or more LLM provider API keys

### **2. One-Command Deployment**
```bash
# Clone and deploy
git clone <repository-url>
cd llm-fusion-mcp

# Configure environment
cp .env.production .env
# Edit .env with your API keys

# Deploy with Docker
./deploy.sh production
```
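
At minimum, `.env` needs one provider key. A minimal sketch — variable names come from the configuration table later in this guide; the values are placeholders:

```bash
# .env — minimal example
GOOGLE_API_KEY=your_gemini_key   # required
OPENAI_API_KEY=your_openai_key   # optional
SERVER_MODE=production
LOG_LEVEL=INFO
```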

---

## 🐳 **Docker Deployment**

### **Method 1: Docker Compose (Recommended)**
```bash
# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```
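
The commands above assume a `docker-compose.yml` in the repository root. If you need to write one from scratch, a minimal sketch might look like this (the port mapping and volume path are assumptions — adjust to your setup):

```yaml
# docker-compose.yml — minimal sketch
version: '3.8'
services:
  llm-fusion-mcp:
    build: .
    env_file: .env
    ports:
      - "8000:8000"      # assumed server port, per the health-check examples below
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
```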

### **Method 2: Standalone Docker**
```bash
# Build image
docker build -t llm-fusion-mcp:latest .

# Run container ($(pwd) makes the host path absolute, as docker run -v expects)
docker run -d \
  --name llm-fusion-mcp \
  --restart unless-stopped \
  -e GOOGLE_API_KEY="your_key" \
  -e OPENAI_API_KEY="your_key" \
  -v "$(pwd)/logs:/app/logs" \
  llm-fusion-mcp:latest
```

### **Method 3: Pre-built Images**
```bash
# Pull from GitHub Container Registry
docker pull ghcr.io/username/llm-fusion-mcp:latest

# Run with your environment
docker run -d \
  --name llm-fusion-mcp \
  --env-file .env \
  ghcr.io/username/llm-fusion-mcp:latest
```

---

## ☁️ **Cloud Platform Deployment**

### **🔵 AWS Deployment**

#### **AWS ECS with Fargate**
```jsonc
// ecs-task-definition.json
{
  "family": "llm-fusion-mcp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "llm-fusion-mcp",
      "image": "ghcr.io/username/llm-fusion-mcp:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/llm-fusion-mcp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "environment": [
        {"name": "GOOGLE_API_KEY", "value": "your_key"},
        {"name": "SERVER_MODE", "value": "production"}
      ]
    }
  ]
}
```
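
The task definition alone doesn't run anything; you'd register it and start a long-running service with something like the following (the cluster name, subnet, and security-group IDs are placeholders):

```bash
# Register the task definition
aws ecs register-task-definition \
  --cli-input-json file://ecs-task-definition.json

# Launch it as a Fargate service
aws ecs create-service \
  --cluster my-cluster \
  --service-name llm-fusion-mcp \
  --task-definition llm-fusion-mcp \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxxxxx],securityGroups=[sg-xxxxxxx],assignPublicIp=ENABLED}"
```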

#### **AWS Lambda (Serverless)**
```bash
# Package for Lambda
zip -r llm-fusion-mcp-lambda.zip src/ requirements.txt

# Deploy with AWS CLI
aws lambda create-function \
  --function-name llm-fusion-mcp \
  --runtime python3.12 \
  --role arn:aws:iam::account:role/lambda-execution-role \
  --handler src.llm_fusion_mcp.lambda_handler \
  --zip-file fileb://llm-fusion-mcp-lambda.zip \
  --timeout 300 \
  --memory-size 1024
```

### **🔷 Azure Deployment**

#### **Azure Container Instances**
```bash
# Deploy to Azure
az container create \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --cpu 2 --memory 4 \
  --restart-policy Always \
  --environment-variables \
    GOOGLE_API_KEY="your_key" \
    SERVER_MODE="production"
```
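
Values passed via `--environment-variables` are visible in the container group's properties. For API keys, `az container create` also accepts `--secure-environment-variables`, which keeps the values hidden:

```bash
# Same deployment, but with the key hidden from container properties
az container create \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --cpu 2 --memory 4 \
  --restart-policy Always \
  --secure-environment-variables \
    GOOGLE_API_KEY="your_key"
```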

#### **Azure App Service**
```bash
# Deploy as Web App
az webapp create \
  --resource-group myResourceGroup \
  --plan myAppServicePlan \
  --name llm-fusion-mcp \
  --deployment-container-image-name ghcr.io/username/llm-fusion-mcp:latest

# Configure environment
az webapp config appsettings set \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --settings \
    GOOGLE_API_KEY="your_key" \
    SERVER_MODE="production"
```

### **🟢 Google Cloud Deployment**

#### **Cloud Run**
```bash
# Deploy to Cloud Run
gcloud run deploy llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GOOGLE_API_KEY="your_key",SERVER_MODE="production" \
  --memory 2Gi \
  --cpu 2
```
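
For production, consider Secret Manager instead of plain env vars: `gcloud run deploy` can mount secrets as environment variables via `--set-secrets`. The secret name `google-api-key` below is a placeholder you'd create first, and the service's account needs `roles/secretmanager.secretAccessor` on it:

```bash
# Store the key once in Secret Manager
printf '%s' "$GOOGLE_API_KEY" | gcloud secrets create google-api-key --data-file=-

# Reference it at deploy time
gcloud run deploy llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --region us-central1 \
  --set-secrets GOOGLE_API_KEY=google-api-key:latest
```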

#### **GKE (Kubernetes)**
```yaml
# kubernetes-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-fusion-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-fusion-mcp
  template:
    metadata:
      labels:
        app: llm-fusion-mcp
    spec:
      containers:
        - name: llm-fusion-mcp
          image: ghcr.io/username/llm-fusion-mcp:latest
          ports:
            - containerPort: 8000
          env:
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-fusion-secrets
                  key: google-api-key
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: llm-fusion-mcp-service
spec:
  selector:
    app: llm-fusion-mcp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
```
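
The Deployment references a secret, so create it first (see Security Hardening below), then apply the manifest:

```bash
kubectl create secret generic llm-fusion-secrets \
  --from-literal=google-api-key="$GOOGLE_API_KEY"
kubectl apply -f kubernetes-deployment.yml
```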

---

## 🏢 **Enterprise Deployment**

### **🔐 Security Hardening**

#### **1. API Key Security**
```bash
# Use encrypted secrets
kubectl create secret generic llm-fusion-secrets \
  --from-literal=google-api-key="$GOOGLE_API_KEY" \
  --from-literal=openai-api-key="$OPENAI_API_KEY"

# Enable key rotation
export ENABLE_KEY_ROTATION=true
export KEY_ROTATION_INTERVAL=24
```
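
For plain Docker Compose deployments (no Kubernetes), file-based secrets keep keys out of `docker inspect` output. A sketch, assuming the server (or an entrypoint wrapper) can read keys from files under `/run/secrets/`:

```yaml
# docker-compose.yml fragment — file-based secrets (sketch)
services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    secrets:
      - google_api_key   # mounted at /run/secrets/google_api_key
secrets:
  google_api_key:
    file: ./secrets/google_api_key.txt
```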

#### **2. Network Security**
```bash
# Firewall rules (example for AWS)
aws ec2 create-security-group \
  --group-name llm-fusion-mcp-sg \
  --description "LLM Fusion MCP Security Group"

# Allow only necessary ports
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxx \
  --protocol tcp \
  --port 8000 \
  --source-group sg-frontend
```

#### **3. Resource Limits**
```yaml
# Docker Compose with limits
version: '3.8'
services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    restart: unless-stopped
```

### **📊 Monitoring & Observability**

#### **1. Health Checks**
```bash
# Built-in health endpoint
curl http://localhost:8000/health

# Docker health check
docker run --health-cmd="curl -f http://localhost:8000/health" \
  --health-interval=30s \
  --health-retries=3 \
  --health-start-period=40s \
  --health-timeout=10s \
  llm-fusion-mcp:latest
```
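
The same check can be baked into the image so every container gets it without extra flags — a sketch, assuming `curl` is available inside the image:

```dockerfile
# Dockerfile fragment — equivalent built-in health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
```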

#### **2. Prometheus Metrics**
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llm-fusion-mcp'
    static_configs:
      - targets: ['llm-fusion-mcp:9090']
    metrics_path: /metrics
    scrape_interval: 15s
```

#### **3. Centralized Logging**
```bash
# ELK Stack integration
docker run -d \
  --name llm-fusion-mcp \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="docker.llm-fusion-mcp" \
  llm-fusion-mcp:latest
```

### **🔄 High Availability Setup**

#### **1. Load Balancing**
```nginx
# nginx.conf
upstream llm_fusion_backend {
    server llm-fusion-mcp-1:8000;
    server llm-fusion-mcp-2:8000;
    server llm-fusion-mcp-3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://llm_fusion_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

#### **2. Auto-scaling**
```yaml
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-fusion-mcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-fusion-mcp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
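
The same HPA can also be created imperatively, which is handy for quick tests:

```bash
kubectl autoscale deployment llm-fusion-mcp \
  --cpu-percent=70 --min=3 --max=10
```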

---

## 🔧 **Configuration Management**

### **Environment Variables**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_API_KEY` | ✅ | - | Google Gemini API key |
| `OPENAI_API_KEY` | ❌ | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | ❌ | - | Anthropic API key |
| `XAI_API_KEY` | ❌ | - | xAI Grok API key |
| `SERVER_MODE` | ❌ | `production` | Server mode |
| `LOG_LEVEL` | ❌ | `INFO` | Logging level |
| `MAX_FILE_SIZE_MB` | ❌ | `50` | Max file size for analysis |
| `REQUEST_TIMEOUT` | ❌ | `300` | Request timeout in seconds |

### **Volume Mounts**
```bash
# Data persistence
-v ./data:/app/data     # Persistent data
-v ./logs:/app/logs     # Log files
-v ./config:/app/config # Configuration files
-v ./cache:/app/cache   # Model cache
```
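
Put together, a fully persistent standalone container might look like this (`$(pwd)` makes the host paths absolute, which `docker run -v` expects):

```bash
docker run -d \
  --name llm-fusion-mcp \
  --restart unless-stopped \
  --env-file .env \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/logs:/app/logs" \
  -v "$(pwd)/config:/app/config" \
  -v "$(pwd)/cache:/app/cache" \
  llm-fusion-mcp:latest
```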

---

## 🚨 **Troubleshooting**

### **Common Issues**

#### **Container Won't Start**
```bash
# Check logs
docker-compose logs llm-fusion-mcp

# Common causes:
# 1. API key not configured
# 2. Port already in use
# 3. Insufficient memory

# Debug interactively
docker-compose run --rm llm-fusion-mcp bash
```

#### **API Connection Issues**
```bash
# Test API connectivity (Gemini uses an API-key header, not a Bearer token)
curl -H "x-goog-api-key: $GOOGLE_API_KEY" \
  https://generativelanguage.googleapis.com/v1beta/models

# Check firewall/network
telnet api.openai.com 443
```

#### **Performance Issues**
```bash
# Monitor resource usage
docker stats llm-fusion-mcp

# Scale horizontally
docker-compose up --scale llm-fusion-mcp=3
```

### **Health Checks**
```bash
# Built-in health check
curl http://localhost:8000/health

# Provider status
curl http://localhost:8000/health/providers

# System metrics
curl http://localhost:8000/metrics
```

---

## 📞 **Support**

### **Getting Help**
- 📖 **Documentation**: Check README.md and INTEGRATION.md
- 🧪 **Testing**: Run health checks and the test suite
- 🔍 **Debugging**: Enable the `DEBUG` log level
- 📊 **Monitoring**: Check metrics and logs

### **Performance Tuning**
- **Memory**: Increase container memory for large file processing
- **CPU**: Scale horizontally for high throughput
- **Cache**: Tune the model cache timeout for your usage patterns
- **Network**: Use a CDN for static assets, optimize API endpoints

---

<div align="center">

## 🎉 **Ready for Production!**

**Your LLM Fusion MCP server is now deployed and ready to handle production workloads!**

*Built with ❤️ for enterprise-grade AI integration*

</div>