🚀 LLM Fusion MCP - Production Deployment Guide
This guide covers deploying LLM Fusion MCP in production environments with Docker, cloud platforms, and enterprise setups.
📋 Quick Start
1. Prerequisites
- Docker & Docker Compose
- At least 2GB RAM
- Internet connection for AI provider APIs
- One or more LLM provider API keys
2. One-Command Deployment
# Clone and deploy
git clone <repository-url>
cd llm-fusion-mcp
# Configure environment
cp .env.production .env
# Edit .env with your API keys
# Deploy with Docker
./deploy.sh production
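A minimal .env sketch (variable names follow the configuration table below; only the Google key is required):
# .env — minimal example
GOOGLE_API_KEY=your_google_key        # required
OPENAI_API_KEY=your_openai_key        # optional
ANTHROPIC_API_KEY=your_anthropic_key  # optional
SERVER_MODE=production
LOG_LEVEL=INFO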
🐳 Docker Deployment
Method 1: Docker Compose (Recommended)
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
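Once the services are up, a quick smoke test (this assumes the health endpoint on port 8000 is published to the host, as in the monitoring examples below):
# Confirm the container is running
docker-compose ps
# Hit the built-in health endpoint
curl -f http://localhost:8000/health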
Method 2: Standalone Docker
# Build image
docker build -t llm-fusion-mcp:latest .
# Run container (use an absolute path for the bind mount;
# older Docker versions reject relative -v paths)
docker run -d \
  --name llm-fusion-mcp \
  --restart unless-stopped \
  -e GOOGLE_API_KEY="your_key" \
  -e OPENAI_API_KEY="your_key" \
  -v "$(pwd)/logs:/app/logs" \
  llm-fusion-mcp:latest
Method 3: Pre-built Images
# Pull from GitHub Container Registry
docker pull ghcr.io/username/llm-fusion-mcp:latest
# Run with your environment
docker run -d \
--name llm-fusion-mcp \
--env-file .env \
ghcr.io/username/llm-fusion-mcp:latest
☁️ Cloud Platform Deployment
🔵 AWS Deployment
AWS ECS with Fargate
# ecs-task-definition.json
{
  "family": "llm-fusion-mcp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "llm-fusion-mcp",
      "image": "ghcr.io/username/llm-fusion-mcp:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/llm-fusion-mcp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "environment": [
        {"name": "GOOGLE_API_KEY", "value": "your_key"},
        {"name": "SERVER_MODE", "value": "production"}
      ]
    }
  ]
}
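The task definition by itself runs nothing; a sketch of registering it and starting a Fargate service (the cluster name, subnet, and security group IDs are placeholders):
# Register the task definition
aws ecs register-task-definition \
  --cli-input-json file://ecs-task-definition.json
# Create a Fargate service from it
aws ecs create-service \
  --cluster my-cluster \
  --service-name llm-fusion-mcp \
  --task-definition llm-fusion-mcp \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxxxxx],securityGroups=[sg-xxxxxxx],assignPublicIp=ENABLED}"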
AWS Lambda (Serverless)
# Package for Lambda (dependencies must be bundled into the archive)
pip install -r requirements.txt -t build/
cp -r src build/
(cd build && zip -r ../llm-fusion-mcp-lambda.zip .)
# Deploy with AWS CLI
aws lambda create-function \
--function-name llm-fusion-mcp \
--runtime python3.12 \
--role arn:aws:iam::account:role/lambda-execution-role \
--handler src.llm_fusion_mcp.lambda_handler \
--zip-file fileb://llm-fusion-mcp-lambda.zip \
--timeout 300 \
--memory-size 1024
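A post-deploy smoke test might look like this; the payload shape is a placeholder, since it depends on what src.llm_fusion_mcp.lambda_handler expects:
# Invoke the function and inspect the response
aws lambda invoke \
  --function-name llm-fusion-mcp \
  --cli-binary-format raw-in-base64-out \
  --payload '{"ping": true}' \
  response.json
cat response.json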
🔷 Azure Deployment
Azure Container Instances
# Deploy to Azure
az container create \
--resource-group myResourceGroup \
--name llm-fusion-mcp \
--image ghcr.io/username/llm-fusion-mcp:latest \
--cpu 2 --memory 4 \
--restart-policy Always \
--environment-variables \
GOOGLE_API_KEY="your_key" \
SERVER_MODE="production"
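To check that the container came up:
# Tail container logs
az container logs --resource-group myResourceGroup --name llm-fusion-mcp
# Check runtime state
az container show --resource-group myResourceGroup --name llm-fusion-mcp \
  --query instanceView.state -o tsv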
Azure App Service
# Deploy as Web App
az webapp create \
--resource-group myResourceGroup \
--plan myAppServicePlan \
--name llm-fusion-mcp \
--deployment-container-image-name ghcr.io/username/llm-fusion-mcp:latest
# Configure environment
az webapp config appsettings set \
--resource-group myResourceGroup \
--name llm-fusion-mcp \
--settings \
GOOGLE_API_KEY="your_key" \
SERVER_MODE="production"
🟢 Google Cloud Deployment
Cloud Run
# Deploy to Cloud Run
gcloud run deploy llm-fusion-mcp \
--image ghcr.io/username/llm-fusion-mcp:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GOOGLE_API_KEY="your_key",SERVER_MODE="production" \
--memory 2Gi \
--cpu 2
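Plain --set-env-vars leaves keys readable in the service configuration; Cloud Run can pull them from Secret Manager instead. A sketch (the secret name llm-fusion-google-key is an example, and the service account needs the Secret Manager Secret Accessor role):
# Store the key in Secret Manager
echo -n "$GOOGLE_API_KEY" | \
  gcloud secrets create llm-fusion-google-key --data-file=-
# Mount it as an environment variable at deploy time
gcloud run deploy llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --region us-central1 \
  --set-secrets GOOGLE_API_KEY=llm-fusion-google-key:latest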
GKE (Kubernetes)
# kubernetes-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-fusion-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-fusion-mcp
  template:
    metadata:
      labels:
        app: llm-fusion-mcp
    spec:
      containers:
        - name: llm-fusion-mcp
          image: ghcr.io/username/llm-fusion-mcp:latest
          ports:
            - containerPort: 8000
          env:
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-fusion-secrets
                  key: google-api-key
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: llm-fusion-mcp-service
spec:
  selector:
    app: llm-fusion-mcp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
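To roll the manifest out (it expects the llm-fusion-secrets secret, created under Security Hardening below):
kubectl apply -f kubernetes-deployment.yml
kubectl rollout status deployment/llm-fusion-mcp
# External IP of the LoadBalancer service
kubectl get service llm-fusion-mcp-service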
🏢 Enterprise Deployment
🔐 Security Hardening
1. API Key Security
# Use encrypted secrets
kubectl create secret generic llm-fusion-secrets \
--from-literal=google-api-key="$GOOGLE_API_KEY" \
--from-literal=openai-api-key="$OPENAI_API_KEY"
# Enable key rotation
export ENABLE_KEY_ROTATION=true
export KEY_ROTATION_INTERVAL=24
2. Network Security
# Firewall rules (example for AWS)
aws ec2 create-security-group \
--group-name llm-fusion-mcp-sg \
--description "LLM Fusion MCP Security Group"
# Allow only necessary ports
aws ec2 authorize-security-group-ingress \
--group-id sg-xxxxxxx \
--protocol tcp \
--port 8000 \
--source-group sg-frontend
3. Resource Limits
# Docker Compose with limits
version: '3.8'
services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    restart: unless-stopped
📊 Monitoring & Observability
1. Health Checks
# Built-in health endpoint
curl http://localhost:8000/health
# Docker health check
docker run --health-cmd="curl -f http://localhost:8000/health" \
--health-interval=30s \
--health-retries=3 \
--health-start-period=40s \
--health-timeout=10s \
llm-fusion-mcp:latest
2. Prometheus Metrics
# prometheus.yml
scrape_configs:
  - job_name: 'llm-fusion-mcp'
    static_configs:
      - targets: ['llm-fusion-mcp:9090']
    metrics_path: /metrics
    scrape_interval: 15s
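With the scrape job configured, a minimal availability alert needs only Prometheus's built-in up metric; reference the file from rule_files in prometheus.yml (the file name and 5m window are examples):
# alerts.yml — fires when a scrape target has been down for 5 minutes
groups:
  - name: llm-fusion-mcp
    rules:
      - alert: LLMFusionMCPDown
        expr: up{job="llm-fusion-mcp"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "LLM Fusion MCP instance {{ $labels.instance }} is down"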
3. Centralized Logging
# ELK Stack integration
docker run -d \
--name llm-fusion-mcp \
--log-driver=fluentd \
--log-opt fluentd-address=localhost:24224 \
--log-opt tag="docker.llm-fusion-mcp" \
llm-fusion-mcp:latest
🔄 High Availability Setup
1. Load Balancing
# nginx.conf
upstream llm_fusion_backend {
    server llm-fusion-mcp-1:8000;
    server llm-fusion-mcp-2:8000;
    server llm-fusion-mcp-3:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://llm_fusion_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
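Since the server streams responses in real time, it's worth disabling proxy buffering so tokens reach clients as they are generated; a sketch of the same location block with streaming-friendly settings (the timeout value is an example):
location / {
    proxy_pass http://llm_fusion_backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    # Streaming-friendly settings
    proxy_buffering off;        # forward response chunks immediately
    proxy_read_timeout 300s;    # allow long-running generations
}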
2. Auto-scaling
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-fusion-mcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-fusion-mcp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
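Note that the HPA can only read CPU utilization if the cluster exposes resource metrics (typically via metrics-server). To apply it and watch it react (the file name is an example):
kubectl apply -f llm-fusion-mcp-hpa.yml
kubectl get hpa llm-fusion-mcp-hpa --watch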
🔧 Configuration Management
Environment Variables
Variable | Required | Default | Description
---|---|---|---
GOOGLE_API_KEY | ✅ | - | Google Gemini API key
OPENAI_API_KEY | ❌ | - | OpenAI API key
ANTHROPIC_API_KEY | ❌ | - | Anthropic API key
XAI_API_KEY | ❌ | - | xAI Grok API key
SERVER_MODE | ❌ | production | Server mode
LOG_LEVEL | ❌ | INFO | Logging level
MAX_FILE_SIZE_MB | ❌ | 50 | Max file size for analysis (MB)
REQUEST_TIMEOUT | ❌ | 300 | Request timeout in seconds
Volume Mounts
# Data persistence (absolute paths for bind mounts)
-v "$(pwd)/data:/app/data"      # Persistent data
-v "$(pwd)/logs:/app/logs"      # Log files
-v "$(pwd)/config:/app/config"  # Configuration files
-v "$(pwd)/cache:/app/cache"    # Model cache
🚨 Troubleshooting
Common Issues
Container Won't Start
# Check logs
docker-compose logs llm-fusion-mcp
# Common fixes
# 1. API key not configured
# 2. Port already in use
# 3. Insufficient memory
# Debug mode
docker-compose run --rm llm-fusion-mcp bash
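Quick checks for each of the three causes above:
# 1. API key not configured — are the keys actually in .env?
grep -E 'API_KEY' .env
# 2. Port already in use — what owns 8000?
ss -ltnp | grep :8000
# 3. Insufficient memory — current container usage
docker stats --no-stream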
API Connection Issues
# Test API connectivity (the Gemini API takes an API-key header, not a Bearer token)
curl -H "x-goog-api-key: $GOOGLE_API_KEY" \
  https://generativelanguage.googleapis.com/v1beta/models
# Check firewall/network
telnet api.openai.com 443
Performance Issues
# Monitor resource usage
docker stats llm-fusion-mcp
# Scale horizontally
docker-compose up --scale llm-fusion-mcp=3
Health Checks
# Built-in health check
curl http://localhost:8000/health
# Provider status
curl http://localhost:8000/health/providers
# System metrics
curl http://localhost:8000/metrics
📞 Support
Getting Help
- 📖 Documentation: Check README.md and INTEGRATION.md
- 🧪 Testing: Run health checks and test suite
- 🔍 Debugging: Set LOG_LEVEL=DEBUG for verbose output
- 📊 Monitoring: Check metrics and logs
Performance Tuning
- Memory: Increase container memory for large file processing
- CPU: Scale horizontally for high throughput
- Cache: Tune model cache timeout for your usage patterns
- Network: Use a CDN for static assets and optimize API endpoints
🎉 Ready for Production!
Your LLM Fusion MCP server is now deployed and ready to handle production workloads!
Built with ❤️ for enterprise-grade AI integration