🚀 LLM Fusion MCP - Production Deployment Guide

This guide covers deploying LLM Fusion MCP in production environments with Docker, cloud platforms, and enterprise setups.


📋 Quick Start

1. Prerequisites

  • Docker & Docker Compose
  • At least 2GB RAM
  • Internet connection for AI provider APIs
  • One or more LLM provider API keys

2. One-Command Deployment

# Clone and deploy
git clone <repository-url>
cd llm-fusion-mcp

# Configure environment
cp .env.production .env
# Edit .env with your API keys

# Deploy with Docker
./deploy.sh production
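
At minimum, .env needs one provider key. The variable names below match the Configuration Management table later in this guide; the values are placeholders:

# .env (set at least one provider key)
GOOGLE_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
XAI_API_KEY=your_grok_key
SERVER_MODE=production
LOG_LEVEL=INFO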

🐳 Docker Deployment

Method 1: Docker Compose (Recommended)

# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
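
The repository ships with a docker-compose.yml; if you need to write your own, a minimal sketch along these lines should work (the service name, port 8000, and log mount are assumptions based on the rest of this guide):

# docker-compose.yml (minimal sketch)
version: '3.8'
services:
  llm-fusion-mcp:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped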

Method 2: Standalone Docker

# Build image
docker build -t llm-fusion-mcp:latest .

# Run container
docker run -d \
  --name llm-fusion-mcp \
  --restart unless-stopped \
  -e GOOGLE_API_KEY="your_key" \
  -e OPENAI_API_KEY="your_key" \
  -v ./logs:/app/logs \
  llm-fusion-mcp:latest

Method 3: Pre-built Images

# Pull from GitHub Container Registry
docker pull ghcr.io/username/llm-fusion-mcp:latest

# Run with your environment
docker run -d \
  --name llm-fusion-mcp \
  --env-file .env \
  ghcr.io/username/llm-fusion-mcp:latest

☁️ Cloud Platform Deployment

🔵 AWS Deployment

AWS ECS with Fargate

# ecs-task-definition.json
{
  "family": "llm-fusion-mcp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "llm-fusion-mcp",
      "image": "ghcr.io/username/llm-fusion-mcp:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/llm-fusion-mcp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "environment": [
        {"name": "GOOGLE_API_KEY", "value": "your_key"},
        {"name": "SERVER_MODE", "value": "production"}
      ]
    }
  ]
}
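
To launch this task definition as a Fargate service, register it and create the service (cluster name, subnet, and security group IDs below are placeholders):

# Register the task definition
aws ecs register-task-definition \
  --cli-input-json file://ecs-task-definition.json

# Create a Fargate service
aws ecs create-service \
  --cluster my-cluster \
  --service-name llm-fusion-mcp \
  --task-definition llm-fusion-mcp \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxxxxx],securityGroups=[sg-xxxxxxx],assignPublicIp=ENABLED}"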

AWS Lambda (Serverless)

# Package for Lambda
zip -r llm-fusion-mcp-lambda.zip src/ requirements.txt

# Deploy with AWS CLI
aws lambda create-function \
  --function-name llm-fusion-mcp \
  --runtime python3.12 \
  --role arn:aws:iam::account:role/lambda-execution-role \
  --handler src.llm_fusion_mcp.lambda_handler \
  --zip-file fileb://llm-fusion-mcp-lambda.zip \
  --timeout 300 \
  --memory-size 1024
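
Rather than baking API keys into the zip, attach them as Lambda environment variables after the function is created:

aws lambda update-function-configuration \
  --function-name llm-fusion-mcp \
  --environment "Variables={GOOGLE_API_KEY=your_key,SERVER_MODE=production}"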

🔷 Azure Deployment

Azure Container Instances

# Deploy to Azure
az container create \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --cpu 2 --memory 4 \
  --restart-policy Always \
  --environment-variables \
    GOOGLE_API_KEY="your_key" \
    SERVER_MODE="production"

Azure App Service

# Deploy as Web App
az webapp create \
  --resource-group myResourceGroup \
  --plan myAppServicePlan \
  --name llm-fusion-mcp \
  --deployment-container-image-name ghcr.io/username/llm-fusion-mcp:latest

# Configure environment
az webapp config appsettings set \
  --resource-group myResourceGroup \
  --name llm-fusion-mcp \
  --settings \
    GOOGLE_API_KEY="your_key" \
    SERVER_MODE="production"

🟢 Google Cloud Deployment

Cloud Run

# Deploy to Cloud Run
gcloud run deploy llm-fusion-mcp \
  --image ghcr.io/username/llm-fusion-mcp:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GOOGLE_API_KEY="your_key",SERVER_MODE="production" \
  --memory 2Gi \
  --cpu 2

GKE (Kubernetes)

# kubernetes-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-fusion-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-fusion-mcp
  template:
    metadata:
      labels:
        app: llm-fusion-mcp
    spec:
      containers:
      - name: llm-fusion-mcp
        image: ghcr.io/username/llm-fusion-mcp:latest
        ports:
        - containerPort: 8000
        env:
        - name: GOOGLE_API_KEY
          valueFrom:
            secretKeyRef:
              name: llm-fusion-secrets
              key: google-api-key
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: llm-fusion-mcp-service
spec:
  selector:
    app: llm-fusion-mcp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer
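
Apply the manifest and wait for the LoadBalancer service to receive an external IP:

kubectl apply -f kubernetes-deployment.yml
kubectl rollout status deployment/llm-fusion-mcp
kubectl get service llm-fusion-mcp-service --watch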

🏢 Enterprise Deployment

🔐 Security Hardening

1. API Key Security

# Use encrypted secrets
kubectl create secret generic llm-fusion-secrets \
  --from-literal=google-api-key="$GOOGLE_API_KEY" \
  --from-literal=openai-api-key="$OPENAI_API_KEY"

# Enable key rotation
export ENABLE_KEY_ROTATION=true
export KEY_ROTATION_INTERVAL=24

2. Network Security

# Firewall rules (example for AWS)
aws ec2 create-security-group \
  --group-name llm-fusion-mcp-sg \
  --description "LLM Fusion MCP Security Group"

# Allow only necessary ports
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxx \
  --protocol tcp \
  --port 8000 \
  --source-group sg-frontend

3. Resource Limits

# Docker Compose with limits
version: '3.8'
services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    restart: unless-stopped

📊 Monitoring & Observability

1. Health Checks

# Built-in health endpoint
curl http://localhost:8000/health

# Docker health check
docker run --health-cmd="curl -f http://localhost:8000/health" \
  --health-interval=30s \
  --health-retries=3 \
  --health-start-period=40s \
  --health-timeout=10s \
  llm-fusion-mcp:latest
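
If you deploy with Compose instead, the same check can be declared in docker-compose.yml (a sketch mirroring the flags above) so Docker tracks the container's health status:

services:
  llm-fusion-mcp:
    image: llm-fusion-mcp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      retries: 3
      start_period: 40s
      timeout: 10s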

2. Prometheus Metrics

# prometheus.yml
scrape_configs:
  - job_name: 'llm-fusion-mcp'
    static_configs:
      - targets: ['llm-fusion-mcp:9090']
    metrics_path: /metrics
    scrape_interval: 15s
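
A quick way to scrape with this config is the official Prometheus image with the file mounted in place:

docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v ./prometheus.yml:/etc/prometheus/prometheus.yml:ro \
  prom/prometheus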

3. Centralized Logging

# ELK Stack integration
docker run -d \
  --name llm-fusion-mcp \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="docker.llm-fusion-mcp" \
  llm-fusion-mcp:latest

🔄 High Availability Setup

1. Load Balancing

# nginx.conf
upstream llm_fusion_backend {
    server llm-fusion-mcp-1:8000;
    server llm-fusion-mcp-2:8000;
    server llm-fusion-mcp-3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://llm_fusion_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
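
One way to run this config is the stock nginx image with the file mounted into its conf.d include directory (container name and paths are examples; the proxy must share a Docker network with the backend containers so the upstream hostnames resolve):

docker run -d \
  --name llm-fusion-lb \
  -p 80:80 \
  -v ./nginx.conf:/etc/nginx/conf.d/default.conf:ro \
  nginx:alpine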

2. Auto-scaling

# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-fusion-mcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-fusion-mcp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
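
The same policy can also be created imperatively, which is handy for quick tests:

kubectl autoscale deployment llm-fusion-mcp \
  --cpu-percent=70 --min=3 --max=10

kubectl get hpa llm-fusion-mcp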

🔧 Configuration Management

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| GOOGLE_API_KEY | At least one provider key | - | Google Gemini API key |
| OPENAI_API_KEY | At least one provider key | - | OpenAI API key |
| ANTHROPIC_API_KEY | At least one provider key | - | Anthropic API key |
| XAI_API_KEY | At least one provider key | - | xAI Grok API key |
| SERVER_MODE | No | production | Server mode |
| LOG_LEVEL | No | INFO | Logging level |
| MAX_FILE_SIZE_MB | No | 50 | Max file size (MB) for analysis |
| REQUEST_TIMEOUT | No | 300 | Request timeout in seconds |

Volume Mounts

# Data persistence
-v ./data:/app/data        # Persistent data
-v ./logs:/app/logs        # Log files
-v ./config:/app/config    # Configuration files
-v ./cache:/app/cache      # Model cache
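
Putting the mounts together into a full run command (host paths are examples):

docker run -d \
  --name llm-fusion-mcp \
  --env-file .env \
  -v ./data:/app/data \
  -v ./logs:/app/logs \
  -v ./config:/app/config \
  -v ./cache:/app/cache \
  llm-fusion-mcp:latest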

🚨 Troubleshooting

Common Issues

Container Won't Start

# Check logs
docker-compose logs llm-fusion-mcp

# Common fixes
# 1. API key not configured
# 2. Port already in use
# 3. Insufficient memory

# Debug mode
docker-compose run --rm llm-fusion-mcp bash

API Connection Issues

# Test API connectivity (the Gemini API authenticates with an API key, not a Bearer token)
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"

# Check firewall/network
telnet api.openai.com 443

Performance Issues

# Monitor resource usage
docker stats llm-fusion-mcp

# Scale horizontally
docker-compose up --scale llm-fusion-mcp=3

Health Checks

# Built-in health check
curl http://localhost:8000/health

# Provider status
curl http://localhost:8000/health/providers

# System metrics
curl http://localhost:8000/metrics

📞 Support

Getting Help

  • 📖 Documentation: Check README.md and INTEGRATION.md
  • 🧪 Testing: Run health checks and test suite
  • 🔍 Debugging: Enable DEBUG log level
  • 📊 Monitoring: Check metrics and logs

Performance Tuning

  • Memory: Increase container memory for large file processing
  • CPU: Scale horizontally for high throughput
  • Cache: Tune model cache timeout for your usage patterns
  • Network: Use CDN for static assets, optimize API endpoints

🎉 Ready for Production!

Your LLM Fusion MCP server is now deployed and ready to handle production workloads!

Built with ❤️ for enterprise-grade AI integration