llm-fusion-mcp/REQUIREMENTS.md
Ryan Malloy c335ba0e1e
2025-09-05 05:47:51 -06:00


LLM Fusion MCP - Requirements & Preferences

This document captures the specific requirements and preferences for the LLM Fusion MCP project.

Core Requirements

Python Project Setup

  • Package Management: Use uv for dependency management
  • Project Structure: Modern Python packaging with pyproject.toml
  • Code Quality: Use ruff for formatting and linting
  • MCP Framework: Use fastmcp (latest version 2.11.3+)

API Integration

  • LLM Provider: Google Gemini API
  • API Approach: Use OpenAI-compatible API endpoint instead of native Google libraries
    • Base URL: https://generativelanguage.googleapis.com/v1beta/openai/
    • Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
  • Library: Use openai library instead of google-generativeai for better compatibility

Streaming Requirements

  • Always Use Streaming: "I Want to use 'streaming responses' always"
  • Implementation: All text generation should support real-time streaming responses
  • Format: Token-by-token streaming with incremental content delivery
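
The token-by-token pattern can be sketched independently of any provider; `stream_text` below (an illustrative helper, not from the codebase) consumes chunks shaped like the OpenAI `chat.completions` stream:

```python
from typing import Iterable, Iterator

def stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield incremental text from OpenAI-style streaming chunks.

    Each chunk mirrors the shape produced with stream=True:
    {"choices": [{"delta": {"content": "..."}}]}.
    """
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        piece = delta.get("content")
        if piece:
            yield piece

# Against the real API (illustrative):
# stream = client.chat.completions.create(model=..., messages=..., stream=True)
# for piece in stream_text(c.model_dump() for c in stream):
#     print(piece, end="", flush=True)
```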

Image Understanding

  • Multimodal Support: Support image analysis and understanding
  • Implementation: Use OpenAI-compatible multimodal API
  • Format: Base64 encoded images with data URLs
  • Example provided:
    import base64

    # Function to encode the image
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")

    # Usage with the data URL format (inside an image_url content part)
    "url": f"data:image/jpeg;base64,{base64_image}"

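
Building on that snippet, a complete user message combining text and an image could look like this (the `image_message` helper is an assumption for illustration):

```python
import base64

def encode_image(image_path: str) -> str:
    """Read an image file and return its base64-encoded contents."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def image_message(prompt: str, base64_image: str, mime: str = "image/jpeg") -> dict:
    """Build a structured user message: one text part plus one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{base64_image}"}},
        ],
    }
```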

Simple MCP Tools

  • Request: "let's setup a simple mcp tool"
  • Implementation: Include basic utility tools alongside AI capabilities
  • Example: Calculator tool for mathematical operations
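
A calculator tool of this kind can be sketched as a safe arithmetic evaluator; in the server it would be registered as an MCP tool via FastMCP's tool decorator (the wiring is omitted here, and the function shape is an assumption):

```python
import ast
import operator

# Whitelisted operators keep eval-style input safe.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Safely evaluate a basic arithmetic expression like '2 + 3 * 4'."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval"))
```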

Function Calling Support

  • Request: "let's also add basic 'function calling support'"
  • Implementation: Support for OpenAI-compatible function calling
  • Features: Tool definitions, automatic function execution, streaming support
  • Example: Weather function with location and unit parameters
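
The weather example can be expressed as an OpenAI-style tool definition; the function name and parameter schema below are illustrative assumptions based on this document:

```python
# Tool definition in the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

# Passed as (illustrative):
# client.chat.completions.create(..., tools=[WEATHER_TOOL], stream=True)
```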

Audio Understanding

  • Request: "and audio understanding"
  • Implementation: Base64 encoded audio with input_audio content type
  • Supported Formats: WAV, MP3, and other audio formats
  • Use Cases: Transcription, audio analysis, voice commands
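
Following the same message structure as images, an audio request pairs a text prompt with an `input_audio` content part (the `audio_message` helper is an illustrative assumption):

```python
import base64

def audio_message(prompt: str, audio_path: str, fmt: str = "wav") -> dict:
    """Build a user message carrying base64-encoded audio as input_audio."""
    with open(audio_path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "input_audio",
             "input_audio": {"data": data, "format": fmt}},
        ],
    }
```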

Text Embeddings

  • Request: "we can also do text embeddings"
  • Implementation: OpenAI-compatible embeddings API
  • Model: gemini-embedding-001
  • Features: Single text or batch processing, similarity calculations
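
The similarity calculation over returned embedding vectors is plain cosine similarity; a minimal sketch (the API call in the comment is illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Against the real API (illustrative):
# resp = client.embeddings.create(model="gemini-embedding-001",
#                                 input=["first text", "second text"])
# vectors = [d.embedding for d in resp.data]
# score = cosine_similarity(vectors[0], vectors[1])
```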

Advanced Features (extra_body)

  • Request: Support for Gemini-specific features via extra_body
  • Cached Content: Use pre-cached content for faster responses
  • Thinking Config: Enable reasoning mode for complex problems
  • Implementation: Custom extra_body parameter handling

Technical Specifications

Dependencies

  • fastmcp>=2.11.3 - MCP server framework
  • openai>=1.54.0 - OpenAI-compatible API client
  • python-dotenv>=1.0.0 - Environment variable management
  • pydantic>=2.11.7 - Structured outputs and data validation

Environment Configuration

GOOGLE_API_KEY=<your_api_key>
GEMINI_MODEL=gemini-1.5-flash
ENABLE_STREAMING=true
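
These variables can be read at startup with the documented defaults; a minimal sketch (the `load_config` helper is illustrative, and in the real server python-dotenv would populate the environment first):

```python
import os

def load_config() -> dict:
    """Read the documented environment variables, applying defaults."""
    return {
        "api_key": os.getenv("GOOGLE_API_KEY"),
        "model": os.getenv("GEMINI_MODEL", "gemini-1.5-flash"),
        "streaming": os.getenv("ENABLE_STREAMING", "true").lower() == "true",
    }
```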

Supported Models

  • Text: gemini-1.5-flash (default), gemini-2.5-flash, gemini-2.5-pro
  • Vision: gemini-2.0-flash (for image analysis)
  • Embeddings: gemini-embedding-001, gemini-embedding-exp-03-07
  • Thinking: gemini-2.5-flash (with reasoning_effort parameter)

Implementation Approach

Streaming Architecture

  • Primary functions return generators for streaming
  • Fallback functions collect streams for non-streaming clients
  • Real-time token delivery with progress tracking
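
The primary/fallback split described above can be sketched in a few lines: the streaming path yields tokens as they arrive, and the non-streaming path simply collects the same generator (the token source here is a stand-in for the provider stream):

```python
from typing import Iterable, Iterator

def generate_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Primary path: yield tokens as they arrive from the provider."""
    for token in tokens:
        yield token

def generate_text(tokens: Iterable[str]) -> str:
    """Fallback path: collect the full stream for non-streaming clients."""
    return "".join(generate_stream(tokens))
```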

Multimodal Design

  • Support multiple image formats (JPG, JPEG, PNG)
  • Automatic format detection and encoding
  • Structured message format with text + image content
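
Automatic format detection can lean on the standard library's `mimetypes` module; a sketch restricted to the formats listed above:

```python
import mimetypes

SUPPORTED_IMAGE_MIMES = {"image/jpeg", "image/png"}

def detect_image_mime(path: str) -> str:
    """Guess the image MIME type from the file extension and validate it."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in SUPPORTED_IMAGE_MIMES:
        raise ValueError(f"unsupported image format: {path}")
    return mime
```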

Error Handling

  • Comprehensive try-catch blocks
  • Structured error responses
  • Success/failure status indicators
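
The three points above combine into a simple wrapper pattern: run the tool inside a try block and always return a structured response with a success flag (the `run_tool` name and response keys are illustrative):

```python
from typing import Any, Callable

def run_tool(fn: Callable[..., Any], *args, **kwargs) -> dict:
    """Wrap a tool call into a structured success/error response."""
    try:
        return {"success": True, "result": fn(*args, **kwargs)}
    except Exception as exc:
        return {
            "success": False,
            "error": type(exc).__name__,
            "message": str(exc),
        }
```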

API Key Security

  • Store in .env file (gitignored)
  • Provide .env.example template
  • Load via python-dotenv