LLM Fusion MCP - Requirements & Preferences
This document captures the specific requirements and preferences for the LLM Fusion MCP project.
Core Requirements
Python Project Setup
- Package Management: Use `uv` for dependency management
- Project Structure: Modern Python packaging with `pyproject.toml`
- Code Quality: Use `ruff` for formatting and linting
- MCP Framework: Use `fastmcp` (latest version 2.11.3+)
API Integration
- LLM Provider: Google Gemini API
- API Approach: Use the OpenAI-compatible API endpoint instead of native Google libraries
- Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
- Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
- Library: Use the `openai` library instead of `google-generativeai` for better compatibility (see the client sketch below)
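A minimal sketch of this setup, assuming `GOOGLE_API_KEY` is already exported and using the default text model from the configuration below:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at Gemini's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["GOOGLE_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```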
Streaming Requirements
- Always Use Streaming: "I Want to use 'streaming responses' always"
- Implementation: All text generation should support real-time streaming responses
- Format: Token-by-token streaming with incremental content delivery
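A rough sketch of the token-by-token pattern, reusing the `client` configured above:

```python
# Request a streamed response and print tokens as they arrive.
stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Explain MCP in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```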
Image Understanding
- Multimodal Support: Support image analysis and understanding
- Implementation: Use OpenAI-compatible multimodal API
- Format: Base64 encoded images with data URLs
- Example provided:
```python
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Usage with data URL format
base64_image = encode_image("image.jpg")
image_url = {"url": f"data:image/jpeg;base64,{base64_image}"}
```
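A hedged sketch of how that data URL might be passed in a chat request; the prompt and the choice of `gemini-2.0-flash` here are illustrative, building on `client` and `base64_image` from the examples above:

```python
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```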
Simple MCP Tools
- Request: "let's setup a simple mcp tool"
- Implementation: Include basic utility tools alongside AI capabilities
- Example: Calculator tool for mathematical operations
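A minimal sketch of such a tool registered with FastMCP; the server name and operation set are illustrative, not the project's actual definitions:

```python
from fastmcp import FastMCP

mcp = FastMCP("llm-fusion-mcp")

@mcp.tool()
def calculate(a: float, b: float, operation: str) -> float:
    """Perform a basic arithmetic operation: add, subtract, multiply, or divide."""
    if operation == "add":
        return a + b
    if operation == "subtract":
        return a - b
    if operation == "multiply":
        return a * b
    if operation == "divide":
        if b == 0:
            raise ValueError("Cannot divide by zero.")
        return a / b
    raise ValueError(f"Unknown operation: {operation}")

if __name__ == "__main__":
    mcp.run()
```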
Function Calling Support
- Request: "let's also add basic 'function calling support'"
- Implementation: Support for OpenAI-compatible function calling
- Features: Tool definitions, automatic function execution, streaming support
- Example: Weather function with location and unit parameters
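A hedged sketch of an OpenAI-compatible tool definition for such a weather function (the function name and schema are illustrative):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "What's the weather in Paris, in celsius?"}],
    tools=tools,
)

# The model returns a tool call; the server would execute it and send back the result.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```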
Audio Understanding
- Request: "and audio understanding"
- Implementation: Base64 encoded audio with the `input_audio` content type
- Supported Formats: WAV, MP3, and other audio formats
- Use Cases: Transcription, audio analysis, voice commands
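A hedged sketch of the `input_audio` content type in an OpenAI-style request; whether a given Gemini model accepts this payload through the compatibility endpoint is an assumption to verify, and the file name is illustrative:

```python
import base64

with open("command.wav", "rb") as audio_file:
    base64_audio = base64.b64encode(audio_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio clip."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": base64_audio, "format": "wav"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```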
Text Embeddings
- Request: "we can also do text embeddings"
- Implementation: OpenAI-compatible embeddings API
- Model: `gemini-embedding-001`
- Features: Single text or batch processing, similarity calculations
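A minimal sketch of batch embeddings plus a cosine-similarity calculation, reusing the `client` configured above:

```python
result = client.embeddings.create(
    model="gemini-embedding-001",
    input=["The quick brown fox", "A fast auburn fox"],
)

# Cosine similarity between the two returned vectors.
a = result.data[0].embedding
b = result.data[1].embedding
dot = sum(x * y for x, y in zip(a, b))
norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
print(f"similarity: {dot / norm:.3f}")
```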
Advanced Features (extra_body)
- Request: Support for Gemini-specific features via `extra_body`
- Cached Content: Use pre-cached content for faster responses
- Thinking Config: Enable reasoning mode for complex problems
- Implementation: Custom extra_body parameter handling
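A hedged sketch of passing provider-specific options through the OpenAI client's `extra_body` argument; the nested `google`/`cached_content` payload shape is an assumption and should be checked against Gemini's OpenAI-compatibility documentation:

```python
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Summarize the cached document."}],
    # extra_body is merged into the request body by the openai client;
    # the payload below is an assumed shape for Gemini-specific options.
    extra_body={"google": {"cached_content": "cachedContents/example-id"}},
)
print(response.choices[0].message.content)
```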
Technical Specifications
Dependencies
- `fastmcp>=2.11.3` - MCP server framework
- `openai>=1.54.0` - OpenAI-compatible API client
- `python-dotenv>=1.0.0` - Environment variable management
- `pydantic>=2.11.7` - Structured outputs and data validation
Environment Configuration
```
GOOGLE_API_KEY=<your_api_key>
GEMINI_MODEL=gemini-1.5-flash
ENABLE_STREAMING=true
```
Supported Models
- Text: `gemini-1.5-flash` (default), `gemini-2.5-flash`, `gemini-2.5-pro`
- Vision: `gemini-2.0-flash` (for image analysis)
- Embeddings: `gemini-embedding-001`, `gemini-embedding-exp-03-07`
- Thinking: `gemini-2.5-flash` (with the `reasoning_effort` parameter)
Implementation Approach
Streaming Architecture
- Primary functions return generators for streaming
- Fallback functions collect streams for non-streaming clients
- Real-time token delivery with progress tracking
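A sketch of the generator-plus-fallback layering described above (function names are illustrative, and `client` is assumed from the API integration example):

```python
from typing import Iterator

def generate_text_stream(prompt: str, model: str = "gemini-1.5-flash") -> Iterator[str]:
    """Primary path: yield content deltas as the provider streams them."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

def generate_text(prompt: str, model: str = "gemini-1.5-flash") -> str:
    """Fallback for non-streaming clients: collect the stream into one string."""
    return "".join(generate_text_stream(prompt, model))
```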
Multimodal Design
- Support multiple image formats (JPG, JPEG, PNG)
- Automatic format detection and encoding
- Structured message format with text + image content
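A small sketch of the format detection and encoding step (the helper name is illustrative):

```python
import base64
import mimetypes

def encode_image_as_data_url(image_path: str) -> str:
    """Detect the image's MIME type and return a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type not in ("image/jpeg", "image/png"):
        raise ValueError(f"Unsupported image format: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```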
Error Handling
- Comprehensive try-catch blocks
- Structured error responses
- Success/failure status indicators
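A minimal sketch of the structured-error pattern, wrapping the illustrative `generate_text` helper from the streaming sketch above:

```python
def safe_generate(prompt: str) -> dict:
    """Return a structured result with an explicit success flag instead of raising."""
    try:
        return {"success": True, "result": generate_text(prompt)}
    except Exception as exc:  # surface any provider error as a structured response
        return {"success": False, "error": str(exc)}
```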
API Key Security
- Store in `.env` file (gitignored)
- Provide `.env.example` template
- Load via `python-dotenv` (see the loading sketch below)
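A minimal sketch of the loading step with `python-dotenv`:

```python
import os

from dotenv import load_dotenv

# Read GOOGLE_API_KEY and related settings from the gitignored .env file.
load_dotenv()

api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set; copy .env.example to .env and fill it in.")
```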