# LLM Fusion MCP - Requirements & Preferences
This document captures the specific requirements and preferences for the LLM Fusion MCP project.

## Core Requirements
### Python Project Setup
- **Package Management**: Use `uv` for dependency management
- **Project Structure**: Modern Python packaging with `pyproject.toml`
- **Code Quality**: Use `ruff` for formatting and linting
- **MCP Framework**: Use `fastmcp` (version 2.11.3 or later)
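
Under these preferences, the `pyproject.toml` might look roughly like the sketch below. The project name, version, and `ruff` line length are illustrative assumptions; the dependency pins follow the Dependencies section of this document.

```toml
[project]
name = "llm-fusion-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "fastmcp>=2.11.3",
    "openai>=1.54.0",
    "python-dotenv>=1.0.0",
    "pydantic>=2.11.7",
]

[tool.ruff]
line-length = 100
```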
### API Integration
- **LLM Provider**: Google Gemini API
- **API Approach**: Use OpenAI-compatible API endpoint instead of native Google libraries
  - Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
  - Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
- **Library**: Use `openai` library instead of `google-generativeai` for better compatibility
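
Because the endpoint speaks the OpenAI wire format, request payloads are provider-agnostic; only the client's `base_url` changes. A minimal sketch (the helper names `chat_request` and `gemini_client` are illustrative, not project API):

```python
def chat_request(model, prompt):
    """Build a chat-completions payload in the OpenAI wire format.

    Gemini's OpenAI-compatible endpoint accepts the same shape, so the
    payload works against either provider unchanged.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def gemini_client(api_key):
    """Hypothetical helper: an `openai` client pointed at Gemini."""
    from openai import OpenAI  # project dependency, imported lazily here

    return OpenAI(
        api_key=api_key,
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
```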
### Streaming Requirements
- **Always Use Streaming**: "I Want to use 'streaming responses' always"
- **Implementation**: All text generation should support real-time streaming responses
- **Format**: Token-by-token streaming with incremental content delivery
### Image Understanding
- **Multimodal Support**: Support image analysis and understanding
- **Implementation**: Use OpenAI-compatible multimodal API
- **Format**: Base64 encoded images with data URLs
- **Example provided**:
```python
import base64


# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


# Usage with data URL format
"url": f"data:image/jpeg;base64,{base64_image}"
```
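
Building on the encoder above, a user message that pairs text with a base64 image can be assembled like this (helper names are illustrative):

```python
import base64


def image_content(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}


def vision_message(prompt, image_bytes):
    """A user message combining text and one image, per the OpenAI format."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_content(image_bytes)],
    }
```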
### Simple MCP Tools
- **Request**: "let's setup a simple mcp tool"
- **Implementation**: Include basic utility tools alongside AI capabilities
- **Example**: Calculator tool for mathematical operations
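
A calculator tool can be sketched as a safe `ast`-based evaluator. In the server it would be registered as an MCP tool (e.g. via fastmcp's tool decorator), but it is shown standalone here:

```python
import ast
import operator

# Supported operators for a safe calculator (no eval of arbitrary code).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculate(expression):
    """Evaluate a basic arithmetic expression, e.g. "2 * (3 + 4)"."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```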
### Function Calling Support
- **Request**: "let's also add basic 'function calling support'"
- **Implementation**: Support for OpenAI-compatible function calling
- **Features**: Tool definitions, automatic function execution, streaming support
- **Example**: Weather function with location and unit parameters
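
The weather example would be declared as an OpenAI-style tool schema; the function name and fields below are illustrative:

```python
# OpenAI-compatible tool definition for a hypothetical weather function.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}
```

At call time, pass `tools=[WEATHER_TOOL]` to `chat.completions.create`, execute whatever function the model selects, and return the result in a `tool` role message.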
### Audio Understanding
- **Request**: "and audio understanding"
- **Implementation**: Base64 encoded audio with `input_audio` content type
- **Supported Formats**: WAV, MP3, and other audio formats
- **Use Cases**: Transcription, audio analysis, voice commands
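
A content part for audio follows the same pattern as images, using `input_audio` instead of `image_url` (the helper name is illustrative):

```python
import base64


def audio_content(audio_bytes, fmt="wav"):
    """Wrap raw audio bytes as an OpenAI-style input_audio content part."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("utf-8"),
            "format": fmt,
        },
    }
```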
### Text Embeddings
- **Request**: "we can also do text embeddings"
- **Implementation**: OpenAI-compatible embeddings API
- **Model**: `gemini-embedding-001`
- **Features**: Single text or batch processing, similarity calculations
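
Embedding vectors come back from the OpenAI-compatible endpoint (e.g. `client.embeddings.create(model="gemini-embedding-001", input=[...])`); the similarity calculation is then plain vector math:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```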
### Advanced Features (extra_body)
- **Request**: Support for Gemini-specific features via `extra_body`
- **Cached Content**: Use pre-cached content for faster responses
- **Thinking Config**: Enable reasoning mode for complex problems
- **Implementation**: Custom `extra_body` parameter handling
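
A sketch of how the `extra_body` options might be assembled. The field names (`cached_content`, `thinking_config`, `include_thoughts`) are assumptions for illustration and should be checked against Google's OpenAI-compatibility documentation:

```python
def gemini_extras(cached_content=None, reasoning=False):
    """Build an extra_body dict for Gemini-specific features.

    Field names are illustrative assumptions, not a confirmed schema.
    """
    extras = {}
    if cached_content:
        extras["cached_content"] = cached_content
    if reasoning:
        extras["thinking_config"] = {"include_thoughts": True}
    return extras
```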
## Technical Specifications
### Dependencies

- `fastmcp>=2.11.3` - MCP server framework
- `openai>=1.54.0` - OpenAI-compatible API client
- `python-dotenv>=1.0.0` - Environment variable management
- `pydantic>=2.11.7` - Structured outputs and data validation
### Environment Configuration
|
|
```env
|
|
GOOGLE_API_KEY=<your_api_key>
|
|
GEMINI_MODEL=gemini-1.5-flash
|
|
ENABLE_STREAMING=true
|
|
```
|
|
|
|
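
A stdlib-only sketch of reading this configuration; in the project itself, `python-dotenv`'s `load_dotenv()` would first populate `os.environ` from the `.env` file:

```python
import os


def load_config():
    """Read configuration from the environment, with sensible defaults."""
    return {
        "api_key": os.getenv("GOOGLE_API_KEY", ""),
        "model": os.getenv("GEMINI_MODEL", "gemini-1.5-flash"),
        "streaming": os.getenv("ENABLE_STREAMING", "true").lower() == "true",
    }
```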
### Supported Models
- **Text**: `gemini-1.5-flash` (default), `gemini-2.5-flash`, `gemini-2.5-pro`
- **Vision**: `gemini-2.0-flash` (for image analysis)
- **Embeddings**: `gemini-embedding-001`, `gemini-embedding-exp-03-07`
- **Thinking**: `gemini-2.5-flash` (with `reasoning_effort` parameter)
## Implementation Approach
### Streaming Architecture

- Primary functions return generators for streaming
- Fallback functions collect streams for non-streaming clients
- Real-time token delivery with progress tracking
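
The generator-plus-fallback pattern can be sketched as follows. Note that the real `openai` SDK yields chunk objects (`chunk.choices[0].delta.content`); dict-shaped chunks are used here to keep the sketch self-contained:

```python
def stream_tokens(chunks):
    """Yield text deltas from a chat-completions stream (OpenAI chunk shape)."""
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta


def generate_text(chunks):
    """Non-streaming fallback: collect the whole stream into one string."""
    return "".join(stream_tokens(chunks))
```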
### Multimodal Design
- Support multiple image formats (JPG, JPEG, PNG)
- Automatic format detection and encoding
- Structured message format with text + image content
### Error Handling
- Comprehensive try/except blocks
- Structured error responses
- Success/failure status indicators
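
One way to sketch the structured-error pattern is a decorator that converts exceptions into success/failure envelopes (the envelope keys are illustrative):

```python
import functools


def structured_result(fn):
    """Wrap a tool so it always returns a success/error envelope."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return {"success": True, "result": fn(*args, **kwargs)}
        except Exception as exc:  # structured error instead of a raised traceback
            return {"success": False, "error": str(exc)}
    return wrapper
```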
## API Key Security
- Store in `.env` file (gitignored)
- Provide `.env.example` template
- Load via `python-dotenv`