# LLM Fusion MCP - Requirements & Preferences
This document captures the specific requirements and preferences for the LLM Fusion MCP project.
## Core Requirements
### Python Project Setup
- **Package Management**: Use `uv` for dependency management
- **Project Structure**: Modern Python packaging with `pyproject.toml`
- **Code Quality**: Use `ruff` for formatting and linting
- **MCP Framework**: Use `fastmcp` (latest version 2.11.3+)
### API Integration
- **LLM Provider**: Google Gemini API
- **API Approach**: Use OpenAI-compatible API endpoint instead of native Google libraries
- Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
- Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
- **Library**: Use `openai` library instead of `google-generativeai` for better compatibility
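As a sketch, under this approach the client is the standard `openai` SDK configured with the Gemini base URL and the `GOOGLE_API_KEY` from the environment:
```python
import os

from openai import OpenAI

# OpenAI SDK client pointed at Gemini's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["GOOGLE_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
```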
### Streaming Requirements
- **Always Use Streaming**: "I Want to use 'streaming responses' always"
- **Implementation**: All text generation should support real-time streaming responses
- **Format**: Token-by-token streaming with incremental content delivery
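A minimal streaming call against that client, delivering tokens as they arrive (model name taken from the supported list below):
```python
stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Explain MCP in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no content (e.g. role or finish metadata)
        print(delta, end="", flush=True)
```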
### Image Understanding
- **Multimodal Support**: Support image analysis and understanding
- **Implementation**: Use OpenAI-compatible multimodal API
- **Format**: Base64 encoded images with data URLs
- **Example provided**:
```python
import base64

# Function to encode an image file as base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Usage with data URL format
"url": f"data:image/jpeg;base64,{base64_image}"
```
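Putting the pieces together, a complete image-analysis request in the standard OpenAI multimodal message format might look like this sketch (the file name is illustrative):
```python
base64_image = encode_image("photo.jpg")

response = client.chat.completions.create(
    model="gemini-2.0-flash",  # vision model from the list below
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
            },
        ],
    }],
)
print(response.choices[0].message.content)
```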
### Simple MCP Tools
- **Request**: "let's setup a simple mcp tool"
- **Implementation**: Include basic utility tools alongside AI capabilities
- **Example**: Calculator tool for mathematical operations
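A sketch of such a tool registered with FastMCP (the tool name and parameters here are illustrative, not the project's final API):
```python
from fastmcp import FastMCP

mcp = FastMCP("llm-fusion-mcp")

@mcp.tool()
def calculate(a: float, b: float, operation: str) -> float:
    """Perform a basic arithmetic operation: add, subtract, multiply, or divide."""
    operations = {
        "add": a + b,
        "subtract": a - b,
        "multiply": a * b,
        "divide": a / b if b != 0 else float("nan"),
    }
    if operation not in operations:
        raise ValueError(f"Unknown operation: {operation}")
    return operations[operation]

if __name__ == "__main__":
    mcp.run()
```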
### Function Calling Support
- **Request**: "let's also add basic 'function calling support'"
- **Implementation**: Support for OpenAI-compatible function calling
- **Features**: Tool definitions, automatic function execution, streaming support
- **Example**: Weather function with location and unit parameters
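The weather example translates to the standard OpenAI tool-definition format; a sketch:
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Denver"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "What's the weather in Denver?"}],
    tools=tools,
)
# The model returns tool_calls with JSON-encoded arguments to execute locally
tool_calls = response.choices[0].message.tool_calls
```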
### Audio Understanding
- **Request**: "and audio understanding"
- **Implementation**: Base64 encoded audio with `input_audio` content type
- **Supported Formats**: WAV, MP3, and other audio formats
- **Use Cases**: Transcription, audio analysis, voice commands
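A sketch of the `input_audio` content type; which Gemini models accept audio through the OpenAI-compatible endpoint should be confirmed against the current documentation:
```python
import base64

with open("command.wav", "rb") as f:
    base64_audio = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this audio clip."},
            {
                "type": "input_audio",
                "input_audio": {"data": base64_audio, "format": "wav"},
            },
        ],
    }],
)
```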
### Text Embeddings
- **Request**: "we can also do text embeddings"
- **Implementation**: OpenAI-compatible embeddings API
- **Model**: `gemini-embedding-001`
- **Features**: Single text or batch processing, similarity calculations
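A sketch of the OpenAI-compatible embeddings call, plus a plain-Python cosine similarity helper (the helper is illustrative, not part of the API):
```python
result = client.embeddings.create(
    model="gemini-embedding-001",
    input=["first document", "second document"],  # a single string also works
)
vectors = [item.embedding for item in result.data]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5
    return dot / norm

score = cosine_similarity(vectors[0], vectors[1])
```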
### Advanced Features (extra_body)
- **Request**: Support for Gemini-specific features via `extra_body`
- **Cached Content**: Use pre-cached content for faster responses
- **Thinking Config**: Enable reasoning mode for complex problems
- **Implementation**: Custom extra_body parameter handling
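A sketch of passing Gemini-specific options through `extra_body`; the nested field names (`extra_body`, `google`, `thinking_config`, `cached_content`) follow the pattern in Gemini's OpenAI-compatibility documentation and should be treated as assumptions to verify there:
```python
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Work through this problem step by step."}],
    extra_body={
        # The openai client forwards this dict untouched to the endpoint
        "extra_body": {
            "google": {
                "thinking_config": {"thinking_budget": 1024, "include_thoughts": True},
                # "cached_content": "cachedContents/<id>",  # reuse pre-cached context
            }
        }
    },
)
```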
## Technical Specifications
### Dependencies
- `fastmcp>=2.11.3` - MCP server framework
- `openai>=1.54.0` - OpenAI-compatible API client
- `python-dotenv>=1.0.0` - Environment variable management
- `pydantic>=2.11.7` - Structured outputs and data validation
### Environment Configuration
```env
GOOGLE_API_KEY=<your_api_key>
GEMINI_MODEL=gemini-1.5-flash
ENABLE_STREAMING=true
```
### Supported Models
- **Text**: `gemini-1.5-flash` (default), `gemini-2.5-flash`, `gemini-2.5-pro`
- **Vision**: `gemini-2.0-flash` (for image analysis)
- **Embeddings**: `gemini-embedding-001`, `gemini-embedding-exp-03-07`
- **Thinking**: `gemini-2.5-flash` (with reasoning_effort parameter)
## Implementation Approach
### Streaming Architecture
- Primary functions return generators for streaming
- Fallback functions collect streams for non-streaming clients
- Real-time token delivery with progress tracking
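A minimal sketch of that two-path design, assuming the `client` configured earlier:
```python
from typing import Generator

def generate_text_stream(prompt: str, model: str = "gemini-1.5-flash") -> Generator[str, None, None]:
    """Primary path: yield content deltas as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

def generate_text(prompt: str, model: str = "gemini-1.5-flash") -> str:
    """Fallback path: collect the full stream for non-streaming clients."""
    return "".join(generate_text_stream(prompt, model))
```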
### Multimodal Design
- Support multiple image formats (JPG, JPEG, PNG)
- Automatic format detection and encoding
- Structured message format with text + image content
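A hypothetical helper for the format-detection step, building on `encode_image` from the earlier example:
```python
from pathlib import Path

MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}

def image_data_url(image_path: str) -> str:
    """Detect the image format from the file extension and return a base64 data URL."""
    mime = MIME_TYPES.get(Path(image_path).suffix.lower())
    if mime is None:
        raise ValueError(f"Unsupported image format: {image_path}")
    return f"data:{mime};base64,{encode_image(image_path)}"
```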
### Error Handling
- Comprehensive try-catch blocks
- Structured error responses
- Success/failure status indicators
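One way to standardize this, sketched as a hypothetical decorator that wraps every tool in a try/except and returns a structured result:
```python
import functools

def tool_result(fn):
    """Wrap a tool so it always returns a success/failure dict instead of raising."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return {"success": True, "result": fn(*args, **kwargs)}
        except Exception as exc:  # broad by design: errors go back as structured data
            return {"success": False, "error": str(exc), "error_type": type(exc).__name__}
    return wrapper
```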
## API Key Security
- Store in `.env` file (gitignored)
- Provide `.env.example` template
- Load via `python-dotenv`
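The loading step, as a sketch:
```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root; silently does nothing if absent

api_key = os.environ["GOOGLE_API_KEY"]  # KeyError here fails fast when the key is missing
```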