# LLM Fusion MCP - Requirements & Preferences

This document captures the specific requirements and preferences for the LLM Fusion MCP project.

## Core Requirements

### Python Project Setup
- **Package Management**: Use `uv` for dependency management
- **Project Structure**: Modern Python packaging with `pyproject.toml`
- **Code Quality**: Use `ruff` for formatting and linting
- **MCP Framework**: Use `fastmcp` (latest version 2.11.3+)

### API Integration
- **LLM Provider**: Google Gemini API
- **API Approach**: Use the OpenAI-compatible API endpoint instead of native Google libraries
  - Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
  - Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
- **Library**: Use the `openai` library instead of `google-generativeai` for better compatibility

### Streaming Requirements
- **Always Use Streaming**: "I Want to use 'streaming responses' always"
- **Implementation**: All text generation should support real-time streaming responses
- **Format**: Token-by-token streaming with incremental content delivery

### Image Understanding
- **Multimodal Support**: Support image analysis and understanding
- **Implementation**: Use the OpenAI-compatible multimodal API
- **Format**: Base64-encoded images with data URLs
- **Example provided**:

```python
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Usage with data URL format
"url": f"data:image/jpeg;base64,{base64_image}"
```

### Simple MCP Tools
- **Request**: "let's setup a simple mcp tool"
- **Implementation**: Include basic utility tools alongside AI capabilities
- **Example**: Calculator tool for mathematical operations

### Function Calling Support
- **Request**: "let's also add basic 'function calling support'"
- **Implementation**: Support for OpenAI-compatible function calling
- **Features**: Tool definitions, automatic function execution, streaming support
- **Example**: Weather function with location and unit parameters

### Audio Understanding
- **Request**: "and audio understanding"
- **Implementation**: Base64-encoded audio with the `input_audio` content type
- **Supported Formats**: WAV, MP3, and other audio formats
- **Use Cases**: Transcription, audio analysis, voice commands

### Text Embeddings
- **Request**: "we can also do text embeddings"
- **Implementation**: OpenAI-compatible embeddings API
- **Model**: `gemini-embedding-001`
- **Features**: Single-text or batch processing, similarity calculations

### Advanced Features (extra_body)
- **Request**: Support for Gemini-specific features via `extra_body`
- **Cached Content**: Use pre-cached content for faster responses
- **Thinking Config**: Enable reasoning mode for complex problems
- **Implementation**: Custom `extra_body` parameter handling

## Technical Specifications

### Dependencies
- `fastmcp>=2.11.3` - MCP server framework
- `openai>=1.54.0` - OpenAI-compatible API client
- `python-dotenv>=1.0.0` - Environment variable management
- `pydantic>=2.11.7` - Structured outputs and data validation

### Environment Configuration

```env
GOOGLE_API_KEY=
GEMINI_MODEL=gemini-1.5-flash
ENABLE_STREAMING=true
```

### Supported Models
- **Text**: `gemini-1.5-flash` (default), `gemini-2.5-flash`, `gemini-2.5-pro`
- **Vision**: `gemini-2.0-flash` (for image analysis)
- **Embeddings**: `gemini-embedding-001`, `gemini-embedding-exp-03-07`
- **Thinking**: `gemini-2.5-flash` (with the `reasoning_effort` parameter)

## Implementation Approach

### Streaming Architecture
- Primary functions return generators for streaming
- Fallback functions collect streams for non-streaming clients
- Real-time token delivery with progress tracking

### Multimodal Design
- Support multiple image formats (JPG, JPEG, PNG)
- Automatic format detection and encoding
- Structured message format with text + image content

### Error Handling
- Comprehensive try/except blocks
- Structured error responses
- Success/failure status indicators

## API Key Security
- Store keys in a `.env` file (gitignored)
- Provide a `.env.example` template
- Load via `python-dotenv`
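The always-streaming requirement above can be sketched with the `openai` client pointed at the Gemini-compatible endpoint. This is a sketch, not the project's implementation: the helper name `stream_text` is illustrative, and the live call at the bottom only runs when `GOOGLE_API_KEY` is set.

```python
import os

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

def stream_text(client, prompt: str, model: str = "gemini-1.5-flash"):
    """Yield text deltas as they arrive from a streaming chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk may carry no content
            yield delta

if __name__ == "__main__" and os.getenv("GOOGLE_API_KEY"):
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["GOOGLE_API_KEY"], base_url=BASE_URL)
    for token in stream_text(client, "Say hello in three languages."):
        print(token, end="", flush=True)
    print()
```

Because the helper is a generator, non-streaming clients can still get a single string with `"".join(stream_text(...))`, matching the fallback behavior described under Streaming Architecture.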
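The calculator tool mentioned under Simple MCP Tools might look like the sketch below, using a safe AST walk instead of `eval`. The server name `llm-fusion` and the `fastmcp` registration shown in the trailing comment are illustrative assumptions.

```python
import ast
import operator

# Operators the calculator is willing to evaluate.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Safely evaluate a basic arithmetic expression like '2 + 3 * 4'."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

# To expose it as a FastMCP tool (sketch, assuming fastmcp 2.x):
#   from fastmcp import FastMCP
#   mcp = FastMCP("llm-fusion")
#   mcp.tool(calculate)
#   mcp.run()
```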
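The weather example from the Function Calling Support section could be wired up roughly as follows. The tool schema uses the standard OpenAI function-calling format; `get_weather` is a stub returning a fixed value, and the dispatcher `execute_tool_call` is an illustrative name, not an established API.

```python
import json
import os

# Tool definition in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

def get_weather(location: str, unit: str = "celsius") -> str:
    """Stub; a real tool would query a weather API."""
    return f"22 degrees {unit} in {location}"

def execute_tool_call(tool_call) -> str:
    """Dispatch a model-issued tool call to the matching local function."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"unknown tool: {tool_call.function.name}")

if __name__ == "__main__" and os.getenv("GOOGLE_API_KEY"):
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["GOOGLE_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    resp = client.chat.completions.create(
        model="gemini-1.5-flash",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[WEATHER_TOOL],
    )
    call = resp.choices[0].message.tool_calls[0]
    print(execute_tool_call(call))
```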
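For the Audio Understanding requirement, a message with an `input_audio` content part can be assembled as sketched below. The helper name `build_audio_message`, the model choice, and the sample file name are assumptions; the `input_audio` part shape follows the OpenAI-compatible chat format named in that section.

```python
import base64
import os

def build_audio_message(audio_path: str, prompt: str, fmt: str = "wav") -> list[dict]:
    """Build an OpenAI-compatible message list with an input_audio part."""
    with open(audio_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "input_audio",
             "input_audio": {"data": b64, "format": fmt}},
        ],
    }]

if __name__ == "__main__" and os.getenv("GOOGLE_API_KEY"):
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["GOOGLE_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    resp = client.chat.completions.create(
        model="gemini-2.0-flash",  # illustrative model choice
        messages=build_audio_message("speech.wav", "Transcribe this audio."),
    )
    print(resp.choices[0].message.content)
```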
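The Text Embeddings section calls for batch processing and similarity calculations; one way to sketch both with `gemini-embedding-001` (the example texts are arbitrary, and the live call only runs when `GOOGLE_API_KEY` is set):

```python
import math
import os

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__" and os.getenv("GOOGLE_API_KEY"):
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["GOOGLE_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    # Batch request: one call, one vector per input text.
    resp = client.embeddings.create(
        model="gemini-embedding-001",
        input=["I love coffee", "Coffee is my favourite drink"],
    )
    v1, v2 = (d.embedding for d in resp.data)
    print(f"similarity: {cosine_similarity(v1, v2):.3f}")
```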
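The Advanced Features section might be handled with a small builder for the `extra_body` payload. The nested key names here (`google`, `cached_content`, `thinking_config`, `thinking_budget`) are assumptions to verify against current Google documentation; only `reasoning_effort`, named in the Supported Models list, is a standard `openai`-client parameter passed outside `extra_body`.

```python
def gemini_extra_body(cached_content: str | None = None,
                      thinking_budget: int | None = None) -> dict:
    """Build Gemini-specific pass-through options.

    NOTE: the key names below are assumptions, not confirmed API fields;
    check Google's OpenAI-compatibility docs before relying on them.
    """
    google: dict = {}
    if cached_content is not None:
        google["cached_content"] = cached_content
    if thinking_budget is not None:
        google["thinking_config"] = {"thinking_budget": thinking_budget}
    return {"google": google} if google else {}
```

The result would be passed as `client.chat.completions.create(..., extra_body=gemini_extra_body(thinking_budget=512))`, with `reasoning_effort="low"` supplied as an ordinary keyword argument when using a thinking model.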