# LLM Fusion MCP - Requirements & Preferences
This document captures the specific requirements and preferences for the LLM Fusion MCP project.

## Core Requirements
### Python Project Setup
- **Package Management**: Use `uv` for dependency management
- **Project Structure**: Modern Python packaging with `pyproject.toml`
- **Code Quality**: Use `ruff` for formatting and linting
- **MCP Framework**: Use `fastmcp` (version 2.11.3 or later)
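
Under these preferences, the `pyproject.toml` might look roughly like the sketch below. The project name, version, and `ruff` line length are illustrative assumptions; the dependency pins follow the Dependencies section of this document.

```toml
[project]
name = "llm-fusion-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "fastmcp>=2.11.3",
    "openai>=1.54.0",
    "python-dotenv>=1.0.0",
    "pydantic>=2.11.7",
]

[tool.ruff]
line-length = 100
```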
### API Integration
- **LLM Provider**: Google Gemini API
- **API Approach**: Use OpenAI-compatible API endpoint instead of native Google libraries
  - Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
  - Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
- **Library**: Use `openai` library instead of `google-generativeai` for better compatibility
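
Because the endpoint speaks the OpenAI wire format, request payloads are provider-agnostic; only the client's `base_url` changes. A minimal sketch (the helper names `chat_request` and `gemini_client` are illustrative, not project API):

```python
def chat_request(model, prompt):
    """Build a chat-completions payload in the OpenAI wire format.

    Gemini's OpenAI-compatible endpoint accepts the same shape, so the
    payload works against either provider unchanged.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def gemini_client(api_key):
    """Hypothetical helper: an `openai` client pointed at Gemini."""
    from openai import OpenAI  # project dependency, imported lazily here

    return OpenAI(
        api_key=api_key,
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
```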
### Streaming Requirements
- **Always Use Streaming**: "I Want to use 'streaming responses' always"
- **Implementation**: All text generation should support real-time streaming responses
- **Format**: Token-by-token streaming with incremental content delivery
### Image Understanding
- **Multimodal Support**: Support image analysis and understanding
- **Implementation**: Use OpenAI-compatible multimodal API
- **Format**: Base64 encoded images with data URLs
- **Example provided**:
```python
import base64


# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


# Usage with data URL format
"url": f"data:image/jpeg;base64,{base64_image}"
```
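
Building on the encoder above, a user message that pairs text with a base64 image can be assembled like this (helper names are illustrative):

```python
import base64


def image_content(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}


def vision_message(prompt, image_bytes):
    """A user message combining text and one image, per the OpenAI format."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_content(image_bytes)],
    }
```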
### Simple MCP Tools
- **Request**: "let's setup a simple mcp tool"
- **Implementation**: Include basic utility tools alongside AI capabilities
- **Example**: Calculator tool for mathematical operations
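
A calculator tool can be sketched as a safe `ast`-based evaluator. In the server it would be registered as an MCP tool (e.g. via fastmcp's tool decorator), but it is shown standalone here:

```python
import ast
import operator

# Supported operators for a safe calculator (no eval of arbitrary code).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculate(expression):
    """Evaluate a basic arithmetic expression, e.g. "2 * (3 + 4)"."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```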
### Function Calling Support
- **Request**: "let's also add basic 'function calling support'"
- **Implementation**: Support for OpenAI-compatible function calling
- **Features**: Tool definitions, automatic function execution, streaming support
- **Example**: Weather function with location and unit parameters
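
The weather example would be declared as an OpenAI-style tool schema; the function name and fields below are illustrative:

```python
# OpenAI-compatible tool definition for a hypothetical weather function.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}
```

At call time, pass `tools=[WEATHER_TOOL]` to `chat.completions.create`, execute whatever function the model selects, and return the result in a `tool` role message.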
### Audio Understanding
- **Request**: "and audio understanding"
- **Implementation**: Base64 encoded audio with `input_audio` content type
- **Supported Formats**: WAV, MP3, and other audio formats
- **Use Cases**: Transcription, audio analysis, voice commands
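
A content part for audio follows the same pattern as images, using `input_audio` instead of `image_url` (the helper name is illustrative):

```python
import base64


def audio_content(audio_bytes, fmt="wav"):
    """Wrap raw audio bytes as an OpenAI-style input_audio content part."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("utf-8"),
            "format": fmt,
        },
    }
```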
### Text Embeddings
- **Request**: "we can also do text embeddings"
- **Implementation**: OpenAI-compatible embeddings API
- **Model**: `gemini-embedding-001`
- **Features**: Single text or batch processing, similarity calculations
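
Embedding vectors come back from the OpenAI-compatible endpoint (e.g. `client.embeddings.create(model="gemini-embedding-001", input=[...])`); the similarity calculation is then plain vector math:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```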
### Advanced Features (extra_body)
- **Request**: Support for Gemini-specific features via `extra_body`
- **Cached Content**: Use pre-cached content for faster responses
- **Thinking Config**: Enable reasoning mode for complex problems
- **Implementation**: Custom `extra_body` parameter handling
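
A sketch of how the `extra_body` options might be assembled. The field names (`cached_content`, `thinking_config`, `include_thoughts`) are assumptions for illustration and should be checked against Google's OpenAI-compatibility documentation:

```python
def gemini_extras(cached_content=None, reasoning=False):
    """Build an extra_body dict for Gemini-specific features.

    Field names are illustrative assumptions, not a confirmed schema.
    """
    extras = {}
    if cached_content:
        extras["cached_content"] = cached_content
    if reasoning:
        extras["thinking_config"] = {"include_thoughts": True}
    return extras
```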
## Technical Specifications
### Dependencies

- `fastmcp>=2.11.3` - MCP server framework
- `openai>=1.54.0` - OpenAI-compatible API client
- `python-dotenv>=1.0.0` - Environment variable management
- `pydantic>=2.11.7` - Structured outputs and data validation
### Environment Configuration
|
|
```env
|
|
GOOGLE_API_KEY=<your_api_key>
|
|
GEMINI_MODEL=gemini-1.5-flash
|
|
ENABLE_STREAMING=true
|
|
```
|
|
|
|
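
A stdlib-only sketch of reading this configuration; in the project itself, `python-dotenv`'s `load_dotenv()` would first populate `os.environ` from the `.env` file:

```python
import os


def load_config():
    """Read configuration from the environment, with sensible defaults."""
    return {
        "api_key": os.getenv("GOOGLE_API_KEY", ""),
        "model": os.getenv("GEMINI_MODEL", "gemini-1.5-flash"),
        "streaming": os.getenv("ENABLE_STREAMING", "true").lower() == "true",
    }
```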
### Supported Models
- **Text**: `gemini-1.5-flash` (default), `gemini-2.5-flash`, `gemini-2.5-pro`
- **Vision**: `gemini-2.0-flash` (for image analysis)
- **Embeddings**: `gemini-embedding-001`, `gemini-embedding-exp-03-07`
- **Thinking**: `gemini-2.5-flash` (with `reasoning_effort` parameter)
## Implementation Approach
### Streaming Architecture

- Primary functions return generators for streaming
- Fallback functions collect streams for non-streaming clients
- Real-time token delivery with progress tracking
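
The generator-plus-fallback pattern can be sketched as follows. Note that the real `openai` SDK yields chunk objects (`chunk.choices[0].delta.content`); dict-shaped chunks are used here to keep the sketch self-contained:

```python
def stream_tokens(chunks):
    """Yield text deltas from a chat-completions stream (OpenAI chunk shape)."""
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta


def generate_text(chunks):
    """Non-streaming fallback: collect the whole stream into one string."""
    return "".join(stream_tokens(chunks))
```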
### Multimodal Design
- Support multiple image formats (JPG, JPEG, PNG)
- Automatic format detection and encoding
- Structured message format with text + image content
### Error Handling
- Comprehensive try/except blocks
- Structured error responses
- Success/failure status indicators
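
One way to sketch the structured-error pattern is a decorator that converts exceptions into success/failure envelopes (the envelope keys are illustrative):

```python
import functools


def structured_result(fn):
    """Wrap a tool so it always returns a success/error envelope."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return {"success": True, "result": fn(*args, **kwargs)}
        except Exception as exc:  # structured error instead of a raised traceback
            return {"success": False, "error": str(exc)}
    return wrapper
```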
## API Key Security
- Store in `.env` file (gitignored)
- Provide `.env.example` template
- Load via `python-dotenv`