LLM Fusion MCP - Requirements & Preferences
This document captures the specific requirements and preferences for the LLM Fusion MCP project.
Core Requirements
Python Project Setup
- Package Management: Use `uv` for dependency management
- Project Structure: Modern Python packaging with `pyproject.toml`
- Code Quality: Use `ruff` for formatting and linting
- MCP Framework: Use `fastmcp` (latest version 2.11.3+)
API Integration
- LLM Provider: Google Gemini API
- API Approach: Use the OpenAI-compatible API endpoint instead of native Google libraries
- Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
- Rationale: "so we can code for many type of llms" - enables easy switching between LLM providers
- Library: Use the `openai` library instead of `google-generativeai` for better compatibility (see the client sketch below)
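A minimal sketch of this setup, assuming `GOOGLE_API_KEY` is already exported and using the default text model from the configuration below:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at Gemini's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["GOOGLE_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```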
Streaming Requirements
- Always Use Streaming: "I Want to use 'streaming responses' always"
- Implementation: All text generation should support real-time streaming responses
- Format: Token-by-token streaming with incremental content delivery
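A rough sketch of the token-by-token pattern, reusing the `client` configured above:

```python
# Request a streamed response and print tokens as they arrive.
stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Explain MCP in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```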
Image Understanding
- Multimodal Support: Support image analysis and understanding
- Implementation: Use OpenAI-compatible multimodal API
- Format: Base64 encoded images with data URLs
- Example provided:
```python
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Usage with data URL format
base64_image = encode_image("image.jpg")
image_url = {"url": f"data:image/jpeg;base64,{base64_image}"}
```
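A hedged sketch of how that data URL might be passed in a chat request; the prompt and the choice of `gemini-2.0-flash` here are illustrative, building on `client` and `base64_image` from the examples above:

```python
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```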
Simple MCP Tools
- Request: "let's setup a simple mcp tool"
- Implementation: Include basic utility tools alongside AI capabilities
- Example: Calculator tool for mathematical operations
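A minimal sketch of such a tool registered with FastMCP; the server name and operation set are illustrative, not the project's actual definitions:

```python
from fastmcp import FastMCP

mcp = FastMCP("llm-fusion-mcp")

@mcp.tool()
def calculate(a: float, b: float, operation: str) -> float:
    """Perform a basic arithmetic operation: add, subtract, multiply, or divide."""
    if operation == "add":
        return a + b
    if operation == "subtract":
        return a - b
    if operation == "multiply":
        return a * b
    if operation == "divide":
        if b == 0:
            raise ValueError("Cannot divide by zero.")
        return a / b
    raise ValueError(f"Unknown operation: {operation}")

if __name__ == "__main__":
    mcp.run()
```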
Function Calling Support
- Request: "let's also add basic 'function calling support'"
- Implementation: Support for OpenAI-compatible function calling
- Features: Tool definitions, automatic function execution, streaming support
- Example: Weather function with location and unit parameters
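A hedged sketch of an OpenAI-compatible tool definition for such a weather function (the function name and schema are illustrative):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "What's the weather in Paris, in celsius?"}],
    tools=tools,
)

# The model returns a tool call; the server would execute it and send back the result.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```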
Audio Understanding
- Request: "and audio understanding"
- Implementation: Base64 encoded audio with the `input_audio` content type
- Supported Formats: WAV, MP3, and other audio formats
- Use Cases: Transcription, audio analysis, voice commands
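A hedged sketch of the `input_audio` content type in an OpenAI-style request; whether a given Gemini model accepts this payload through the compatibility endpoint is an assumption to verify, and the file name is illustrative:

```python
import base64

with open("command.wav", "rb") as audio_file:
    base64_audio = base64.b64encode(audio_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio clip."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": base64_audio, "format": "wav"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```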
Text Embeddings
- Request: "we can also do text embeddings"
- Implementation: OpenAI-compatible embeddings API
- Model: `gemini-embedding-001`
- Features: Single text or batch processing, similarity calculations
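A minimal sketch of batch embeddings plus a cosine-similarity calculation, reusing the `client` configured above:

```python
result = client.embeddings.create(
    model="gemini-embedding-001",
    input=["The quick brown fox", "A fast auburn fox"],
)

# Cosine similarity between the two returned vectors.
a = result.data[0].embedding
b = result.data[1].embedding
dot = sum(x * y for x, y in zip(a, b))
norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
print(f"similarity: {dot / norm:.3f}")
```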
Advanced Features (extra_body)
- Request: Support for Gemini-specific features via `extra_body`
- Cached Content: Use pre-cached content for faster responses
- Thinking Config: Enable reasoning mode for complex problems
- Implementation: Custom extra_body parameter handling
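A hedged sketch of passing provider-specific options through the OpenAI client's `extra_body` argument; the nested `google`/`cached_content` payload shape is an assumption and should be checked against Gemini's OpenAI-compatibility documentation:

```python
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Summarize the cached document."}],
    # extra_body is merged into the request body by the openai client;
    # the payload below is an assumed shape for Gemini-specific options.
    extra_body={"google": {"cached_content": "cachedContents/example-id"}},
)
print(response.choices[0].message.content)
```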
Technical Specifications
Dependencies
- `fastmcp>=2.11.3` - MCP server framework
- `openai>=1.54.0` - OpenAI-compatible API client
- `python-dotenv>=1.0.0` - Environment variable management
- `pydantic>=2.11.7` - Structured outputs and data validation
Environment Configuration
```
GOOGLE_API_KEY=<your_api_key>
GEMINI_MODEL=gemini-1.5-flash
ENABLE_STREAMING=true
```
Supported Models
- Text: `gemini-1.5-flash` (default), `gemini-2.5-flash`, `gemini-2.5-pro`
- Vision: `gemini-2.0-flash` (for image analysis)
- Embeddings: `gemini-embedding-001`, `gemini-embedding-exp-03-07`
- Thinking: `gemini-2.5-flash` (with the `reasoning_effort` parameter)
Implementation Approach
Streaming Architecture
- Primary functions return generators for streaming
- Fallback functions collect streams for non-streaming clients
- Real-time token delivery with progress tracking
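A sketch of the generator-plus-fallback layering described above (function names are illustrative, and `client` is assumed from the API integration example):

```python
from typing import Iterator

def generate_text_stream(prompt: str, model: str = "gemini-1.5-flash") -> Iterator[str]:
    """Primary path: yield content deltas as the provider streams them."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

def generate_text(prompt: str, model: str = "gemini-1.5-flash") -> str:
    """Fallback for non-streaming clients: collect the stream into one string."""
    return "".join(generate_text_stream(prompt, model))
```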
Multimodal Design
- Support multiple image formats (JPG, JPEG, PNG)
- Automatic format detection and encoding
- Structured message format with text + image content
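A small sketch of the format detection and encoding step (the helper name is illustrative):

```python
import base64
import mimetypes

def encode_image_as_data_url(image_path: str) -> str:
    """Detect the image's MIME type and return a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type not in ("image/jpeg", "image/png"):
        raise ValueError(f"Unsupported image format: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```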
Error Handling
- Comprehensive try-catch blocks
- Structured error responses
- Success/failure status indicators
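A minimal sketch of the structured-error pattern, wrapping the illustrative `generate_text` helper from the streaming sketch above:

```python
def safe_generate(prompt: str) -> dict:
    """Return a structured result with an explicit success flag instead of raising."""
    try:
        return {"success": True, "result": generate_text(prompt)}
    except Exception as exc:  # surface any provider error as a structured response
        return {"success": False, "error": str(exc)}
```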
API Key Security
- Store in `.env` file (gitignored)
- Provide `.env.example` template
- Load via `python-dotenv` (see the loading sketch below)
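A minimal sketch of the loading step with `python-dotenv`:

```python
import os

from dotenv import load_dotenv

# Read GOOGLE_API_KEY and related settings from the gitignored .env file.
load_dotenv()

api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set; copy .env.example to .env and fill it in.")
```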