Ollama Setup Guide for Ultimate Memory MCP Server
This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.
🦙 Why Ollama?
- 100% Free - No API costs or usage limits
- Privacy First - All processing happens locally
- High Quality - nomic-embed-text performs excellently
- Self-Contained - No external dependencies once set up
📋 Quick Setup Checklist
1. Install Ollama
```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download
```
2. Start Ollama Server
```bash
ollama serve
# Keep this running in a terminal, or run it as a service (see below)
```
3. Pull Required Models
```bash
# Essential: Embedding model
ollama pull nomic-embed-text

# Optional: Small chat model for summaries
ollama pull llama3.2:1b

# Check installed models
ollama list
```
4. Configure Memory Server

Set these in your `.env` file (a quick check of these values is sketched just after this checklist):

```bash
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```
5. Test Setup
```bash
python test_server.py --ollama-setup
```
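If you want to sanity-check the `.env` values from step 4 before running the full test suite, a minimal sketch like the one below works. It assumes the `requests` package is installed and uses Ollama's `/api/embeddings` endpoint; the filename and fallback defaults are only illustrative, and the `.env` values must be exported into the environment (or loaded with a tool such as python-dotenv) before running.

```python
# check_ollama_config.py (illustrative name): request one embedding with the .env settings
import os

import requests

# Export the .env values (or load them with python-dotenv) before running this script
base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")

# Ask Ollama for a single embedding to confirm the server is reachable and the model is pulled
resp = requests.post(
    f"{base_url}/api/embeddings",
    json={"model": model, "prompt": "hello from the memory server"},
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(f"{model} returned a {len(embedding)}-dimensional embedding")
```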
🔧 Advanced Configuration
Custom Ollama Host
```bash
# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434

# Different port
OLLAMA_BASE_URL=http://localhost:8080
```
Alternative Embedding Models
```bash
# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm

# Update .env to use a different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
```
Model Performance Comparison
| Model | Size | Quality | Speed | Memory |
|---|---|---|---|---|
| nomic-embed-text | 274MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5GB |
| mxbai-embed-large | 669MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5GB |
| all-minilm | 23MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512MB |
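Note that these models produce embeddings of different dimensionality, so vectors stored under one model are not directly comparable to vectors from another; if you switch models, plan to re-embed existing memories. A small sketch (assuming `requests` and a local Ollama at the default port; the script name is illustrative) reports the dimension of each model you have pulled:

```python
# compare_embedding_models.py (illustrative): print the embedding dimension of each pulled model
import requests

BASE_URL = "http://localhost:11434"
CANDIDATES = ["nomic-embed-text", "mxbai-embed-large", "all-minilm"]

for model in CANDIDATES:
    try:
        resp = requests.post(
            f"{BASE_URL}/api/embeddings",
            json={"model": model, "prompt": "dimension check"},
            timeout=120,
        )
        resp.raise_for_status()
        print(f"{model}: {len(resp.json()['embedding'])} dimensions")
    except requests.RequestException as exc:
        # Model not pulled or server unreachable
        print(f"{model}: skipped ({exc})")
```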
🚀 Running as a Service
Linux (systemd)
Create `/etc/systemd/system/ollama.service`:
```ini
[Unit]
Description=Ollama Server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
```

Then enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
macOS (LaunchAgent)
Create `~/Library/LaunchAgents/com.ollama.server.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

Then load it:

```bash
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
```
🧪 Testing & Verification
Test Ollama Connection
```bash
# Check server status
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```
Test with Memory Server
```bash
# Test Ollama-specific functionality
python test_server.py --ollama-setup

# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py
```
Performance Benchmarks
```bash
# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
  > /dev/null
```
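The single curl above mostly measures a cold request: the first call after startup also includes loading the model into memory. For a more representative number, a sketch like the following (again assuming `requests`, with an illustrative filename) warms the model up once and then averages over several prompts:

```python
# benchmark_embeddings.py (illustrative): average embedding latency after a warm-up call
import os
import time

import requests

base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
prompts = [f"performance test prompt {i}" for i in range(10)]

# The first request loads the model into memory, so issue one warm-up call before timing
requests.post(
    f"{base_url}/api/embeddings",
    json={"model": model, "prompt": "warmup"},
    timeout=300,
).raise_for_status()

start = time.perf_counter()
for prompt in prompts:
    requests.post(
        f"{base_url}/api/embeddings",
        json={"model": model, "prompt": prompt},
        timeout=300,
    ).raise_for_status()
elapsed = time.perf_counter() - start
print(f"{model}: {elapsed / len(prompts) * 1000:.1f} ms per embedding (average of {len(prompts)})")
```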
🔧 Troubleshooting
Common Issues
- "Connection refused"

  ```bash
  # Check if Ollama is running
  ps aux | grep ollama

  # Start it if it isn't
  ollama serve
  ```

- "Model not found"

  ```bash
  # List available models
  ollama list

  # Pull the missing model
  ollama pull nomic-embed-text
  ```

- Slow performance

  ```bash
  # Check system resources
  htop

  # Consider a smaller model
  ollama pull all-minilm
  ```

- Out of memory

  ```bash
  # Use a smaller model
  ollama pull all-minilm

  # Or check existing swap space before adding more
  sudo swapon --show
  ```
Performance Optimization
- Hardware Requirements
  - Minimum: 4GB RAM, 2 CPU cores
  - Recommended: 8GB RAM, 4 CPU cores
  - Storage: 2GB for models
- Model Selection
  - Development: all-minilm (fast, small)
  - Production: nomic-embed-text (balanced)
  - High quality: mxbai-embed-large (slow, accurate)
- Concurrent Requests
  - Ollama handles concurrent requests automatically; no additional configuration is needed (see the sketch below)
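To see this for yourself, the sketch below (plain `requests` plus a thread pool; the script name is illustrative) fires several embedding requests in parallel against a local Ollama and confirms they all come back:

```python
# concurrency_check.py (illustrative): send parallel embedding requests to a local Ollama
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:11434"
MODEL = "nomic-embed-text"

def embed(text: str) -> int:
    """Request one embedding and return its dimensionality."""
    resp = requests.post(
        f"{BASE_URL}/api/embeddings",
        json={"model": MODEL, "prompt": text},
        timeout=300,
    )
    resp.raise_for_status()
    return len(resp.json()["embedding"])

# Four worker threads, eight requests: Ollama queues and serves them without extra setup
with ThreadPoolExecutor(max_workers=4) as pool:
    dims = list(pool.map(embed, [f"concurrent request {i}" for i in range(8)]))

print(f"{len(dims)} embeddings returned, all same dimension: {len(set(dims)) == 1}")
```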
📊 Monitoring
Check Ollama Logs
```bash
# If running as a service
journalctl -u ollama -f

# If running manually:
# logs appear in the terminal where you ran 'ollama serve'
```
Monitor Resource Usage
```bash
# CPU and memory usage
htop

# Disk usage for models
du -sh ~/.ollama/models/
```
API Health Check
```bash
# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"
```
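For a slightly richer check from Python (for example, before starting the memory server), a sketch along these lines verifies both that Ollama is reachable and that the configured embedding model has been pulled. It assumes `requests` and the same environment variables used earlier; the filename is illustrative.

```python
# ollama_health.py (illustrative): check the server and the configured embedding model
import os
import sys

import requests

base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")

try:
    resp = requests.get(f"{base_url}/api/tags", timeout=5)
    resp.raise_for_status()
except requests.RequestException as exc:
    sys.exit(f"❌ Ollama unreachable at {base_url}: {exc}")

# /api/tags lists pulled models; names usually carry a tag suffix such as ':latest'
names = [m["name"] for m in resp.json().get("models", [])]
if any(name.split(":")[0] == model for name in names):
    print(f"✅ Ollama OK, '{model}' is available")
else:
    sys.exit(f"❌ Ollama is up, but '{model}' is not pulled (found: {names})")
```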
🔄 Switching Between Providers
You can easily switch between providers by changing your `.env` file:
```bash
# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Test the switch
python test_server.py --provider-only
```
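If you would rather edit an existing `.env` in place than rebuild it from `.env.example`, a small sketch like the following does the same switch. The script name is illustrative, and the provider value must be one your server configuration accepts (for example `ollama` or `openai`).

```python
# switch_provider.py (illustrative): rewrite EMBEDDING_PROVIDER in an existing .env
# usage: python switch_provider.py ollama
import sys
from pathlib import Path

provider = sys.argv[1] if len(sys.argv) > 1 else "ollama"
env_path = Path(".env")

lines = env_path.read_text().splitlines()
for i, line in enumerate(lines):
    if line.startswith("EMBEDDING_PROVIDER="):
        lines[i] = f"EMBEDDING_PROVIDER={provider}"
        break
else:
    # No existing entry: append one
    lines.append(f"EMBEDDING_PROVIDER={provider}")

env_path.write_text("\n".join(lines) + "\n")
print(f"EMBEDDING_PROVIDER set to {provider}")
```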
🎯 Best Practices
- Always keep Ollama running for consistent performance
- Use systemd/LaunchDaemon for production deployments
- Monitor disk space - models can accumulate over time
- Test after system updates - ensure compatibility
- Backup model configurations - document which models work best
You're now ready to use Ollama with the Ultimate Memory MCP Server! 🎉
Run `python memory_mcp_server.py` to start your self-hosted, privacy-focused memory system.