
Ollama Setup Guide for Ultimate Memory MCP Server

This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.

🦙 Why Ollama?

  • 100% Free - No API costs or usage limits
  • Privacy First - All processing happens locally
  • High Quality - nomic-embed-text produces strong, general-purpose embeddings
  • Self-Contained - No external dependencies once set up

📋 Quick Setup Checklist

1. Install Ollama

# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download

2. Start Ollama Server

ollama serve
# Keep this running in a terminal or run as a service

3. Pull Required Models

# Essential: Embedding model
ollama pull nomic-embed-text

# Optional: Small chat model for summaries
ollama pull llama3.2:1b

# Check installed models
ollama list

4. Configure Memory Server

# In your .env file:
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
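
With these variables in place, the memory server generates embeddings through Ollama's /api/embeddings endpoint. As a rough illustration (not the actual provider code in memory_mcp_server.py), the Python sketch below reads the same settings from the environment and requests a single embedding; it assumes the requests package is installed:

# Minimal sketch of an Ollama embedding call; illustrative only --
# the real provider implementation lives in memory_mcp_server.py.
import os
import requests

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_EMBEDDING_MODEL = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")

def embed(text: str) -> list[float]:
    """Return the embedding vector Ollama produces for `text`."""
    response = requests.post(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        json={"model": OLLAMA_EMBEDDING_MODEL, "prompt": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embedding"]

if __name__ == "__main__":
    vector = embed("Ollama setup test")
    print(f"{OLLAMA_EMBEDDING_MODEL} returned a {len(vector)}-dimensional vector")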

5. Test Setup

python test_server.py --ollama-setup

🔧 Advanced Configuration

Custom Ollama Host

# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434

# Different port
OLLAMA_BASE_URL=http://localhost:8080

Alternative Embedding Models

# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm
# Update .env to use a different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
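
Note that different embedding models produce vectors of different dimensions, so embeddings created with one model are not comparable with another; if you switch models you will likely want to regenerate existing embeddings. As a quick, informal check (assuming the requests package is installed and the models above have already been pulled), the sketch below reports each model's embedding dimension and rough latency:

# Informal comparison of locally pulled embedding models.
# Assumes `ollama serve` is running on the default port.
import time
import requests

BASE_URL = "http://localhost:11434"
MODELS = ["nomic-embed-text", "mxbai-embed-large", "all-minilm"]

for model in MODELS:
    start = time.perf_counter()
    response = requests.post(
        f"{BASE_URL}/api/embeddings",
        json={"model": model, "prompt": "embedding model comparison"},
        timeout=120,
    )
    elapsed = time.perf_counter() - start
    if response.ok:
        dims = len(response.json()["embedding"])
        print(f"{model}: {dims} dimensions in {elapsed:.2f}s")
    else:
        print(f"{model}: request failed ({response.status_code})")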

Model Performance Comparison

Model               Size    Memory   Quality / Speed
nomic-embed-text    274MB   1.5GB    Balanced (recommended default)
mxbai-embed-large   669MB   2.5GB    Highest quality, slowest
all-minilm          23MB    512MB    Fastest and smallest; good for development

🚀 Running as a Service

Linux (systemd)

Create /etc/systemd/system/ollama.service:

[Unit]
Description=Ollama Server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
Then reload systemd and start the service:

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama

macOS (LaunchAgent)

Create ~/Library/LaunchAgents/com.ollama.server.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
Then load the agent:

launchctl load ~/Library/LaunchAgents/com.ollama.server.plist

🧪 Testing & Verification

Test Ollama Connection

# Check server status
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'

Test with Memory Server

# Test Ollama-specific functionality
python test_server.py --ollama-setup

# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py

Performance Benchmarks

# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
  > /dev/null

🔧 Troubleshooting

Common Issues

  1. "Connection refused"

    # Check if Ollama is running
    ps aux | grep ollama
    
    # Start if not running
    ollama serve
    
  2. "Model not found"

    # List available models
    ollama list
    
    # Pull missing model
    ollama pull nomic-embed-text
    
  3. Slow performance

    # Check system resources
    htop
    
    # Consider smaller model
    ollama pull all-minilm
    
  4. Out of memory

    # Use smaller model
    ollama pull all-minilm
    
    # Or check whether swap space is available
    sudo swapon --show
    

Performance Optimization

  1. Hardware Requirements

    • Minimum: 4GB RAM, 2 CPU cores
    • Recommended: 8GB RAM, 4 CPU cores
    • Storage: 2GB for models
  2. Model Selection

    • Development: all-minilm (fast, small)
    • Production: nomic-embed-text (balanced)
    • High Quality: mxbai-embed-large (slow, accurate)
  3. Concurrent Requests

    # Ollama handles concurrent requests automatically
    # No additional configuration needed (see the sketch below)
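
As a quick sanity check of the note above, the sketch below (again assuming the requests package is installed) sends several embedding requests from parallel threads; a single ollama serve instance accepts them all without any client-side configuration:

# Send a handful of embedding requests in parallel threads to confirm
# that concurrent clients work against a single `ollama serve` instance.
from concurrent.futures import ThreadPoolExecutor
import requests

BASE_URL = "http://localhost:11434"

def embed(prompt: str) -> int:
    response = requests.post(
        f"{BASE_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": prompt},
        timeout=60,
    )
    response.raise_for_status()
    return len(response.json()["embedding"])

prompts = [f"concurrent request {i}" for i in range(5)]
with ThreadPoolExecutor(max_workers=5) as pool:
    for prompt, dims in zip(prompts, pool.map(embed, prompts)):
        print(f"{prompt!r} -> {dims}-dimensional embedding")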
    

📊 Monitoring

Check Ollama Logs

# If running as service
journalctl -u ollama -f

# If running manually
# Logs appear in the terminal where you ran 'ollama serve'

Monitor Resource Usage

# CPU and memory usage
htop

# Disk usage for models
du -sh ~/.ollama/models/

API Health Check

# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"

🔄 Switching Between Providers

You can easily switch between providers by changing your .env file:

# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Test the switch
python test_server.py --provider-only

🎯 Best Practices

  1. Always keep Ollama running for consistent performance
  2. Use systemd/LaunchDaemon for production deployments
  3. Monitor disk space - models can accumulate over time
  4. Test after system updates - ensure compatibility
  5. Backup model configurations - document which models work best

You're now ready to use Ollama with the Ultimate Memory MCP Server! 🎉

Run python memory_mcp_server.py to start your self-hosted, privacy-focused memory system.