Ollama Setup Guide for Ultimate Memory MCP Server
This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.
🦙 Why Ollama?
- 100% Free - No API costs or usage limits
- Privacy First - All processing happens locally
- High Quality - nomic-embed-text performs excellently
- Self-Contained - No external dependencies once set up
📋 Quick Setup Checklist
1. Install Ollama
```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download
```
2. Start Ollama Server
```bash
ollama serve
# Keep this running in a terminal, or run it as a service (see below)
```
3. Pull Required Models
```bash
# Essential: Embedding model
ollama pull nomic-embed-text

# Optional: Small chat model for summaries
ollama pull llama3.2:1b

# Check installed models
ollama list
```
4. Configure Memory Server

Set these in your `.env` file (a quick check of these values is sketched just after this checklist):

```bash
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```
5. Test Setup
```bash
python test_server.py --ollama-setup
```
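If you want to sanity-check the `.env` values from step 4 before running the full test suite, a minimal sketch like the one below works. It assumes the `requests` package is installed and uses Ollama's `/api/embeddings` endpoint; the filename and fallback defaults are only illustrative, and the `.env` values must be exported into the environment (or loaded with a tool such as python-dotenv) before running.

```python
# check_ollama_config.py (illustrative name): request one embedding with the .env settings
import os

import requests

# Export the .env values (or load them with python-dotenv) before running this script
base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")

# Ask Ollama for a single embedding to confirm the server is reachable and the model is pulled
resp = requests.post(
    f"{base_url}/api/embeddings",
    json={"model": model, "prompt": "hello from the memory server"},
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(f"{model} returned a {len(embedding)}-dimensional embedding")
```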
🔧 Advanced Configuration
Custom Ollama Host
```bash
# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434

# Different port
OLLAMA_BASE_URL=http://localhost:8080
```
Alternative Embedding Models
```bash
# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm

# Update .env to use a different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
```
Model Performance Comparison
| Model | Size | Quality | Speed | Memory |
|---|---|---|---|---|
| nomic-embed-text | 274MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5GB |
| mxbai-embed-large | 669MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5GB |
| all-minilm | 23MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512MB |
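Note that these models produce embeddings of different dimensionality, so vectors stored under one model are not directly comparable to vectors from another; if you switch models, plan to re-embed existing memories. A small sketch (assuming `requests` and a local Ollama at the default port; the script name is illustrative) reports the dimension of each model you have pulled:

```python
# compare_embedding_models.py (illustrative): print the embedding dimension of each pulled model
import requests

BASE_URL = "http://localhost:11434"
CANDIDATES = ["nomic-embed-text", "mxbai-embed-large", "all-minilm"]

for model in CANDIDATES:
    try:
        resp = requests.post(
            f"{BASE_URL}/api/embeddings",
            json={"model": model, "prompt": "dimension check"},
            timeout=120,
        )
        resp.raise_for_status()
        print(f"{model}: {len(resp.json()['embedding'])} dimensions")
    except requests.RequestException as exc:
        # Model not pulled or server unreachable
        print(f"{model}: skipped ({exc})")
```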
🚀 Running as a Service
Linux (systemd)
Create `/etc/systemd/system/ollama.service`:
```ini
[Unit]
Description=Ollama Server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
```

Then enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
macOS (LaunchAgent)
Create `~/Library/LaunchAgents/com.ollama.server.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

Then load it:

```bash
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
```
🧪 Testing & Verification
Test Ollama Connection
```bash
# Check server status
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```
Test with Memory Server
```bash
# Test Ollama-specific functionality
python test_server.py --ollama-setup

# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py
```
Performance Benchmarks
```bash
# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
  > /dev/null
```
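The single curl above mostly measures a cold request: the first call after startup also includes loading the model into memory. For a more representative number, a sketch like the following (again assuming `requests`, with an illustrative filename) warms the model up once and then averages over several prompts:

```python
# benchmark_embeddings.py (illustrative): average embedding latency after a warm-up call
import os
import time

import requests

base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
prompts = [f"performance test prompt {i}" for i in range(10)]

# The first request loads the model into memory, so issue one warm-up call before timing
requests.post(
    f"{base_url}/api/embeddings",
    json={"model": model, "prompt": "warmup"},
    timeout=300,
).raise_for_status()

start = time.perf_counter()
for prompt in prompts:
    requests.post(
        f"{base_url}/api/embeddings",
        json={"model": model, "prompt": prompt},
        timeout=300,
    ).raise_for_status()
elapsed = time.perf_counter() - start
print(f"{model}: {elapsed / len(prompts) * 1000:.1f} ms per embedding (average of {len(prompts)})")
```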
🔧 Troubleshooting
Common Issues
- "Connection refused"

  ```bash
  # Check if Ollama is running
  ps aux | grep ollama

  # Start it if it isn't
  ollama serve
  ```

- "Model not found"

  ```bash
  # List available models
  ollama list

  # Pull the missing model
  ollama pull nomic-embed-text
  ```

- Slow performance

  ```bash
  # Check system resources
  htop

  # Consider a smaller model
  ollama pull all-minilm
  ```

- Out of memory

  ```bash
  # Use a smaller model
  ollama pull all-minilm

  # Or check existing swap space before adding more
  sudo swapon --show
  ```
Performance Optimization
- Hardware Requirements
  - Minimum: 4GB RAM, 2 CPU cores
  - Recommended: 8GB RAM, 4 CPU cores
  - Storage: 2GB for models
- Model Selection
  - Development: all-minilm (fast, small)
  - Production: nomic-embed-text (balanced)
  - High quality: mxbai-embed-large (slow, accurate)
- Concurrent Requests
  - Ollama handles concurrent requests automatically; no additional configuration is needed (see the sketch below)
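To see this for yourself, the sketch below (plain `requests` plus a thread pool; the script name is illustrative) fires several embedding requests in parallel against a local Ollama and confirms they all come back:

```python
# concurrency_check.py (illustrative): send parallel embedding requests to a local Ollama
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:11434"
MODEL = "nomic-embed-text"

def embed(text: str) -> int:
    """Request one embedding and return its dimensionality."""
    resp = requests.post(
        f"{BASE_URL}/api/embeddings",
        json={"model": MODEL, "prompt": text},
        timeout=300,
    )
    resp.raise_for_status()
    return len(resp.json()["embedding"])

# Four worker threads, eight requests: Ollama queues and serves them without extra setup
with ThreadPoolExecutor(max_workers=4) as pool:
    dims = list(pool.map(embed, [f"concurrent request {i}" for i in range(8)]))

print(f"{len(dims)} embeddings returned, all same dimension: {len(set(dims)) == 1}")
```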
📊 Monitoring
Check Ollama Logs
```bash
# If running as a service
journalctl -u ollama -f

# If running manually:
# logs appear in the terminal where you ran 'ollama serve'
```
Monitor Resource Usage
```bash
# CPU and memory usage
htop

# Disk usage for models
du -sh ~/.ollama/models/
```
API Health Check
```bash
# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"
```
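For a slightly richer check from Python (for example, before starting the memory server), a sketch along these lines verifies both that Ollama is reachable and that the configured embedding model has been pulled. It assumes `requests` and the same environment variables used earlier; the filename is illustrative.

```python
# ollama_health.py (illustrative): check the server and the configured embedding model
import os
import sys

import requests

base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")

try:
    resp = requests.get(f"{base_url}/api/tags", timeout=5)
    resp.raise_for_status()
except requests.RequestException as exc:
    sys.exit(f"❌ Ollama unreachable at {base_url}: {exc}")

# /api/tags lists pulled models; names usually carry a tag suffix such as ':latest'
names = [m["name"] for m in resp.json().get("models", [])]
if any(name.split(":")[0] == model for name in names):
    print(f"✅ Ollama OK, '{model}' is available")
else:
    sys.exit(f"❌ Ollama is up, but '{model}' is not pulled (found: {names})")
```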
🔄 Switching Between Providers
You can easily switch between providers by changing your `.env` file:
```bash
# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Test the switch
python test_server.py --provider-only
```
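If you would rather edit an existing `.env` in place than rebuild it from `.env.example`, a small sketch like the following does the same switch. The script name is illustrative, and the provider value must be one your server configuration accepts (for example `ollama` or `openai`).

```python
# switch_provider.py (illustrative): rewrite EMBEDDING_PROVIDER in an existing .env
# usage: python switch_provider.py ollama
import sys
from pathlib import Path

provider = sys.argv[1] if len(sys.argv) > 1 else "ollama"
env_path = Path(".env")

lines = env_path.read_text().splitlines()
for i, line in enumerate(lines):
    if line.startswith("EMBEDDING_PROVIDER="):
        lines[i] = f"EMBEDDING_PROVIDER={provider}"
        break
else:
    # No existing entry: append one
    lines.append(f"EMBEDDING_PROVIDER={provider}")

env_path.write_text("\n".join(lines) + "\n")
print(f"EMBEDDING_PROVIDER set to {provider}")
```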
🎯 Best Practices
- Always keep Ollama running for consistent performance
- Use systemd/LaunchDaemon for production deployments
- Monitor disk space - models can accumulate over time
- Test after system updates - ensure compatibility
- Backup model configurations - document which models work best
You're now ready to use Ollama with the Ultimate Memory MCP Server! 🎉
Run `python memory_mcp_server.py` to start your self-hosted, privacy-focused memory system.