# Ollama Setup Guide for Ultimate Memory MCP Server
This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.
## 🦙 Why Ollama?
- **100% Free** - No API costs or usage limits
- **Privacy First** - All processing happens locally
- **High Quality** - nomic-embed-text performs excellently
- **Self-Contained** - No external dependencies once set up
## 📋 Quick Setup Checklist
### 1. Install Ollama
```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai/download
```
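To confirm the install before moving on (a quick sanity check; output varies by version):
```bash
# Verify the ollama binary is installed and on your PATH
which ollama
ollama --version
```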
### 2. Start Ollama Server
```bash
ollama serve
# Keep this running in a terminal or run as a service
```
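If you just want Ollama running in the background while you test (before setting up a proper service, covered below), a minimal approach is:
```bash
# Quick-and-dirty background run with logs; prefer a systemd/launchd service for production
nohup ollama serve > ~/ollama.log 2>&1 &
# Confirm it is listening
curl -s http://localhost:11434/api/tags > /dev/null && echo "Ollama is up"
```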
### 3. Pull Required Models
```bash
# Essential: Embedding model
ollama pull nomic-embed-text
# Optional: Small chat model for summaries
ollama pull llama3.2:1b
# Check installed models
ollama list
```
### 4. Configure Memory Server
```bash
# In your .env file:
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```
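To sanity-check that these values line up before starting the server, you can issue the same kind of embedding request the memory server will make, using the settings straight from your `.env` (a sketch; the exact request the server sends may differ):
```bash
# Load the .env values (assumes plain KEY=value lines without spaces) and hit the embeddings endpoint
source .env
curl -s "$OLLAMA_BASE_URL/api/embeddings" \
  -d "{\"model\": \"$OLLAMA_EMBEDDING_MODEL\", \"prompt\": \"hello world\"}" | head -c 200; echo
```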
### 5. Test Setup
```bash
python test_server.py --ollama-setup
```
## 🔧 Advanced Configuration
### Custom Ollama Host
```env
# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434
# Different port
OLLAMA_BASE_URL=http://localhost:8080
```
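When pointing at a remote instance, the remote machine must start Ollama with `OLLAMA_HOST=0.0.0.0` (it binds to localhost by default), and the port must be reachable from the machine running the memory server:
```bash
# From the memory server machine: confirm the remote Ollama API answers
curl -s http://192.168.1.100:11434/api/tags > /dev/null && echo "remote Ollama reachable"
```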
### Alternative Embedding Models
```bash
# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm
```
```env
# Update .env to use different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
```
### Model Performance Comparison
| Model | Size | Quality | Speed | RAM Usage |
|-------|------|---------|-------|-----------|
| nomic-embed-text | 274MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5GB |
| mxbai-embed-large | 669MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5GB |
| all-minilm | 23MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512MB |
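Note that these models produce embeddings of different dimensions (nomic-embed-text, for example, returns 768-dimensional vectors), so switching models generally means re-embedding previously stored memories. You can check the dimension your configured model actually returns (requires `jq`):
```bash
# Print the embedding dimension returned by the model
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "dimension check"}' | jq '.embedding | length'
```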
## 🚀 Running as a Service
### Linux (systemd)
Create `/etc/systemd/system/ollama.service`:
```ini
[Unit]
Description=Ollama Server
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"
[Install]
WantedBy=default.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
### macOS (launchd)
Create `~/Library/LaunchAgents/com.ollama.server.plist` (a per-user launch agent):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.server</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/ollama</string>
<string>serve</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
</dict>
</plist>
```
```bash
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
```
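To confirm the agent loaded and the server is responding:
```bash
# The label should appear in the user's launchd job list
launchctl list | grep com.ollama.server
# And the API should answer
curl -s http://localhost:11434/api/tags > /dev/null && echo "Ollama is up"
```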
## 🧪 Testing & Verification
### Test Ollama Connection
```bash
# Check server status
curl http://localhost:11434/api/tags
# Test embedding generation
curl http://localhost:11434/api/embeddings \
-d '{"model": "nomic-embed-text", "prompt": "test"}'
```
### Test with Memory Server
```bash
# Test Ollama-specific functionality
python test_server.py --ollama-setup
# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py
```
### Performance Benchmarks
```bash
# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
-d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
> /dev/null
```
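A single request can be noisy; for a rough average, loop a handful of requests (a simple sketch, not a rigorous benchmark):
```bash
# Total wall-clock time for 10 embedding requests; divide by 10 for the average
time (
  for i in $(seq 1 10); do
    curl -s http://localhost:11434/api/embeddings \
      -d '{"model": "nomic-embed-text", "prompt": "performance test"}' > /dev/null
  done
)
```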
## 🔧 Troubleshooting
### Common Issues
1. **"Connection refused"**
```bash
# Check if Ollama is running
ps aux | grep ollama
# Start if not running
ollama serve
```
2. **"Model not found"**
```bash
# List available models
ollama list
# Pull missing model
ollama pull nomic-embed-text
```
3. **Slow performance**
```bash
# Check system resources
htop
# Consider smaller model
ollama pull all-minilm
```
4. **Out of memory**
```bash
# Use smaller model
ollama pull all-minilm
# Or increase swap space
sudo swapon --show
```
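The checks above can be rolled into a single preflight pass; a minimal sketch (adjust the model and URL if yours differ):
```bash
#!/usr/bin/env bash
# Quick Ollama preflight: server reachable, model installed, embedding request works
MODEL="${OLLAMA_EMBEDDING_MODEL:-nomic-embed-text}"
BASE="${OLLAMA_BASE_URL:-http://localhost:11434}"

curl -sf "$BASE/api/tags" > /dev/null || { echo "❌ Ollama not reachable at $BASE"; exit 1; }
ollama list | grep -q "$MODEL" || { echo "❌ model missing - run: ollama pull $MODEL"; exit 1; }
curl -sf "$BASE/api/embeddings" -d "{\"model\": \"$MODEL\", \"prompt\": \"ping\"}" \
  | grep -q '"embedding"' || { echo "❌ embedding request failed"; exit 1; }
echo "✅ Ollama looks healthy"
```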
### Performance Optimization
1. **Hardware Requirements**
- **Minimum**: 4GB RAM, 2 CPU cores
- **Recommended**: 8GB RAM, 4 CPU cores
- **Storage**: 2GB for models
2. **Model Selection**
- **Development**: all-minilm (fast, small)
- **Production**: nomic-embed-text (balanced)
- **High Quality**: mxbai-embed-large (slow, accurate)
3. **Concurrent Requests**
- Ollama handles concurrent requests automatically
- No additional configuration needed for typical workloads
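If you later do need to tune concurrency (for example, many parallel memory operations), recent Ollama releases honor environment variables such as `OLLAMA_NUM_PARALLEL` and `OLLAMA_MAX_LOADED_MODELS`; defaults and availability depend on your Ollama version, so treat this as a sketch:
```bash
# Optional tuning - verify these variables against your Ollama version's documentation
export OLLAMA_NUM_PARALLEL=4        # parallel requests served per loaded model
export OLLAMA_MAX_LOADED_MODELS=1   # cap how many models stay resident in RAM
ollama serve
```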
## 📊 Monitoring
### Check Ollama Logs
```bash
# If running as service
journalctl -u ollama -f
# If running manually
# Logs appear in the terminal where you ran 'ollama serve'
```
### Monitor Resource Usage
```bash
# CPU and memory usage
htop
# Disk usage for models
du -sh ~/.ollama/models/
```
### API Health Check
```bash
# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"
```
## 🔄 Switching Between Providers
You can easily switch between providers by changing your `.env` file:
```bash
# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env
# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env
# Test the switch
python test_server.py --provider-only
```
## 🎯 Best Practices
1. **Always keep Ollama running** for consistent performance
2. **Use systemd/LaunchDaemon** for production deployments
3. **Monitor disk space** - models can accumulate over time
4. **Test after system updates** - ensure compatibility
5. **Backup model configurations** - document which models work best
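For the last two points, a small snapshot script makes it easy to see later which models and settings you were running; a minimal sketch (run from the memory server project directory):
```bash
# Record the current model list and embedding settings with a timestamp
BACKUP_DIR=~/ollama-config-backups
mkdir -p "$BACKUP_DIR"
{
  date
  echo "--- installed models ---"
  ollama list
  echo "--- memory server settings ---"
  grep -E '^(EMBEDDING_PROVIDER|OLLAMA_)' .env
} > "$BACKUP_DIR/$(date +%Y%m%d-%H%M%S).txt"
```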
---
**You're now ready to use Ollama with the Ultimate Memory MCP Server!** 🎉
Run `python memory_mcp_server.py` to start your self-hosted, privacy-focused memory system.