# Ollama Setup Guide for Ultimate Memory MCP Server

This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.

## 🦙 Why Ollama?

- **100% Free** - No API costs or usage limits
- **Privacy First** - All processing happens locally
- **High Quality** - nomic-embed-text performs excellently
- **Self-Contained** - No external dependencies once set up

## 📋 Quick Setup Checklist

### 1. Install Ollama
```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download
```

### 2. Start Ollama Server
```bash
ollama serve
# Keep this running in a terminal, or run it as a service (see "Running as a Service" below)
```

### 3. Pull Required Models
```bash
# Essential: Embedding model
ollama pull nomic-embed-text

# Optional: Small chat model for summaries
ollama pull llama3.2:1b

# Check installed models
ollama list
```

### 4. Configure Memory Server
```env
# In your .env file:
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

### 5. Test Setup
```bash
python test_server.py --ollama-setup
```

## 🔧 Advanced Configuration

### Custom Ollama Host
```env
# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434

# Different port
OLLAMA_BASE_URL=http://localhost:8080
```

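A remote Ollama server must be started with `OLLAMA_HOST=0.0.0.0` (as in the systemd unit below) to accept connections from other machines. Before pointing the memory server at a remote host, you can verify it is reachable. A minimal sketch using Python's `requests` library, assuming the example address above:

```python
# Connectivity check for a remote Ollama server.
# Adjust base_url to match your OLLAMA_BASE_URL.
import requests

base_url = "http://192.168.1.100:11434"
resp = requests.get(f"{base_url}/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json()["models"]]
print("Reachable, models available:", models)
```
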
### Alternative Embedding Models
```bash
# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm
```

```env
# Update .env to use a different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
```

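Note that each embedding model produces vectors of a different length, so embeddings from one model are not comparable with those from another; after switching models, existing memories need to be re-embedded. The sketch below, assuming Ollama on its default port with the models already pulled, prints each model's vector dimension:

```python
# Compare embedding dimensions across models via Ollama's embeddings API.
import requests

MODELS = ["nomic-embed-text", "mxbai-embed-large", "all-minilm"]

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": "dimension check"},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"{model}: {len(resp.json()['embedding'])} dimensions")
```
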
### Model Performance Comparison

| Model | Size | Quality | Speed | RAM Usage |
|-------|------|---------|-------|-----------|
| nomic-embed-text | 274 MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5 GB |
| mxbai-embed-large | 669 MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5 GB |
| all-minilm | 23 MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512 MB |

## 🚀 Running as a Service

### Linux (systemd)
Create `/etc/systemd/system/ollama.service`:
```ini
[Unit]
Description=Ollama Server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Binds to all interfaces; omit this line if only local access is needed
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```

### macOS (LaunchAgent)
Create `~/Library/LaunchAgents/com.ollama.server.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

```bash
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
```

## 🧪 Testing & Verification

### Test Ollama Connection
```bash
# Check server status
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```

### Test with Memory Server
```bash
# Test Ollama-specific functionality
python test_server.py --ollama-setup

# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py
```

### Performance Benchmarks
```bash
# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
  > /dev/null
```

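For a steadier number than a single `time curl`, average over several requests. A minimal sketch in Python, assuming the default port and `nomic-embed-text`:

```python
# Average embedding latency over several requests.
import time
import requests

N = 10
start = time.perf_counter()
for i in range(N):
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": f"benchmark {i}"},
        timeout=30,
    )
    resp.raise_for_status()
elapsed = time.perf_counter() - start
print(f"{N} embeddings in {elapsed:.2f}s ({elapsed / N * 1000:.0f} ms each)")
```
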
## 🔧 Troubleshooting

### Common Issues

1. **"Connection refused"**
   ```bash
   # Check if Ollama is running
   ps aux | grep ollama

   # Start it if not running
   ollama serve
   ```

2. **"Model not found"**
   ```bash
   # List available models
   ollama list

   # Pull the missing model
   ollama pull nomic-embed-text
   ```

3. **Slow performance**
   ```bash
   # Check system resources
   htop

   # Consider a smaller model
   ollama pull all-minilm
   ```

4. **Out of memory**
   ```bash
   # Use a smaller model
   ollama pull all-minilm

   # Or add swap space (check existing swap first)
   sudo swapon --show
   ```

### Performance Optimization

1. **Hardware Requirements**
   - **Minimum**: 4GB RAM, 2 CPU cores
   - **Recommended**: 8GB RAM, 4 CPU cores
   - **Storage**: 2GB for models

2. **Model Selection**
   - **Development**: all-minilm (fast, small)
   - **Production**: nomic-embed-text (balanced)
   - **High Quality**: mxbai-embed-large (slower, more accurate)

3. **Concurrent Requests** - Ollama handles concurrent requests automatically; no additional configuration is needed (see the sketch below)

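To observe how your setup behaves under concurrent load, here is a minimal sketch that fires several embedding requests in parallel, assuming the default port and `nomic-embed-text`:

```python
# Fire several embedding requests concurrently and print the vector sizes.
from concurrent.futures import ThreadPoolExecutor
import requests

def embed(text: str) -> int:
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return len(resp.json()["embedding"])

with ThreadPoolExecutor(max_workers=4) as pool:
    dims = list(pool.map(embed, [f"request {i}" for i in range(8)]))
print(dims)
```
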
## 📊 Monitoring

### Check Ollama Logs
```bash
# If running as a service
journalctl -u ollama -f

# If running manually, logs appear in the terminal where you ran 'ollama serve'
```

### Monitor Resource Usage
```bash
# CPU and memory usage
htop

# Disk usage for models
du -sh ~/.ollama/models/
```

### API Health Check
```bash
# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"
```

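If the memory server starts at boot, Ollama may not be ready yet, so a small retry loop avoids spurious failures. A minimal sketch, assuming the default local port:

```python
# Wait for Ollama to become ready, retrying a few times.
import time
import requests

def wait_for_ollama(base_url="http://localhost:11434", retries=10, delay=2.0):
    for _ in range(retries):
        try:
            requests.get(f"{base_url}/api/tags", timeout=3).raise_for_status()
            return True
        except requests.RequestException:
            time.sleep(delay)
    return False

print("✅ Ollama OK" if wait_for_ollama() else "❌ Ollama Error")
```
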
## 🔄 Switching Between Providers

You can easily switch between providers by changing your `.env` file:

```bash
# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Test the switch
python test_server.py --provider-only
```

Note that this rebuilds `.env` from `.env.example`, so reapply any local overrides afterwards.

## 🎯 Best Practices

1. **Always keep Ollama running** for consistent performance
2. **Use systemd/launchd services** for production deployments
3. **Monitor disk space** - models can accumulate over time
4. **Test after system updates** - ensure compatibility
5. **Backup model configurations** - document which models work best

---
**You're now ready to use Ollama with the Ultimate Memory MCP Server!** 🎉
Run `python memory_mcp_server.py` to start your self-hosted, privacy-focused memory system.