# Ollama Setup Guide for Ultimate Memory MCP Server

This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.

## 🦙 Why Ollama?

- **100% Free** - No API costs or usage limits
- **Privacy First** - All processing happens locally
- **High Quality** - nomic-embed-text performs excellently
- **Self-Contained** - No external dependencies once set up

## 📋 Quick Setup Checklist

### 1. Install Ollama

```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download
```

### 2. Start Ollama Server

```bash
ollama serve
# Keep this running in a terminal or run as a service
```

### 3. Pull Required Models

```bash
# Essential: Embedding model
ollama pull nomic-embed-text

# Optional: Small chat model for summaries
ollama pull llama3.2:1b

# Check installed models
ollama list
```

### 4. Configure Memory Server

```bash
# In your .env file:
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

### 5. Test Setup

```bash
python test_server.py --ollama-setup
```

## 🔧 Advanced Configuration

### Custom Ollama Host

```env
# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434

# Different port
OLLAMA_BASE_URL=http://localhost:8080
```

### Alternative Embedding Models

```bash
# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm
```

```env
# Update .env to use a different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
```

### Model Performance Comparison

| Model | Size | Quality | Speed | Memory |
|-------|------|---------|-------|--------|
| nomic-embed-text | 274MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5GB |
| mxbai-embed-large | 669MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5GB |
| all-minilm | 23MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512MB |

## 🚀 Running as a Service

### Linux (systemd)

Create `/etc/systemd/system/ollama.service`:

```ini
[Unit]
Description=Ollama Server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```

### macOS (LaunchAgent)

Create `~/Library/LaunchAgents/com.ollama.server.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

```bash
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
```

## 🧪 Testing & Verification

### Test Ollama Connection

```bash
# Check server status
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```

### Test with Memory Server

```bash
# Test Ollama-specific functionality
python test_server.py --ollama-setup

# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py
```

### Performance Benchmarks

```bash
# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
  > /dev/null
```
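
### Optional: Verify from Python

If you prefer to verify embedding generation from Python rather than curl, the minimal sketch below calls the same `/api/embeddings` endpoint shown above. It assumes the `requests` package is installed and that Ollama is listening on the default `http://localhost:11434`; adjust the URL and model name to match your `.env`.

```python
# Minimal sketch: verify embedding generation from Python (assumes `pip install requests`).
# The endpoint and payload mirror the curl examples above; adjust to match your .env values.
import requests

OLLAMA_BASE_URL = "http://localhost:11434"   # same value as OLLAMA_BASE_URL in .env
MODEL = "nomic-embed-text"                   # same value as OLLAMA_EMBEDDING_MODEL

response = requests.post(
    f"{OLLAMA_BASE_URL}/api/embeddings",
    json={"model": MODEL, "prompt": "test"},
    timeout=30,
)
response.raise_for_status()

# Ollama returns the vector under the "embedding" key for this endpoint.
embedding = response.json().get("embedding", [])
print(f"Received embedding with {len(embedding)} dimensions")
```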
## 🔧 Troubleshooting

### Common Issues

1. **"Connection refused"**
   ```bash
   # Check if Ollama is running
   ps aux | grep ollama

   # Start if not running
   ollama serve
   ```

2. **"Model not found"**
   ```bash
   # List available models
   ollama list

   # Pull missing model
   ollama pull nomic-embed-text
   ```

3. **Slow performance**
   ```bash
   # Check system resources
   htop

   # Consider a smaller model
   ollama pull all-minilm
   ```

4. **Out of memory**
   ```bash
   # Use a smaller model
   ollama pull all-minilm

   # Or check current swap space before increasing it
   sudo swapon --show
   ```

### Performance Optimization

1. **Hardware Requirements**
   - **Minimum**: 4GB RAM, 2 CPU cores
   - **Recommended**: 8GB RAM, 4 CPU cores
   - **Storage**: 2GB for models

2. **Model Selection**
   - **Development**: all-minilm (fast, small)
   - **Production**: nomic-embed-text (balanced)
   - **High Quality**: mxbai-embed-large (slow, accurate)

3. **Concurrent Requests**
   ```env
   # Ollama handles concurrency automatically
   # No additional configuration needed
   ```

## 📊 Monitoring

### Check Ollama Logs

```bash
# If running as a service
journalctl -u ollama -f

# If running manually, logs appear in the terminal where you ran 'ollama serve'
```

### Monitor Resource Usage

```bash
# CPU and memory usage
htop

# Disk usage for models
du -sh ~/.ollama/models/
```

### API Health Check

```bash
# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"
```

## 🔄 Switching Between Providers

You can easily switch between providers by changing your `.env` file:

```bash
# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Test the switch
python test_server.py --provider-only
```

## 🎯 Best Practices

1. **Always keep Ollama running** for consistent performance
2. **Use systemd/LaunchAgent** for production deployments
3. **Monitor disk space** - models can accumulate over time
4. **Test after system updates** - ensure compatibility
5. **Backup model configurations** - document which models work best

---

**You're now ready to use Ollama with the Ultimate Memory MCP Server!** 🎉

Run `python memory_mcp_server.py` to start your self-hosted, privacy-focused memory system.