# Ollama Setup Guide for Ultimate Memory MCP Server

This guide will help you set up Ollama as your embedding provider for completely self-hosted, private memory operations.

## 🦙 Why Ollama?

- **100% Free** - No API costs or usage limits
- **Privacy First** - All processing happens locally
- **High Quality** - nomic-embed-text performs excellently
- **Self-Contained** - No external dependencies once set up

## 📋 Quick Setup Checklist

### 1. Install Ollama
```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download
```

### 2. Start Ollama Server
```bash
ollama serve
# Keep this running in a terminal, or run it as a service (see "Running as a Service" below)
```

### 3. Pull Required Models
```bash
# Essential: Embedding model
ollama pull nomic-embed-text

# Optional: Small chat model for summaries
ollama pull llama3.2:1b

# Check installed models
ollama list
```

### 4. Configure Memory Server
```env
# In your .env file:
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

### 5. Test Setup
```bash
python test_server.py --ollama-setup
```

## 🔧 Advanced Configuration

### Custom Ollama Host
```env
# Remote Ollama server
OLLAMA_BASE_URL=http://192.168.1.100:11434

# Different port
OLLAMA_BASE_URL=http://localhost:8080
```

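A remote Ollama server must be started with `OLLAMA_HOST=0.0.0.0` (as in the systemd unit below) to accept connections from other machines. Before pointing the memory server at a remote host, you can verify it is reachable. A minimal sketch using Python's `requests` library, assuming the example address above:

```python
# Connectivity check for a remote Ollama server.
# Adjust base_url to match your OLLAMA_BASE_URL.
import requests

base_url = "http://192.168.1.100:11434"
resp = requests.get(f"{base_url}/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json()["models"]]
print("Reachable, models available:", models)
```
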
### Alternative Embedding Models
```bash
# Try different embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm
```

```env
# Update .env to use a different model
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
```

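Note that each embedding model produces vectors of a different length, so embeddings from one model are not comparable with those from another; after switching models, existing memories need to be re-embedded. The sketch below, assuming Ollama on its default port with the models already pulled, prints each model's vector dimension:

```python
# Compare embedding dimensions across models via Ollama's embeddings API.
import requests

MODELS = ["nomic-embed-text", "mxbai-embed-large", "all-minilm"]

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": "dimension check"},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"{model}: {len(resp.json()['embedding'])} dimensions")
```
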
### Model Performance Comparison

| Model | Size | Quality | Speed | RAM Usage |
|-------|------|---------|-------|-----------|
| nomic-embed-text | 274 MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5 GB |
| mxbai-embed-large | 669 MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5 GB |
| all-minilm | 23 MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512 MB |

## 🚀 Running as a Service

### Linux (systemd)
Create `/etc/systemd/system/ollama.service`:
```ini
[Unit]
Description=Ollama Server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Binds to all interfaces; omit this line if only local access is needed
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```

### macOS (LaunchAgent)
Create `~/Library/LaunchAgents/com.ollama.server.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

```bash
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
```

## 🧪 Testing & Verification

### Test Ollama Connection
```bash
# Check server status
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```

### Test with Memory Server
```bash
# Test Ollama-specific functionality
python test_server.py --ollama-setup

# Test full memory operations
EMBEDDING_PROVIDER=ollama python test_server.py
```

### Performance Benchmarks
```bash
# Time embedding generation
time curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "performance test"}' \
  > /dev/null
```

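For a steadier number than a single `time curl`, average over several requests. A minimal sketch in Python, assuming the default port and `nomic-embed-text`:

```python
# Average embedding latency over several requests.
import time
import requests

N = 10
start = time.perf_counter()
for i in range(N):
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": f"benchmark {i}"},
        timeout=30,
    )
    resp.raise_for_status()
elapsed = time.perf_counter() - start
print(f"{N} embeddings in {elapsed:.2f}s ({elapsed / N * 1000:.0f} ms each)")
```
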
## 🔧 Troubleshooting

### Common Issues

1. **"Connection refused"**
   ```bash
   # Check if Ollama is running
   ps aux | grep ollama

   # Start it if not running
   ollama serve
   ```

2. **"Model not found"**
   ```bash
   # List available models
   ollama list

   # Pull the missing model
   ollama pull nomic-embed-text
   ```

3. **Slow performance**
   ```bash
   # Check system resources
   htop

   # Consider a smaller model
   ollama pull all-minilm
   ```

4. **Out of memory**
   ```bash
   # Use a smaller model
   ollama pull all-minilm

   # Or add swap space (check existing swap first)
   sudo swapon --show
   ```

### Performance Optimization

1. **Hardware Requirements**
   - **Minimum**: 4GB RAM, 2 CPU cores
   - **Recommended**: 8GB RAM, 4 CPU cores
   - **Storage**: 2GB for models

2. **Model Selection**
   - **Development**: all-minilm (fast, small)
   - **Production**: nomic-embed-text (balanced)
   - **High Quality**: mxbai-embed-large (slower, more accurate)

3. **Concurrent Requests** - Ollama handles concurrent requests automatically; no additional configuration is needed (see the sketch below)

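To observe how your setup behaves under concurrent load, here is a minimal sketch that fires several embedding requests in parallel, assuming the default port and `nomic-embed-text`:

```python
# Fire several embedding requests concurrently and print the vector sizes.
from concurrent.futures import ThreadPoolExecutor
import requests

def embed(text: str) -> int:
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return len(resp.json()["embedding"])

with ThreadPoolExecutor(max_workers=4) as pool:
    dims = list(pool.map(embed, [f"request {i}" for i in range(8)]))
print(dims)
```
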
## 📊 Monitoring

### Check Ollama Logs
```bash
# If running as a service
journalctl -u ollama -f

# If running manually, logs appear in the terminal where you ran 'ollama serve'
```

### Monitor Resource Usage
```bash
# CPU and memory usage
htop

# Disk usage for models
du -sh ~/.ollama/models/
```

### API Health Check
```bash
# Simple health check
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama Error"
```

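If the memory server starts at boot, Ollama may not be ready yet, so a small retry loop avoids spurious failures. A minimal sketch, assuming the default local port:

```python
# Wait for Ollama to become ready, retrying a few times.
import time
import requests

def wait_for_ollama(base_url="http://localhost:11434", retries=10, delay=2.0):
    for _ in range(retries):
        try:
            requests.get(f"{base_url}/api/tags", timeout=3).raise_for_status()
            return True
        except requests.RequestException:
            time.sleep(delay)
    return False

print("✅ Ollama OK" if wait_for_ollama() else "❌ Ollama Error")
```
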
## 🔄 Switching Between Providers

You can easily switch between providers by changing your `.env` file:

```bash
# Switch to Ollama
echo "EMBEDDING_PROVIDER=ollama" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Switch to OpenAI
echo "EMBEDDING_PROVIDER=openai" > .env.provider
cat .env.provider .env.example > .env.tmp && mv .env.tmp .env

# Test the switch
python test_server.py --provider-only
```

Note that this rebuilds `.env` from `.env.example`, so reapply any local overrides afterwards.

## 🎯 Best Practices

1. **Always keep Ollama running** for consistent performance
2. **Use systemd/launchd services** for production deployments
3. **Monitor disk space** - models can accumulate over time
4. **Test after system updates** - ensure compatibility
5. **Backup model configurations** - document which models work best

---
**You're now ready to use Ollama with the Ultimate Memory MCP Server!** 🎉
Run `python memory_mcp_server.py` to start your self-hosted, privacy-focused memory system.