# Ultimate Memory MCP Server - Ollama Edition 🦙
A high-performance, **completely self-hosted** memory system for LLMs powered by **Ollama**. Perfect for privacy-focused AI applications with no external dependencies or costs.
Built with **FastMCP 2.8.1+** and **Kuzu Graph Database** for optimal performance.
## 🚀 Features
- **🧠 Graph-Native Memory**: Stores memories as nodes with rich relationship modeling
- **🔍 Multi-Modal Search**: Semantic similarity + keyword matching + graph traversal
- **🕸️ Intelligent Relationships**: Auto-generates connections based on semantic similarity (see the sketch after this list)
- **🦙 Ollama-Powered**: Self-hosted embeddings with complete privacy
- **📊 Graph Analytics**: Pattern analysis and centrality detection
- **🎯 Memory Types**: Episodic, semantic, and procedural memory classification
- **🔒 Zero External Deps**: No API keys, no cloud services, no data sharing
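How the automatic relationship detection could look under the hood: the snippet below is an illustrative sketch (not the server's actual code) that compares a new memory's embedding against existing ones with cosine similarity and links any pair above a threshold; the function names and the 0.75 cutoff are assumptions.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.75  # hypothetical cutoff for auto-linking


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def detect_relationships(new_embedding, existing_memories):
    """Yield (memory_id, strength) for stored memories similar enough to link.

    `existing_memories` is assumed to be an iterable of (memory_id, embedding).
    """
    for memory_id, embedding in existing_memories:
        strength = cosine_similarity(new_embedding, embedding)
        if strength >= SIMILARITY_THRESHOLD:
            yield memory_id, strength
```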
## 🦙 Why Ollama?
**Perfect for "Sacred Trust" AI systems:**
- **100% Private** - All processing happens on your hardware
- **Zero Costs** - No API fees, no usage limits
- **Always Available** - No network dependencies or outages
- **Predictable** - You control updates and behavior
- **High Quality** - nomic-embed-text rivals commercial solutions
- **Self-Contained** - Complete system in your control
## Quick Start
### 1. Install Ollama
```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai/
```
### 2. Setup Memory Server
```bash
cd /home/rpm/claude/mcp-ultimate-memory
# Automated setup (recommended)
./setup.sh
# Or manual setup:
pip install -r requirements.txt
cp .env.example .env
```
### 3. Start Ollama & Pull Models
```bash
# Start Ollama server (keep running)
ollama serve &
# Pull embedding model
ollama pull nomic-embed-text
# Optional: Pull summary model
ollama pull llama3.2:1b
```
### 4. Test & Run
```bash
# Test everything works
python test_server.py
# Start the memory server
python memory_mcp_server.py
```
## 🛠️ Available MCP Tools
### Core Memory Operations
- **`store_memory`** - Store with automatic relationship detection (see the sketch at the end of this section)
- **`search_memories`** - Semantic + keyword search
- **`get_memory`** - Retrieve by ID with access tracking
- **`find_connected_memories`** - Graph traversal
- **`create_relationship`** - Manual relationship creation
- **`get_conversation_memories`** - Conversation context
- **`delete_memory`** - Memory removal
- **`analyze_memory_patterns`** - Graph analytics
### Ollama Management
- **`check_ollama_status`** - Server status and configuration
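For orientation, tools like `store_memory` are registered with FastMCP's decorator API. The snippet below is a minimal, hypothetical sketch of that pattern; it stubs the storage with an in-memory dict rather than the real Kuzu graph, and is not the 1,010-line server itself.

```python
from uuid import uuid4
from fastmcp import FastMCP

mcp = FastMCP("ultimate-memory")

# In-memory stand-in for the Kuzu graph used by the real server
_memories: dict[str, dict] = {}


@mcp.tool()
async def store_memory(
    content: str,
    memory_type: str = "semantic",
    tags: list[str] | None = None,
) -> str:
    """Store a memory and return its ID (placeholder logic, no embeddings)."""
    memory_id = str(uuid4())
    _memories[memory_id] = {
        "content": content,
        "memory_type": memory_type,
        "tags": tags or [],
    }
    return memory_id


if __name__ == "__main__":
    mcp.run()
```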
## 🧠 Memory Types & Examples
### Episodic Memories
Specific events with temporal context.
```python
await store_memory(
    content="User clicked save button at 2:30 PM during demo",
    memory_type="episodic",
    tags=["user-action", "timing", "demo"]
)
```
### Semantic Memories
General facts and preferences.
```python
await store_memory(
    content="User prefers dark mode for reduced eye strain",
    memory_type="semantic",
    tags=["preference", "ui", "health"]
)
```
### Procedural Memories
Step-by-step instructions.
```python
await store_memory(
    content="To enable dark mode: Settings → Appearance → Dark",
    memory_type="procedural",
    tags=["instructions", "ui"]
)
```
## 🔍 Search Examples
### Semantic Search (Recommended)
```python
# Finds memories by meaning, not just keywords
results = await search_memories(
    query="user interface preferences and accessibility",
    search_type="semantic",
    max_results=10
)
```
### Keyword Search
```python
# Fast exact text matching
results = await search_memories(
    query="dark mode",
    search_type="keyword"
)
```
### Graph Traversal
```python
# Find connected memories through relationships
connections = await find_connected_memories(
    memory_id="preference_memory_id",
    max_depth=3,
    min_strength=0.5
)
```
## 🔧 Configuration
### Environment Variables
```env
# Database location
KUZU_DB_PATH=./memory_graph_db
# Ollama server configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```
### MCP Client Configuration
```json
{
  "mcpServers": {
    "memory": {
      "command": "python",
      "args": ["/path/to/memory_mcp_server.py"],
      "env": {
        "KUZU_DB_PATH": "/path/to/memory_graph_db",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}
```
## 📊 Ollama Model Recommendations
### For Sacred Trust / Production Use
```bash
# Primary embedding model (best balance)
ollama pull nomic-embed-text # 274MB, excellent quality
# Summary model (optional but recommended)
ollama pull llama3.2:1b # 1.3GB, fast summaries
```
### Alternative Models
```bash
# Faster, smaller (if resources are limited)
ollama pull all-minilm # 23MB, decent quality
# Higher quality (if you have resources)
ollama pull mxbai-embed-large # 669MB, best quality
```
### Model Comparison
| Model | Download Size | Quality | Speed | RAM Use |
|-------|------|---------|--------|---------|
| nomic-embed-text | 274MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5GB |
| all-minilm | 23MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512MB |
| mxbai-embed-large | 669MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5GB |
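To see how these trade-offs play out on your own hardware, you can time the embeddings endpoint directly. The sketch below assumes the legacy `/api/embeddings` endpoint (the same one used in the Debug Commands section) and that the listed models are already pulled.

```python
import time

import requests

MODELS = ["nomic-embed-text", "all-minilm", "mxbai-embed-large"]
PROMPT = "User prefers dark mode for reduced eye strain"

for model in MODELS:
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": PROMPT},
        timeout=120,
    )
    elapsed = time.perf_counter() - start
    dims = len(resp.json().get("embedding", []))
    print(f"{model:20s} {elapsed * 1000:7.0f} ms  {dims} dims")
```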
## 🧪 Testing & Verification
### Test Ollama Connection
```bash
python test_server.py --connection-only
```
### Test Full System
```bash
python test_server.py
```
### Check Ollama Status
```bash
# Via test script
python test_server.py --help-setup
# Direct curl
curl http://localhost:11434/api/tags
# List models
ollama list
```
## ⚡ Performance & Resource Usage
### System Requirements
- **Minimum**: 4GB RAM, 2 CPU cores, 2GB storage
- **Recommended**: 8GB RAM, 4 CPU cores, 5GB storage
- **Operating System**: Linux, macOS, Windows
### Performance Characteristics
- **First Request**: ~2-3 seconds (model loading)
- **Subsequent Requests**: ~500-800ms per embedding
- **Memory Usage**: ~1.5GB RAM resident
- **CPU Usage**: ~20% during embedding, ~0% idle
### Optimization Tips
1. **Keep Ollama running** - Avoid model reload overhead
2. **Use SSD storage** - Faster model loading
3. **Batch operations** - Group multiple memories for efficiency (see the sketch after this list)
4. **Monitor resources** - `htop` to check RAM/CPU usage
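For tip 3, one way to batch work is to embed several memories concurrently instead of one request at a time. This is an illustrative sketch using `httpx` and `asyncio` against Ollama's `/api/embeddings` endpoint, not the server's internal batching logic.

```python
import asyncio

import httpx

OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "nomic-embed-text"


async def embed(client: httpx.AsyncClient, text: str) -> list[float]:
    """Request a single embedding from Ollama."""
    resp = await client.post(
        OLLAMA_URL, json={"model": MODEL, "prompt": text}, timeout=120
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


async def embed_batch(texts: list[str]) -> list[list[float]]:
    """Embed several memories concurrently to amortize per-request overhead."""
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(embed(client, t) for t in texts))


if __name__ == "__main__":
    memories = ["dark mode preference", "save button clicked", "settings walkthrough"]
    vectors = asyncio.run(embed_batch(memories))
    print(f"embedded {len(vectors)} memories, {len(vectors[0])} dimensions each")
```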
## 🚨 Troubleshooting
### Common Issues
1. **"Connection refused"**
```bash
# Start Ollama server
ollama serve
# Check if running
ps aux | grep ollama
```
2. **"Model not found"**
```bash
# List available models
ollama list
# Pull required model
ollama pull nomic-embed-text
```
3. **Slow performance**
```bash
# Check system resources
htop
# Try smaller model
ollama pull all-minilm
```
4. **Out of memory**
```bash
# Use minimal model
ollama pull all-minilm
# Check memory usage
free -h
```
### Debug Commands
```bash
# Test Ollama directly
curl http://localhost:11434/api/tags
# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
# Check server logs
journalctl -u ollama -f # if running as service
```
## 🔒 Security & Privacy
### Complete Data Privacy
- **No External Calls** - Everything runs locally
- **No Telemetry** - Ollama doesn't phone home
- **Your Hardware** - You control the infrastructure
- **Audit Trail** - Full visibility into operations
### Recommended Security Practices
1. **Firewall Rules** - Block external access to Ollama port
2. **Regular Updates** - Keep Ollama and models updated
3. **Backup Strategy** - Regular backups of memory_graph_db
4. **Access Control** - Limit who can access the server
## 🚀 Production Deployment
### Running as a Service (Linux)
```bash
# Create systemd service for Ollama
sudo tee /etc/systemd/system/ollama.service << EOF
[Unit]
Description=Ollama Server
After=network.target
[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
Environment=OLLAMA_HOST=0.0.0.0:11434
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable ollama
sudo systemctl start ollama
```
### Memory Server as Service
```bash
# Create service for memory server
sudo tee /etc/systemd/system/memory-server.service << EOF
[Unit]
Description=Memory MCP Server
After=ollama.service
Requires=ollama.service
[Service]
Type=simple
User=memory
WorkingDirectory=/path/to/mcp-ultimate-memory
ExecStart=/usr/bin/python memory_mcp_server.py
Restart=always
Environment=KUZU_DB_PATH=/path/to/memory_graph_db
Environment=OLLAMA_BASE_URL=http://localhost:11434
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable memory-server
sudo systemctl start memory-server
```
## 📊 Monitoring
### Health Checks
```bash
# Quick Ollama health check
curl -sf http://localhost:11434/api/tags > /dev/null && echo "Ollama OK"

# Server health and graph statistics are exposed through the
# check_ollama_status and analyze_memory_patterns MCP tools,
# callable from any connected MCP client (see the sketch below)
```
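To run those tool-based checks from a script instead of an MCP client, something like the following should work. This is a minimal sketch that assumes FastMCP 2.x's `Client` can connect to the server script over stdio; the tool names come from the list above, everything else is illustrative.

```python
import asyncio

from fastmcp import Client


async def health_check() -> None:
    # Spawn the server over stdio and call its health/analytics tools
    async with Client("memory_mcp_server.py") as client:
        status = await client.call_tool("check_ollama_status", {})
        patterns = await client.call_tool("analyze_memory_patterns", {})
        print(status)
        print(patterns)


asyncio.run(health_check())
```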
### Performance Monitoring
```bash
# Resource usage
htop
# Disk usage
du -sh memory_graph_db/
du -sh ~/.ollama/models/
# Network (should be minimal/zero)
netstat -an | grep 11434
```
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Test with Ollama setup
4. Submit a pull request
## 📄 License
MIT License - see LICENSE file for details.
---
**🦙 Self-Hosted Memory for the MCP Ecosystem**
This memory server demonstrates how to build completely self-hosted AI systems with no external dependencies while maintaining high performance and sophisticated memory capabilities. Perfect for privacy-focused applications where data control is paramount.
**Sacred Trust Approved** ✅ - No data leaves your infrastructure, ever.