# Ultimate Memory MCP Server - Ollama Edition πŸ¦™

A high-performance, **completely self-hosted** memory system for LLMs powered by **Ollama**. Perfect for privacy-focused AI applications with no external dependencies or costs.

Built with **FastMCP 2.8.1+** and **Kuzu Graph Database** for optimal performance.

## πŸš€ Features

- **🧠 Graph-Native Memory**: Stores memories as nodes with rich relationship modeling
- **πŸ” Multi-Modal Search**: Semantic similarity + keyword matching + graph traversal
- **πŸ•ΈοΈ Intelligent Relationships**: Auto-generates connections based on semantic similarity
- **πŸ¦™ Ollama-Powered**: Self-hosted embeddings with complete privacy
- **πŸ“Š Graph Analytics**: Pattern analysis and centrality detection
- **🎯 Memory Types**: Episodic, semantic, and procedural memory classification
- **πŸ”’ Zero External Deps**: No API keys, no cloud services, no data sharing

## πŸ¦™ Why Ollama?

**Perfect for "Sacred Trust" AI systems:**

- **100% Private** - All processing happens on your hardware
- **Zero Costs** - No API fees, no usage limits
- **Always Available** - No network dependencies or outages
- **Predictable** - You control updates and behavior
- **High Quality** - nomic-embed-text rivals commercial solutions
- **Self-Contained** - Complete system in your control

## Quick Start

### 1. Install Ollama

```bash
# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/
```

### 2. Setup Memory Server

```bash
cd /home/rpm/claude/mcp-ultimate-memory

# Automated setup (recommended)
./setup.sh

# Or manual setup:
pip install -r requirements.txt
cp .env.example .env
```

### 3. Start Ollama & Pull Models

```bash
# Start Ollama server (keep running)
ollama serve &

# Pull embedding model
ollama pull nomic-embed-text

# Optional: Pull summary model
ollama pull llama3.2:1b
```

### 4. Test & Run

```bash
# Test everything works
python test_server.py

# Start the memory server
python memory_mcp_server.py
```

## πŸ› οΈ Available MCP Tools

### Core Memory Operations

- **`store_memory`** - Store with automatic relationship detection
- **`search_memories`** - Semantic + keyword search
- **`get_memory`** - Retrieve by ID with access tracking
- **`find_connected_memories`** - Graph traversal
- **`create_relationship`** - Manual relationship creation (see the sketch under Memory Types below)
- **`get_conversation_memories`** - Conversation context
- **`delete_memory`** - Memory removal
- **`analyze_memory_patterns`** - Graph analytics

### Ollama Management

- **`check_ollama_status`** - Server status and configuration

## 🧠 Memory Types & Examples

### Episodic Memories

Specific events with temporal context.

```python
await store_memory(
    content="User clicked save button at 2:30 PM during demo",
    memory_type="episodic",
    tags=["user-action", "timing", "demo"]
)
```

### Semantic Memories

General facts and preferences.

```python
await store_memory(
    content="User prefers dark mode for reduced eye strain",
    memory_type="semantic",
    tags=["preference", "ui", "health"]
)
```
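The two memories above can also be linked by hand with `create_relationship` from the tools list. A minimal sketch, assuming `store_memory` returns the new memory's ID and that the tool accepts source/target IDs, a relationship type, and a strength; the parameter names here are illustrative, not the tool's confirmed schema:

```python
# Hypothetical sketch: link the episodic event to the semantic preference
# it revealed. The IDs are assumed to be returned by the store_memory calls
# above; parameter names are assumptions, not the tool's confirmed schema.
await create_relationship(
    source_memory_id=episodic_id,    # "User clicked save button..."
    target_memory_id=semantic_id,    # "User prefers dark mode..."
    relationship_type="supports",    # assumed relationship label
    strength=0.8                     # assumed 0-1 scale, as in min_strength below
)
```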
### Procedural Memories

Step-by-step instructions.

```python
await store_memory(
    content="To enable dark mode: Settings β†’ Appearance β†’ Dark",
    memory_type="procedural",
    tags=["instructions", "ui"]
)
```

## πŸ” Search Examples

### Semantic Search (Recommended)

```python
# Finds memories by meaning, not just keywords
results = await search_memories(
    query="user interface preferences and accessibility",
    search_type="semantic",
    max_results=10
)
```

### Keyword Search

```python
# Fast exact text matching
results = await search_memories(
    query="dark mode",
    search_type="keyword"
)
```

### Graph Traversal

```python
# Find connected memories through relationships
connections = await find_connected_memories(
    memory_id="preference_memory_id",
    max_depth=3,
    min_strength=0.5
)
```

## πŸ”§ Configuration

### Environment Variables

```env
# Database location
KUZU_DB_PATH=./memory_graph_db

# Ollama server configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

### MCP Client Configuration

```json
{
  "mcpServers": {
    "memory": {
      "command": "python",
      "args": ["/path/to/memory_mcp_server.py"],
      "env": {
        "KUZU_DB_PATH": "/path/to/memory_graph_db",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}
```

## πŸ“Š Ollama Model Recommendations

### For Sacred Trust / Production Use

```bash
# Primary embedding model (best balance)
ollama pull nomic-embed-text    # 274MB, excellent quality

# Summary model (optional but recommended)
ollama pull llama3.2:1b         # 1.3GB, fast summaries
```

### Alternative Models

```bash
# Faster, smaller (if resources are limited)
ollama pull all-minilm          # 23MB, decent quality

# Higher quality (if you have resources)
ollama pull mxbai-embed-large   # 669MB, best quality
```

### Model Comparison

| Model | Size | Quality | Speed | RAM Usage |
|-------|------|---------|-------|-----------|
| nomic-embed-text | 274MB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 1.5GB |
| all-minilm | 23MB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 512MB |
| mxbai-embed-large | 669MB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 2.5GB |

## πŸ§ͺ Testing & Verification

### Test Ollama Connection

```bash
python test_server.py --connection-only
```

### Test Full System

```bash
python test_server.py
```

### Check Ollama Status

```bash
# Via test script
python test_server.py --help-setup

# Direct curl
curl http://localhost:11434/api/tags

# List models
ollama list
```

## ⚑ Performance & Resource Usage

### System Requirements

- **Minimum**: 4GB RAM, 2 CPU cores, 2GB storage
- **Recommended**: 8GB RAM, 4 CPU cores, 5GB storage
- **Operating System**: Linux, macOS, Windows

### Performance Characteristics

- **First Request**: ~2-3 seconds (model loading)
- **Subsequent Requests**: ~500-800ms per embedding
- **Memory Usage**: ~1.5GB RAM resident
- **CPU Usage**: ~20% during embedding, ~0% idle

### Optimization Tips

1. **Keep Ollama running** - Avoid model reload overhead
2. **Use SSD storage** - Faster model loading
3. **Batch operations** - Group multiple memories for efficiency
4. **Monitor resources** - Use `htop` to check RAM/CPU usage

## 🚨 Troubleshooting

### Common Issues

1. **"Connection refused"**

   ```bash
   # Start Ollama server
   ollama serve

   # Check if running
   ps aux | grep ollama
   ```

2. **"Model not found"**

   ```bash
   # List available models
   ollama list

   # Pull required model
   ollama pull nomic-embed-text
   ```

3. **Slow performance**

   ```bash
   # Check system resources
   htop

   # Try smaller model
   ollama pull all-minilm
   ```
4. **Out of memory**

   ```bash
   # Use minimal model
   ollama pull all-minilm

   # Check memory usage
   free -h
   ```

### Debug Commands

```bash
# Test Ollama directly
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'

# Check server logs
journalctl -u ollama -f    # if running as service
```

## πŸ”’ Security & Privacy

### Complete Data Privacy

- **No External Calls** - Everything runs locally
- **No Telemetry** - Ollama doesn't phone home
- **Your Hardware** - You control the infrastructure
- **Audit Trail** - Full visibility into operations

### Recommended Security Practices

1. **Firewall Rules** - Block external access to the Ollama port
2. **Regular Updates** - Keep Ollama and models updated
3. **Backup Strategy** - Regular backups of memory_graph_db
4. **Access Control** - Limit who can access the server

## πŸš€ Production Deployment

### Running as a Service (Linux)

```bash
# Create systemd service for Ollama
sudo tee /etc/systemd/system/ollama.service << EOF
[Unit]
Description=Ollama Server
After=network.target

[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
Environment=OLLAMA_HOST=0.0.0.0:11434

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable ollama
sudo systemctl start ollama
```

### Memory Server as Service

```bash
# Create service for memory server
sudo tee /etc/systemd/system/memory-server.service << EOF
[Unit]
Description=Memory MCP Server
After=ollama.service
Requires=ollama.service

[Service]
Type=simple
User=memory
WorkingDirectory=/path/to/mcp-ultimate-memory
ExecStart=/usr/bin/python memory_mcp_server.py
Restart=always
Environment=KUZU_DB_PATH=/path/to/memory_graph_db
Environment=OLLAMA_BASE_URL=http://localhost:11434

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable memory-server
sudo systemctl start memory-server
```

## πŸ“Š Monitoring

### Health Checks

```bash
# Check Ollama status via MCP tool
echo '{"tool": "check_ollama_status"}' | python -c "
import json, asyncio
from memory_mcp_server import *
# ... health check code
"

# Check memory graph statistics
echo '{"tool": "analyze_memory_patterns"}' | # similar pattern
```

### Performance Monitoring

```bash
# Resource usage
htop

# Disk usage
du -sh memory_graph_db/
du -sh ~/.ollama/models/

# Network (should be minimal/zero)
netstat -an | grep 11434
```

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Test with Ollama setup
4. Submit a pull request

## πŸ“„ License

MIT License - see LICENSE file for details.

---

**πŸ¦™ Self-Hosted Memory for the MCP Ecosystem**

This memory server demonstrates how to build completely self-hosted AI systems with no external dependencies while maintaining high performance and sophisticated memory capabilities. Perfect for privacy-focused applications where data control is paramount.

**Sacred Trust Approved** βœ… - No data leaves your infrastructure, ever.