
Ultimate Memory MCP Server - Ollama Edition 🦙

A high-performance, completely self-hosted memory system for LLMs, powered by Ollama. Ideal for privacy-focused AI applications, with no external dependencies and no API costs.

Built with FastMCP 2.8.1+ and Kuzu Graph Database for optimal performance.
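
A minimal sketch of how those two pieces fit together (illustrative only; the Memory node table and the example tool are assumptions, not the server's actual schema):

from fastmcp import FastMCP
import kuzu

# Open the embedded graph database and register one example tool.
db = kuzu.Database("./memory_graph_db")
conn = kuzu.Connection(db)
mcp = FastMCP("ultimate-memory")

@mcp.tool()
def count_memories() -> int:
    """Count memory nodes in the graph (hypothetical 'Memory' node table)."""
    result = conn.execute("MATCH (m:Memory) RETURN count(m)")
    return result.get_next()[0]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to an MCP client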

🚀 Features

  • 🧠 Graph-Native Memory: Stores memories as nodes with rich relationship modeling
  • 🔍 Multi-Modal Search: Semantic similarity + keyword matching + graph traversal
  • 🕸️ Intelligent Relationships: Auto-generates connections based on semantic similarity
  • 🦙 Ollama-Powered: Self-hosted embeddings with complete privacy
  • 📊 Graph Analytics: Pattern analysis and centrality detection
  • 🎯 Memory Types: Episodic, semantic, and procedural memory classification
  • 🔒 Zero External Deps: No API keys, no cloud services, no data sharing

🦙 Why Ollama?

Perfect for "Sacred Trust" AI systems:

  • 100% Private - All processing happens on your hardware
  • Zero Costs - No API fees, no usage limits
  • Always Available - No network dependencies or outages
  • Predictable - You control updates and behavior
  • High Quality - nomic-embed-text rivals commercial solutions
  • Self-Contained - Complete system in your control

Quick Start

1. Install Ollama

# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/
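
To confirm the CLI is on your PATH before continuing:

# Should print the installed version
ollama --version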

2. Setup Memory Server

cd /path/to/mcp-ultimate-memory

# Automated setup (recommended)
./setup.sh

# Or manual setup:
pip install -r requirements.txt
cp .env.example .env

3. Start Ollama & Pull Models

# Start Ollama server (keep running)
ollama serve &

# Pull embedding model
ollama pull nomic-embed-text

# Optional: Pull summary model
ollama pull llama3.2:1b

4. Test & Run

# Test everything works
python test_server.py

# Start the memory server
python memory_mcp_server.py

🛠️ Available MCP Tools

Core Memory Operations

  • store_memory - Store with automatic relationship detection
  • search_memories - Semantic + keyword search
  • get_memory - Retrieve by ID with access tracking
  • find_connected_memories - Graph traversal
  • create_relationship - Manual relationship creation
  • get_conversation_memories - Conversation context
  • delete_memory - Memory removal
  • analyze_memory_patterns - Graph analytics

Ollama Management

  • check_ollama_status - Server status and configuration
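
Most tools follow the call shapes shown in the examples below. As one sketch, create_relationship can link two memories by hand when automatic detection misses a connection (the parameter names here are assumptions, not the tool's documented signature):

# Hypothetical parameter names, shown for illustration only
await create_relationship(
    source_memory_id="episodic_memory_id",
    target_memory_id="procedural_memory_id",
    relationship_type="causes",
    strength=0.8
)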

🧠 Memory Types & Examples

Episodic Memories

Specific events with temporal context.

await store_memory(
    content="User clicked save button at 2:30 PM during demo",
    memory_type="episodic",
    tags=["user-action", "timing", "demo"]
)

Semantic Memories

General facts and preferences.

await store_memory(
    content="User prefers dark mode for reduced eye strain",
    memory_type="semantic",
    tags=["preference", "ui", "health"]
)

Procedural Memories

Step-by-step instructions.

await store_memory(
    content="To enable dark mode: Settings → Appearance → Dark",
    memory_type="procedural", 
    tags=["instructions", "ui"]
)

🔍 Search Examples

Semantic Search

# Finds memories by meaning, not just keywords
results = await search_memories(
    query="user interface preferences and accessibility",
    search_type="semantic",
    max_results=10
)

Keyword Search

# Fast exact text matching
results = await search_memories(
    query="dark mode",
    search_type="keyword"
)

Graph Traversal

# Find connected memories through relationships
connections = await find_connected_memories(
    memory_id="preference_memory_id",
    max_depth=3,
    min_strength=0.5
)

🔧 Configuration

Environment Variables

# Database location
KUZU_DB_PATH=./memory_graph_db

# Ollama server configuration  
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
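
All three variables are optional if the defaults suit you. A sketch of how they might be read (standard os.getenv with the defaults above; not the server's actual code):

import os

KUZU_DB_PATH = os.getenv("KUZU_DB_PATH", "./memory_graph_db")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_EMBEDDING_MODEL = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")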

MCP Client Configuration

{
  "mcpServers": {
    "memory": {
      "command": "python",
      "args": ["/path/to/memory_mcp_server.py"],
      "env": {
        "KUZU_DB_PATH": "/path/to/memory_graph_db",
        "OLLAMA_BASE_URL": "http://localhost:11434", 
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}

📊 Ollama Model Recommendations

For Sacred Trust / Production Use

# Primary embedding model (best balance)
ollama pull nomic-embed-text       # 274MB, excellent quality

# Summary model (optional but recommended)  
ollama pull llama3.2:1b           # 1.3GB, fast summaries

Alternative Models

# Faster, smaller (if resources are limited)
ollama pull all-minilm            # 23MB, decent quality

# Higher quality (if you have resources)
ollama pull mxbai-embed-large     # 669MB, best quality

Model Comparison

Model               Size    Quality     Speed     Memory
nomic-embed-text    274MB   excellent   moderate  1.5GB
all-minilm          23MB    decent      fastest   512MB
mxbai-embed-large   669MB   best        slowest   2.5GB

(Quality and speed labels summarize the notes above; actual throughput depends on your hardware.)

🧪 Testing & Verification

Test Ollama Connection

python test_server.py --connection-only

Test Full System

python test_server.py

Check Ollama Status

# Via test script
python test_server.py --help-setup

# Direct curl
curl http://localhost:11434/api/tags

# List models
ollama list

Performance & Resource Usage

System Requirements

  • Minimum: 4GB RAM, 2 CPU cores, 2GB storage
  • Recommended: 8GB RAM, 4 CPU cores, 5GB storage
  • Operating System: Linux, macOS, Windows

Performance Characteristics

  • First Request: ~2-3 seconds (model loading)
  • Subsequent Requests: ~500-800ms per embedding
  • Memory Usage: ~1.5GB RAM resident
  • CPU Usage: ~20% during embedding, ~0% idle
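
These numbers can be verified on your own hardware with a short timing script against the same embeddings endpoint used elsewhere in this README (stdlib only; model and URL match the defaults above):

import json, time, urllib.request

def embed(prompt):
    # POST to Ollama's embeddings endpoint and return the vector
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": "nomic-embed-text", "prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

t0 = time.perf_counter()
embed("warm-up request")        # first call loads the model (seconds)
t1 = time.perf_counter()
embed("steady-state request")   # later calls should be well under a second
t2 = time.perf_counter()
print(f"cold: {t1 - t0:.2f}s  warm: {t2 - t1:.2f}s")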

Optimization Tips

  1. Keep Ollama running - Avoid model reload overhead
  2. Use SSD storage - Faster model loading
  3. Batch operations - Group multiple memories for efficiency (sketch below)
  4. Monitor resources - htop to check RAM/CPU usage
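
A sketch of tip 3, assuming store_memory is the awaitable tool from the examples above: fire the stores concurrently rather than one at a time.

import asyncio

async def store_batch(items):
    # items: iterable of (content, memory_type, tags) tuples
    await asyncio.gather(*(
        store_memory(content=text, memory_type=mtype, tags=tags)
        for text, mtype, tags in items
    ))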

🚨 Troubleshooting

Common Issues

  1. "Connection refused"

    # Start Ollama server
    ollama serve
    
    # Check if running  
    ps aux | grep ollama
    
  2. "Model not found"

    # List available models
    ollama list
    
    # Pull required model
    ollama pull nomic-embed-text
    
  3. Slow performance

    # Check system resources
    htop
    
    # Try smaller model
    ollama pull all-minilm
    
  4. Out of memory

    # Use minimal model
    ollama pull all-minilm
    
    # Check memory usage
    free -h
    

Debug Commands

# Test Ollama directly
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'

# Check server logs
journalctl -u ollama -f  # if running as service

🔒 Security & Privacy

Complete Data Privacy

  • No External Calls - Everything runs locally
  • No Telemetry - Ollama doesn't phone home
  • Your Hardware - You control the infrastructure
  • Audit Trail - Full visibility into operations

Security Best Practices

  1. Firewall Rules - Block external access to the Ollama port (see below)
  2. Regular Updates - Keep Ollama and models updated
  3. Backup Strategy - Take regular backups of memory_graph_db (see below)
  4. Access Control - Limit who can access the server
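
Two quick sketches for items 1 and 3 (ufw and tar shown as examples; adapt to your distro and backup tooling):

# Keep Ollama loopback-only instead of firewalling, if remote access isn't needed
OLLAMA_HOST=127.0.0.1:11434 ollama serve

# Or block the port at the firewall
sudo ufw deny 11434/tcp

# Simple dated backup of the graph database (stop the memory server first)
tar czf memory_graph_db-$(date +%F).tar.gz memory_graph_db/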

🚀 Production Deployment

Running as a Service (Linux)

# Create systemd service for Ollama
sudo tee /etc/systemd/system/ollama.service << EOF
[Unit]
Description=Ollama Server
After=network.target

[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
# 0.0.0.0 listens on all interfaces; use 127.0.0.1:11434 to keep Ollama local-only
Environment=OLLAMA_HOST=0.0.0.0:11434

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable ollama
sudo systemctl start ollama

Memory Server as Service

# Create service for memory server
sudo tee /etc/systemd/system/memory-server.service << EOF
[Unit]
Description=Memory MCP Server
After=ollama.service
Requires=ollama.service

[Service]
Type=simple
User=memory
WorkingDirectory=/path/to/mcp-ultimate-memory
ExecStart=/usr/bin/python memory_mcp_server.py
Restart=always
Environment=KUZU_DB_PATH=/path/to/memory_graph_db
Environment=OLLAMA_BASE_URL=http://localhost:11434

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable memory-server
sudo systemctl start memory-server

📊 Monitoring

Health Checks

# Call the health-check tools directly (a sketch; assumes the tool
# coroutines are importable from memory_mcp_server)
python -c "
import asyncio
from memory_mcp_server import check_ollama_status, analyze_memory_patterns
print(asyncio.run(check_ollama_status()))
print(asyncio.run(analyze_memory_patterns()))
"

# Or probe Ollama itself
curl -sf http://localhost:11434/api/tags > /dev/null && echo 'ollama: up'

Performance Monitoring

# Resource usage
htop

# Disk usage
du -sh memory_graph_db/
du -sh ~/.ollama/models/

# Network (should be minimal/zero)
netstat -an | grep 11434

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Test with Ollama setup
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details.


🦙 Self-Hosted Memory for the MCP Ecosystem

This memory server demonstrates how to build completely self-hosted AI systems with no external dependencies while maintaining high performance and sophisticated memory capabilities. Perfect for privacy-focused applications where data control is paramount.

Sacred Trust Approved - No data leaves your infrastructure, ever.
