
Ultimate Memory MCP Server - Ollama Edition 🦙

A high-performance, completely self-hosted memory system for LLMs, powered by Ollama. Ideal for privacy-focused AI applications, with no external dependencies and no API costs.

Built with FastMCP 2.8.1+ and Kuzu Graph Database for optimal performance.
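
A minimal sketch of how those two pieces fit together (illustrative only; the Memory node table and the example tool are assumptions, not the server's actual schema):

from fastmcp import FastMCP
import kuzu

# Open the embedded graph database and register one example tool.
db = kuzu.Database("./memory_graph_db")
conn = kuzu.Connection(db)
mcp = FastMCP("ultimate-memory")

@mcp.tool()
def count_memories() -> int:
    """Count memory nodes in the graph (hypothetical 'Memory' node table)."""
    result = conn.execute("MATCH (m:Memory) RETURN count(m)")
    return result.get_next()[0]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to an MCP client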

🚀 Features

  • 🧠 Graph-Native Memory: Stores memories as nodes with rich relationship modeling
  • 🔍 Multi-Modal Search: Semantic similarity + keyword matching + graph traversal
  • 🕸️ Intelligent Relationships: Auto-generates connections based on semantic similarity
  • 🦙 Ollama-Powered: Self-hosted embeddings with complete privacy
  • 📊 Graph Analytics: Pattern analysis and centrality detection
  • 🎯 Memory Types: Episodic, semantic, and procedural memory classification
  • 🔒 Zero External Deps: No API keys, no cloud services, no data sharing

🦙 Why Ollama?

Perfect for "Sacred Trust" AI systems:

  • 100% Private - All processing happens on your hardware
  • Zero Costs - No API fees, no usage limits
  • Always Available - No network dependencies or outages
  • Predictable - You control updates and behavior
  • High Quality - nomic-embed-text rivals commercial solutions
  • Self-Contained - Complete system in your control

Quick Start

1. Install Ollama

# Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/
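
To confirm the CLI is on your PATH before continuing:

# Should print the installed version
ollama --version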

2. Setup Memory Server

cd /path/to/mcp-ultimate-memory

# Automated setup (recommended)
./setup.sh

# Or manual setup:
pip install -r requirements.txt
cp .env.example .env

3. Start Ollama & Pull Models

# Start Ollama server (keep running)
ollama serve &

# Pull embedding model
ollama pull nomic-embed-text

# Optional: Pull summary model
ollama pull llama3.2:1b

4. Test & Run

# Test everything works
python test_server.py

# Start the memory server
python memory_mcp_server.py

🛠️ Available MCP Tools

Core Memory Operations

  • store_memory - Store with automatic relationship detection
  • search_memories - Semantic + keyword search
  • get_memory - Retrieve by ID with access tracking
  • find_connected_memories - Graph traversal
  • create_relationship - Manual relationship creation
  • get_conversation_memories - Conversation context
  • delete_memory - Memory removal
  • analyze_memory_patterns - Graph analytics

Ollama Management

  • check_ollama_status - Server status and configuration
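
Most tools follow the call shapes shown in the examples below. As one sketch, create_relationship can link two memories by hand when automatic detection misses a connection (the parameter names here are assumptions, not the tool's documented signature):

# Hypothetical parameter names, shown for illustration only
await create_relationship(
    source_memory_id="episodic_memory_id",
    target_memory_id="procedural_memory_id",
    relationship_type="causes",
    strength=0.8
)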

🧠 Memory Types & Examples

Episodic Memories

Specific events with temporal context.

await store_memory(
    content="User clicked save button at 2:30 PM during demo",
    memory_type="episodic",
    tags=["user-action", "timing", "demo"]
)

Semantic Memories

General facts and preferences.

await store_memory(
    content="User prefers dark mode for reduced eye strain",
    memory_type="semantic",
    tags=["preference", "ui", "health"]
)

Procedural Memories

Step-by-step instructions.

await store_memory(
    content="To enable dark mode: Settings → Appearance → Dark",
    memory_type="procedural", 
    tags=["instructions", "ui"]
)

🔍 Search Examples

Semantic Search

# Finds memories by meaning, not just keywords
results = await search_memories(
    query="user interface preferences and accessibility",
    search_type="semantic",
    max_results=10
)

Keyword Search

# Fast exact text matching
results = await search_memories(
    query="dark mode",
    search_type="keyword"
)

Graph Traversal

# Find connected memories through relationships
connections = await find_connected_memories(
    memory_id="preference_memory_id",
    max_depth=3,
    min_strength=0.5
)

🔧 Configuration

Environment Variables

# Database location
KUZU_DB_PATH=./memory_graph_db

# Ollama server configuration  
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
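
All three variables are optional if the defaults suit you. A sketch of how they might be read (standard os.getenv with the defaults above; not the server's actual code):

import os

KUZU_DB_PATH = os.getenv("KUZU_DB_PATH", "./memory_graph_db")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_EMBEDDING_MODEL = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")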

MCP Client Configuration

{
  "mcpServers": {
    "memory": {
      "command": "python",
      "args": ["/path/to/memory_mcp_server.py"],
      "env": {
        "KUZU_DB_PATH": "/path/to/memory_graph_db",
        "OLLAMA_BASE_URL": "http://localhost:11434", 
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}

📊 Ollama Model Recommendations

For Sacred Trust / Production Use

# Primary embedding model (best balance)
ollama pull nomic-embed-text       # 274MB, excellent quality

# Summary model (optional but recommended)  
ollama pull llama3.2:1b           # 1.3GB, fast summaries

Alternative Models

# Faster, smaller (if resources are limited)
ollama pull all-minilm            # 23MB, decent quality

# Higher quality (if you have resources)
ollama pull mxbai-embed-large     # 669MB, best quality

Model Comparison

Model               Size    Quality     Speed     Memory
nomic-embed-text    274MB   excellent   moderate  1.5GB
all-minilm          23MB    decent      fastest   512MB
mxbai-embed-large   669MB   best        slowest   2.5GB

(Quality and speed labels summarize the notes above; actual throughput depends on your hardware.)

🧪 Testing & Verification

Test Ollama Connection

python test_server.py --connection-only

Test Full System

python test_server.py

Check Ollama Status

# Via test script
python test_server.py --help-setup

# Direct curl
curl http://localhost:11434/api/tags

# List models
ollama list

Performance & Resource Usage

System Requirements

  • Minimum: 4GB RAM, 2 CPU cores, 2GB storage
  • Recommended: 8GB RAM, 4 CPU cores, 5GB storage
  • Operating System: Linux, macOS, Windows

Performance Characteristics

  • First Request: ~2-3 seconds (model loading)
  • Subsequent Requests: ~500-800ms per embedding
  • Memory Usage: ~1.5GB RAM resident
  • CPU Usage: ~20% during embedding, ~0% idle
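
These numbers can be verified on your own hardware with a short timing script against the same embeddings endpoint used elsewhere in this README (stdlib only; model and URL match the defaults above):

import json, time, urllib.request

def embed(prompt):
    # POST to Ollama's embeddings endpoint and return the vector
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": "nomic-embed-text", "prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

t0 = time.perf_counter()
embed("warm-up request")        # first call loads the model (seconds)
t1 = time.perf_counter()
embed("steady-state request")   # later calls should be well under a second
t2 = time.perf_counter()
print(f"cold: {t1 - t0:.2f}s  warm: {t2 - t1:.2f}s")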

Optimization Tips

  1. Keep Ollama running - Avoid model reload overhead
  2. Use SSD storage - Faster model loading
  3. Batch operations - Group multiple memories for efficiency (sketch below)
  4. Monitor resources - htop to check RAM/CPU usage
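
A sketch of tip 3, assuming store_memory is the awaitable tool from the examples above: fire the stores concurrently rather than one at a time.

import asyncio

async def store_batch(items):
    # items: iterable of (content, memory_type, tags) tuples
    await asyncio.gather(*(
        store_memory(content=text, memory_type=mtype, tags=tags)
        for text, mtype, tags in items
    ))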

🚨 Troubleshooting

Common Issues

  1. "Connection refused"

    # Start Ollama server
    ollama serve
    
    # Check if running  
    ps aux | grep ollama
    
  2. "Model not found"

    # List available models
    ollama list
    
    # Pull required model
    ollama pull nomic-embed-text
    
  3. Slow performance

    # Check system resources
    htop
    
    # Try smaller model
    ollama pull all-minilm
    
  4. Out of memory

    # Use minimal model
    ollama pull all-minilm
    
    # Check memory usage
    free -h
    

Debug Commands

# Test Ollama directly
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'

# Check server logs
journalctl -u ollama -f  # if running as service

🔒 Security & Privacy

Complete Data Privacy

  • No External Calls - Everything runs locally
  • No Telemetry - Ollama doesn't phone home
  • Your Hardware - You control the infrastructure
  • Audit Trail - Full visibility into operations

Security Best Practices

  1. Firewall Rules - Block external access to the Ollama port (see below)
  2. Regular Updates - Keep Ollama and models updated
  3. Backup Strategy - Take regular backups of memory_graph_db (see below)
  4. Access Control - Limit who can access the server
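
Two quick sketches for items 1 and 3 (ufw and tar shown as examples; adapt to your distro and backup tooling):

# Keep Ollama loopback-only instead of firewalling, if remote access isn't needed
OLLAMA_HOST=127.0.0.1:11434 ollama serve

# Or block the port at the firewall
sudo ufw deny 11434/tcp

# Simple dated backup of the graph database (stop the memory server first)
tar czf memory_graph_db-$(date +%F).tar.gz memory_graph_db/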

🚀 Production Deployment

Running as a Service (Linux)

# Create systemd service for Ollama
sudo tee /etc/systemd/system/ollama.service << EOF
[Unit]
Description=Ollama Server
After=network.target

[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
# 0.0.0.0 listens on all interfaces; use 127.0.0.1:11434 to keep Ollama local-only
Environment=OLLAMA_HOST=0.0.0.0:11434

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable ollama
sudo systemctl start ollama

Memory Server as Service

# Create service for memory server
sudo tee /etc/systemd/system/memory-server.service << EOF
[Unit]
Description=Memory MCP Server
After=ollama.service
Requires=ollama.service

[Service]
Type=simple
User=memory
WorkingDirectory=/path/to/mcp-ultimate-memory
ExecStart=/usr/bin/python memory_mcp_server.py
Restart=always
Environment=KUZU_DB_PATH=/path/to/memory_graph_db
Environment=OLLAMA_BASE_URL=http://localhost:11434

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable memory-server
sudo systemctl start memory-server

📊 Monitoring

Health Checks

# Call the health-check tools directly (a sketch; assumes the tool
# coroutines are importable from memory_mcp_server)
python -c "
import asyncio
from memory_mcp_server import check_ollama_status, analyze_memory_patterns
print(asyncio.run(check_ollama_status()))
print(asyncio.run(analyze_memory_patterns()))
"

# Or probe Ollama itself
curl -sf http://localhost:11434/api/tags > /dev/null && echo 'ollama: up'

Performance Monitoring

# Resource usage
htop

# Disk usage
du -sh memory_graph_db/
du -sh ~/.ollama/models/

# Network (should be minimal/zero)
netstat -an | grep 11434

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Test with Ollama setup
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details.


🦙 Self-Hosted Memory for the MCP Ecosystem

This memory server demonstrates how to build completely self-hosted AI systems with no external dependencies while maintaining high performance and sophisticated memory capabilities. Perfect for privacy-focused applications where data control is paramount.

Sacred Trust Approved - No data leaves your infrastructure, ever.
