- Comprehensive Microsoft Office document processing server
- Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV
- 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats
- Multi-library fallback system for robust processing
- URL support with intelligent caching
- Legacy Office format support (97-2003)
- FastMCP integration with async architecture
- Production ready with comprehensive documentation
🤖 Generated with Claude Code (claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
MCP Office Tools
Comprehensive Microsoft Office document processing server for the MCP (Model Context Protocol) ecosystem.
MCP Office Tools provides 30+ tools for processing Microsoft Office documents, including Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), and CSV files. Built as a companion to MCP PDF Tools, it aims for the same quality and robustness in Office document processing.
🌟 Key Features
Universal Format Support
- Word Documents: .docx, .doc, .docm, .dotx, .dot
- Excel Spreadsheets: .xlsx, .xls, .xlsm, .xltx, .xlt, .csv
- PowerPoint Presentations: .pptx, .ppt, .pptm, .potx, .pot
- Legacy Compatibility: Support for Office 97-2003 formats (per-feature coverage is listed in the Format Support Matrix)
Intelligent Processing
- Multi-library fallback system for robust document processing
- Automatic format detection and validation
- Smart method selection based on document type and complexity
- URL support with intelligent caching (1-hour cache)
Comprehensive Tool Suite
- Universal Tools (8): Work across all Office formats
- Word Tools (8): Specialized document processing
- Excel Tools (8): Advanced spreadsheet analysis
- PowerPoint Tools (6): Presentation content extraction
🚀 Quick Start
Installation
# Install with uv (recommended)
uv add mcp-office-tools
# Or with pip
pip install mcp-office-tools
Basic Usage
# Run the MCP server
mcp-office-tools
# Or run directly with Python
python -m mcp_office_tools.server
Integration with Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "mcp-office-tools": {
      "command": "mcp-office-tools"
    }
  }
}
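If the configuration file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, not part of MCP Office Tools: `add_server` is a hypothetical helper name, and the path to the config file is left to the caller because its location varies by platform.

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str = "mcp-office-tools",
               command: str = "mcp-office-tools") -> dict:
    """Merge one server entry into a Claude Desktop config file.

    Hypothetical helper for illustration; existing entries are preserved.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_server(Path("claude_desktop_config.json"))` writes the same structure as the JSON above while leaving any other `mcpServers` entries untouched.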
📊 Tool Categories
📄 Universal Processing Tools
Work across all Office formats with intelligent format detection:
Tool | Description | Formats |
---|---|---|
extract_text | Multi-method text extraction | All formats |
extract_images | Image extraction with filtering | Word, Excel, PowerPoint |
extract_metadata | Document properties and statistics | All formats |
detect_office_format | Format detection and analysis | All formats |
analyze_document_health | File integrity and health check | All formats |
📝 Word Document Tools
Specialized for Word documents (.docx, .doc, .docm):
# Extract text with formatting preservation
result = await extract_text("document.docx", preserve_formatting=True)
# Get document structure and metadata
metadata = await extract_metadata("report.doc")
# Health check for legacy documents
health = await analyze_document_health("old_document.doc")
📊 Excel Spreadsheet Tools
Advanced spreadsheet processing (.xlsx, .xls, .csv):
# Extract data from all worksheets
data = await extract_text("spreadsheet.xlsx", preserve_formatting=True)
# Process CSV files
csv_data = await extract_text("data.csv")
# Legacy Excel support
legacy_data = await extract_text("old_data.xls")
🎯 PowerPoint Tools
Presentation content extraction (.pptx, .ppt):
# Extract slide content
slides = await extract_text("presentation.pptx", preserve_formatting=True)
# Get presentation metadata
info = await extract_metadata("slideshow.pptx")
🔧 Real-World Use Cases
Business Intelligence & Reporting
# Process quarterly reports across formats
word_summary = await extract_text("quarterly-report.docx")
excel_data = await extract_text("financial-data.xlsx", preserve_formatting=True)
ppt_insights = await extract_text("presentation.pptx")
# Cross-format health analysis
health_check = await analyze_document_health("legacy-report.doc")
Document Migration & Modernization
# Legacy document processing
legacy_docs = ["policy.doc", "procedures.xls", "training.ppt"]
for doc in legacy_docs:
    # Format detection
    format_info = await detect_office_format(doc)
    # Health assessment
    health = await analyze_document_health(doc)
    # Content extraction
    content = await extract_text(doc)
Content Analysis & Extraction
# Multi-format content processing
documents = ["research.docx", "data.xlsx", "slides.pptx"]
for doc in documents:
    # Comprehensive analysis
    text = await extract_text(doc, preserve_formatting=True)
    images = await extract_images(doc, min_width=200, min_height=200)
    metadata = await extract_metadata(doc)
🏗️ Architecture
Multi-Library Approach
MCP Office Tools uses multiple libraries with intelligent fallbacks:
- Word Documents: python-docx → mammoth → docx2txt → olefile (legacy)
- Excel Spreadsheets: openpyxl → pandas → xlrd (legacy)
- PowerPoint Presentations: python-pptx → olefile (legacy)
Format Support Matrix
Format | Text | Images | Metadata | Legacy |
---|---|---|---|---|
.docx | ✅ | ✅ | ✅ | N/A |
.doc | ✅ | ⚠️ | ⚠️ | ✅ |
.xlsx | ✅ | ✅ | ✅ | N/A |
.xls | ✅ | ⚠️ | ⚠️ | ✅ |
.pptx | ✅ | ✅ | ✅ | N/A |
.ppt | ⚠️ | ⚠️ | ⚠️ | ✅ |
.csv | ✅ | N/A | ⚠️ | N/A |
✅ Full support, ⚠️ Basic support, N/A Not applicable
🔍 Advanced Features
URL Processing
Process Office documents directly from URLs:
# Direct URL processing
url_doc = "https://example.com/document.docx"
content = await extract_text(url_doc)
# Automatic caching (1-hour default)
cached_content = await extract_text(url_doc) # Uses cache
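The caching behavior described above can be sketched as a time-to-live (TTL) cache keyed by URL. The class below is a hypothetical illustration, not the server's implementation: whether the real cache lives in memory or on disk is not stated here, and `TTLCache`, `get_or_fetch`, and the injectable `clock` are assumed names.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get_or_fetch(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the download
        data = fetch(url)  # missing or expired: download again
        self._store[url] = (now, data)
        return data
```

With the default of 3600 seconds, a second request for the same URL within the hour is served from the cache, and a request after that triggers a fresh download.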
Format Detection
Intelligent format detection and validation:
# Comprehensive format analysis
format_info = await detect_office_format("unknown_file.office")
# Returns:
# - Format name and category
# - MIME type validation
# - Legacy vs modern classification
# - Processing recommendations
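One common way to tell legacy from modern files without trusting the extension is to inspect the file signature: Office 97-2003 files are OLE2 compound documents, while .docx, .xlsx, and .pptx are ZIP archives with a characteristic main part. The sketch below illustrates that idea only. `sniff_office_format` and its return shape are assumptions, not the output of `detect_office_format`.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                       # OOXML files are ZIP archives

# Main part that identifies each OOXML document type.
OOXML_MARKERS = {
    "word/document.xml": "word",
    "xl/workbook.xml": "excel",
    "ppt/presentation.xml": "powerpoint",
}


def sniff_office_format(data: bytes) -> dict:
    """Classify raw file bytes as legacy (OLE2), modern (OOXML), or unknown."""
    if data.startswith(OLE2_MAGIC):
        # Telling .doc, .xls and .ppt apart requires reading the OLE streams.
        return {"container": "ole2", "legacy": True, "category": None}
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return {"container": "unknown", "legacy": False, "category": None}
        for marker, category in OOXML_MARKERS.items():
            if marker in names:
                return {"container": "ooxml", "legacy": False, "category": category}
        return {"container": "zip", "legacy": False, "category": None}
    return {"container": "unknown", "legacy": False, "category": None}
```

A file named `unknown_file.office` that starts with the ZIP signature and contains `word/document.xml` would be reported as a modern Word document regardless of its extension.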
Document Health Analysis
Comprehensive document integrity checking:
# Health assessment
health = await analyze_document_health("suspicious_file.docx")
# Returns:
# - Health score (1-10)
# - Validation results
# - Corruption detection
# - Processing recommendations
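For modern (ZIP-based) formats, part of such a check can be done with the standard library alone. The sketch below is illustrative only: `ooxml_health`, the two checks it runs, and the way issues are converted to a 1-10 score are all assumptions, not the scoring used by `analyze_document_health`.

```python
import io
import zipfile


def ooxml_health(data: bytes) -> dict:
    """Run basic integrity checks on an OOXML file and derive a 1-10 score.

    The checks and the scoring weights are illustrative assumptions.
    """
    issues = []
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            bad_member = archive.testzip()  # first member with a bad CRC, or None
            if bad_member is not None:
                issues.append(f"corrupt member: {bad_member}")
            if "[Content_Types].xml" not in archive.namelist():
                issues.append("missing [Content_Types].xml")
    except zipfile.BadZipFile:
        return {"score": 1, "issues": ["not a valid ZIP container"]}
    score = max(1, 10 - 4 * len(issues))  # arbitrary penalty per issue
    return {"score": score, "issues": issues}
```

Legacy OLE2 files need a different set of checks, since they are not ZIP archives.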
📈 Performance & Compatibility
System Requirements
- Python: 3.11+
- Memory: 512MB+ available RAM
- Storage: 100MB+ for dependencies
Dependencies
- Core: FastMCP, python-docx, openpyxl, python-pptx
- Legacy: olefile, xlrd, msoffcrypto-tool
- Enhancement: mammoth, pandas, Pillow
Platform Support
- ✅ Linux (Ubuntu 20.04+, RHEL 8+)
- ✅ macOS (10.15+)
- ✅ Windows (10/11)
- ✅ Docker containers
🛠️ Development
Setup Development Environment
# Clone repository
git clone https://github.com/mcp-office-tools/mcp-office-tools.git
cd mcp-office-tools
# Install with development dependencies
uv sync --dev
# Run tests
uv run pytest
# Code quality checks
uv run black src/ tests/
uv run ruff check src/ tests/
uv run mypy src/
Testing
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=mcp_office_tools
# Test specific format
uv run pytest tests/test_word_extraction.py
🤝 Integration with MCP PDF Tools
MCP Office Tools is designed as a companion to MCP PDF Tools, with matching tool names for text and metadata extraction:
# Unified document processing workflow
pdf_content = await pdf_tools.extract_text("document.pdf")
docx_content = await office_tools.extract_text("document.docx")
# Cross-format analysis
pdf_metadata = await pdf_tools.extract_metadata("document.pdf")
docx_metadata = await office_tools.extract_metadata("document.docx")
📋 Supported Formats
# Get all supported formats
formats = await get_supported_formats()
# Returns comprehensive format information:
# - 15+ file extensions
# - MIME type mappings
# - Category classifications
# - Processing capabilities
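To make the kind of information listed above concrete, here is a small lookup table for a subset of the extensions, using the standard registered MIME types. The `describe` function and the shape of its result are hypothetical, and the actual return value of `get_supported_formats` may be organized differently.

```python
from pathlib import PurePath

OOXML = "application/vnd.openxmlformats-officedocument."

# extension -> (category, MIME type); a subset of the supported extensions
FORMATS = {
    ".docx": ("word", OOXML + "wordprocessingml.document"),
    ".doc": ("word", "application/msword"),
    ".xlsx": ("excel", OOXML + "spreadsheetml.sheet"),
    ".xls": ("excel", "application/vnd.ms-excel"),
    ".csv": ("excel", "text/csv"),
    ".pptx": ("powerpoint", OOXML + "presentationml.presentation"),
    ".ppt": ("powerpoint", "application/vnd.ms-powerpoint"),
}
LEGACY = {".doc", ".xls", ".ppt"}


def describe(filename: str) -> dict:
    """Look up category, MIME type, and legacy status from the file extension."""
    ext = PurePath(filename).suffix.lower()
    if ext not in FORMATS:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    category, mime = FORMATS[ext]
    return {"extension": ext, "category": category,
            "mime_type": mime, "legacy": ext in LEGACY}
```

CSV is grouped under the spreadsheet category here to mirror the Key Features list, which places .csv with the Excel formats.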
🔒 Security & Privacy
- No data collection: Documents processed locally
- Temporary files: Automatic cleanup after processing
- URL validation: Secure HTTPS-only downloads
- Memory management: Efficient processing of large files
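The HTTPS-only rule can be enforced before any download starts. This is a minimal sketch of such a check using the standard library. `require_https` is a hypothetical name, and the server's own validation may apply further rules.

```python
from urllib.parse import urlsplit


def require_https(url: str) -> str:
    """Return the URL unchanged if it is an https URL with a host, else raise."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "https":
        raise ValueError(f"refusing non-HTTPS URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    return url
```

Rejecting the URL up front means a plain `http://` or `file://` address never reaches the download or caching code.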
📝 License
MIT License - see LICENSE file for details.
🚀 Coming Soon
- Advanced Excel Tools: Formula parsing, chart extraction
- PowerPoint Enhancement: Animation analysis, slide comparison
- Document Conversion: Cross-format conversion capabilities
- Batch Processing: Multi-document workflows
- Cloud Integration: Direct cloud storage support
Built with ❤️ for the MCP ecosystem
MCP Office Tools - Comprehensive Microsoft Office document processing for modern AI workflows.