# MCP Office Tools

**Comprehensive Microsoft Office document processing server for the MCP (Model Context Protocol) ecosystem.**

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![FastMCP](https://img.shields.io/badge/FastMCP-0.5+-green.svg)](https://github.com/jlowin/fastmcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

MCP Office Tools provides **30+ comprehensive tools** for processing Microsoft Office documents including Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), and CSV files. Built as a companion to [MCP PDF Tools](https://github.com/mcp-pdf-tools/mcp-pdf-tools), it offers the same level of quality and robustness for Office document processing.

## 🌟 Key Features

### **Universal Format Support**

- **Word Documents**: `.docx`, `.doc`, `.docm`, `.dotx`, `.dot`
- **Excel Spreadsheets**: `.xlsx`, `.xls`, `.xlsm`, `.xltx`, `.xlt`, `.csv`
- **PowerPoint Presentations**: `.pptx`, `.ppt`, `.pptm`, `.potx`, `.pot`
- **Legacy Compatibility**: Full support for Office 97-2003 formats

### **Intelligent Processing**

- **Multi-library fallback system** for robust document processing
- **Automatic format detection** and validation
- **Smart method selection** based on document type and complexity
- **URL support** with intelligent caching (1-hour cache)

### **Comprehensive Tool Suite**

- **Universal Tools** (8): Work across all Office formats
- **Word Tools** (8): Specialized document processing
- **Excel Tools** (8): Advanced spreadsheet analysis
- **PowerPoint Tools** (6): Presentation content extraction

## 🚀 Quick Start

### Installation

```bash
# Install with uv (recommended)
uv add mcp-office-tools

# Or with pip
pip install mcp-office-tools
```

### Basic Usage

```bash
# Run the MCP server
mcp-office-tools

# Or run directly with Python
python -m mcp_office_tools.server
```

### Integration with Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "mcp-office-tools": {
      "command": "mcp-office-tools"
    }
  }
}
```

## 📊 Tool Categories

### **📄 Universal Processing Tools**

Work across all Office formats with intelligent format detection:

| Tool | Description | Formats |
|------|-------------|---------|
| `extract_text` | Multi-method text extraction | All formats |
| `extract_images` | Image extraction with filtering | Word, Excel, PowerPoint |
| `extract_metadata` | Document properties and statistics | All formats |
| `detect_office_format` | Format detection and analysis | All formats |
| `analyze_document_health` | File integrity and health check | All formats |

### **📝 Word Document Tools**

Specialized for Word documents (.docx, .doc, .docm):

```python
# Extract text with formatting preservation
result = await extract_text("document.docx", preserve_formatting=True)

# Get document structure and metadata
metadata = await extract_metadata("report.doc")

# Health check for legacy documents
health = await analyze_document_health("old_document.doc")
```

### **📊 Excel Spreadsheet Tools**

Advanced spreadsheet processing (.xlsx, .xls, .csv):

```python
# Extract data from all worksheets
data = await extract_text("spreadsheet.xlsx", preserve_formatting=True)

# Process CSV files
csv_data = await extract_text("data.csv")

# Legacy Excel support
legacy_data = await extract_text("old_data.xls")
```

### **🎯 PowerPoint Tools**

Presentation content extraction (.pptx, .ppt):

```python
# Extract slide content
slides = await extract_text("presentation.pptx", preserve_formatting=True)

# Get presentation metadata
info = await extract_metadata("slideshow.pptx")
```

## 🔧 Real-World Use Cases

### **Business Intelligence & Reporting**

```python
# Process quarterly reports across formats
word_summary = await extract_text("quarterly-report.docx")
excel_data = await extract_text("financial-data.xlsx", preserve_formatting=True)
ppt_insights = await extract_text("presentation.pptx")

# Cross-format health analysis
health_check = await analyze_document_health("legacy-report.doc")
```

### **Document Migration & Modernization**

```python
# Legacy document processing
legacy_docs = ["policy.doc", "procedures.xls", "training.ppt"]

for doc in legacy_docs:
    # Format detection
    format_info = await detect_office_format(doc)

    # Health assessment
    health = await analyze_document_health(doc)

    # Content extraction
    content = await extract_text(doc)
```

### **Content Analysis & Extraction**

```python
# Multi-format content processing
documents = ["research.docx", "data.xlsx", "slides.pptx"]

for doc in documents:
    # Comprehensive analysis
    text = await extract_text(doc, preserve_formatting=True)
    images = await extract_images(doc, min_width=200, min_height=200)
    metadata = await extract_metadata(doc)
```

## 🏗️ Architecture

### **Multi-Library Approach**

MCP Office Tools uses multiple libraries with intelligent fallbacks:

**Word Documents:**
- `python-docx` → `mammoth` → `docx2txt` → `olefile` (legacy)

**Excel Spreadsheets:**
- `openpyxl` → `pandas` → `xlrd` (legacy)

**PowerPoint Presentations:**
- `python-pptx` → `olefile` (legacy)

### **Format Support Matrix**

| Format | Text | Images | Metadata | Legacy |
|--------|------|--------|----------|--------|
| .docx | ✅ | ✅ | ✅ | N/A |
| .doc | ✅ | ⚠️ | ⚠️ | ✅ |
| .xlsx | ✅ | ✅ | ✅ | N/A |
| .xls | ✅ | ⚠️ | ⚠️ | ✅ |
| .pptx | ✅ | ✅ | ✅ | N/A |
| .ppt | ⚠️ | ⚠️ | ⚠️ | ✅ |
| .csv | ✅ | N/A | ⚠️ | N/A |

*✅ Full support, ⚠️ Basic support, N/A Not applicable*

## 🔍 Advanced Features

### **URL Processing**

Process Office documents directly from URLs:

```python
# Direct URL processing
url_doc = "https://example.com/document.docx"
content = await extract_text(url_doc)

# Automatic caching (1-hour default)
cached_content = await extract_text(url_doc)  # Uses cache
```

### **Format Detection**

Intelligent format detection and validation:

```python
# Comprehensive format analysis
format_info = await detect_office_format("unknown_file.office")

# Returns:
# - Format name and category
# - MIME type validation
# - Legacy vs modern classification
# - Processing recommendations
```

### **Document Health Analysis**

Comprehensive document integrity checking:

```python
# Health assessment
health = await analyze_document_health("suspicious_file.docx")

# Returns:
# - Health score (1-10)
# - Validation results
# - Corruption detection
# - Processing recommendations
```

## 📈 Performance & Compatibility

### **System Requirements**

- **Python**: 3.11+
- **Memory**: 512MB+ available RAM
- **Storage**: 100MB+ for dependencies

### **Dependencies**

- **Core**: FastMCP, python-docx, openpyxl, python-pptx
- **Legacy**: olefile, xlrd, msoffcrypto-tool
- **Enhancement**: mammoth, pandas, Pillow

### **Platform Support**

- ✅ **Linux** (Ubuntu 20.04+, RHEL 8+)
- ✅ **macOS** (10.15+)
- ✅ **Windows** (10/11)
- ✅ **Docker** containers

## 🛠️ Development

### **Setup Development Environment**

```bash
# Clone repository
git clone https://github.com/mcp-office-tools/mcp-office-tools.git
cd mcp-office-tools

# Install with development dependencies
uv sync --dev

# Run tests
uv run pytest

# Code quality checks
uv run black src/ tests/
uv run ruff check src/ tests/
uv run mypy src/
```

### **Testing**

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=mcp_office_tools

# Test specific format
uv run pytest tests/test_word_extraction.py
```

## 🤝 Integration with MCP PDF Tools

MCP Office Tools is designed as a perfect companion to [MCP PDF Tools](https://github.com/mcp-pdf-tools/mcp-pdf-tools):

```python
# Unified document processing workflow
pdf_content = await pdf_tools.extract_text("document.pdf")
docx_content = await office_tools.extract_text("document.docx")

# Cross-format analysis
pdf_metadata = await pdf_tools.extract_metadata("document.pdf")
docx_metadata = await office_tools.extract_metadata("document.docx")
```

## 📋 Supported Formats

```python
# Get all supported formats
formats = await get_supported_formats()

# Returns comprehensive format information:
# - 15+ file extensions
# - MIME type mappings
# - Category classifications
# - Processing capabilities
```

## 🔒 Security & Privacy

- **No data collection**: Documents processed locally
- **Temporary files**: Automatic cleanup after processing
- **URL validation**: Secure HTTPS-only downloads
- **Memory management**: Efficient processing of large files

## 📝 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🚀 Coming Soon

- **Advanced Excel Tools**: Formula parsing, chart extraction
- **PowerPoint Enhancement**: Animation analysis, slide comparison
- **Document Conversion**: Cross-format conversion capabilities
- **Batch Processing**: Multi-document workflows
- **Cloud Integration**: Direct cloud storage support

---

**Built with ❤️ for the MCP ecosystem**

*MCP Office Tools - Comprehensive Microsoft Office document processing for modern AI workflows.*
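
For readers curious how a multi-library fallback chain like the one listed under Architecture might behave, here is a minimal sketch. It is not the project's actual implementation: `extract_with_fallback`, `ExtractionError`, and the two stand-in extractors are hypothetical names chosen for illustration, and a real chain would presumably wrap libraries such as `python-docx` or `mammoth` instead.

```python
from typing import Callable, Sequence


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed (hypothetical name)."""


def extract_with_fallback(
    path: str,
    extractors: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, extractor) pair in order; return (name, text) of the first success."""
    errors: list[str] = []
    for name, extractor in extractors:
        try:
            return name, extractor(path)
        except Exception as exc:  # a real chain would likely catch narrower exceptions
            errors.append(f"{name}: {exc}")
    raise ExtractionError(f"all extractors failed for {path!r}: " + "; ".join(errors))


# Stand-in extractors for illustration only (assumptions, not real library wrappers).
def primary(path: str) -> str:
    raise ValueError("unsupported structure")


def secondary(path: str) -> str:
    return f"text from {path}"


method, text = extract_with_fallback(
    "report.docx",
    [("primary", primary), ("secondary", secondary)],
)
print(method, text)
```

Returning the name of the method that succeeded alongside the text makes it easy for a caller to see when a fallback was used, which can be helpful when diagnosing problematic legacy files.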