Ryan Malloy b681cb030b Initial commit: MCP Office Tools v0.1.0
- Comprehensive Microsoft Office document processing server
- Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV
- 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats
- Multi-library fallback system for robust processing
- URL support with intelligent caching
- Legacy Office format support (97-2003)
- FastMCP integration with async architecture
- Production ready with comprehensive documentation

🤖 Generated with Claude Code (claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 01:01:48 -06:00

MCP Office Tools

Comprehensive Microsoft Office document processing server for the MCP (Model Context Protocol) ecosystem.

Python 3.11+ | FastMCP | License: MIT

MCP Office Tools provides 30+ tools for processing Microsoft Office documents, including Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), and CSV files. Built as a companion to MCP PDF Tools, it aims for the same quality and robustness in Office document processing.

🌟 Key Features

Universal Format Support

  • Word Documents: .docx, .doc, .docm, .dotx, .dot
  • Excel Spreadsheets: .xlsx, .xls, .xlsm, .xltx, .xlt, .csv
  • PowerPoint Presentations: .pptx, .ppt, .pptm, .potx, .pot
  • Legacy Compatibility: Full support for Office 97-2003 formats
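
As a rough illustration of how the extension lists above could be turned into a lookup, here is a minimal sketch. The names `OFFICE_EXTENSIONS`, `LEGACY_EXTENSIONS`, and `classify_extension` are hypothetical and are not taken from the package; only the extensions themselves come from the list above, and treating the binary 97-2003 extensions as "legacy" is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Extensions copied from the feature list above; category names are illustrative.
OFFICE_EXTENSIONS = {
    "word": {".docx", ".doc", ".docm", ".dotx", ".dot"},
    "excel": {".xlsx", ".xls", ".xlsm", ".xltx", ".xlt", ".csv"},
    "powerpoint": {".pptx", ".ppt", ".pptm", ".potx", ".pot"},
}

# Assumption: the binary Office 97-2003 extensions are the "legacy" ones.
LEGACY_EXTENSIONS = {".doc", ".dot", ".xls", ".xlt", ".ppt", ".pot"}


def classify_extension(filename: str) -> tuple[str | None, bool]:
    """Return (category, is_legacy) based on the file extension alone."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in OFFICE_EXTENSIONS.items():
        if suffix in extensions:
            return category, suffix in LEGACY_EXTENSIONS
    return None, False


print(classify_extension("Report.DOCX"))
print(classify_extension("old_data.xls"))
```

An extension check like this is only a first guess; a renamed file needs content-based detection to be classified reliably.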

Intelligent Processing

  • Multi-library fallback system for robust document processing
  • Automatic format detection and validation
  • Smart method selection based on document type and complexity
  • URL support with intelligent caching (1-hour cache)

Comprehensive Tool Suite

  • Universal Tools (8): Work across all Office formats
  • Word Tools (8): Specialized document processing
  • Excel Tools (8): Advanced spreadsheet analysis
  • PowerPoint Tools (6): Presentation content extraction

🚀 Quick Start

Installation

# Install with uv (recommended)
uv add mcp-office-tools

# Or with pip
pip install mcp-office-tools

Basic Usage

# Run the MCP server
mcp-office-tools

# Or run directly with Python
python -m mcp_office_tools.server

Integration with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "mcp-office-tools": {
      "command": "mcp-office-tools"
    }
  }
}
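
If you already have other servers configured, the entry needs to be merged rather than pasted over the file. The following sketch shows one way to do that in Python. The helper name `add_office_tools_server` is hypothetical, and the location of `claude_desktop_config.json` is deliberately left out because it depends on your platform.

```python
import json


def add_office_tools_server(config: dict) -> dict:
    """Return a copy of a Claude Desktop config with the server entry added."""
    # A JSON round-trip gives a deep copy, so the caller's dict is not modified.
    updated = json.loads(json.dumps(config))
    servers = updated.setdefault("mcpServers", {})
    servers["mcp-office-tools"] = {"command": "mcp-office-tools"}
    return updated


# Example: an existing config that already declares another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "other-server"}}}
print(json.dumps(add_office_tools_server(existing), indent=2))
```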

📊 Tool Categories

📄 Universal Processing Tools

Work across all Office formats with intelligent format detection:

Tool                     Description                           Formats
extract_text             Multi-method text extraction          All formats
extract_images           Image extraction with filtering       Word, Excel, PowerPoint
extract_metadata         Document properties and statistics    All formats
detect_office_format     Format detection and analysis         All formats
analyze_document_health  File integrity and health check       All formats

📝 Word Document Tools

Specialized for Word documents (.docx, .doc, .docm):

# Extract text with formatting preservation
result = await extract_text("document.docx", preserve_formatting=True)

# Get document structure and metadata
metadata = await extract_metadata("report.doc")

# Health check for legacy documents
health = await analyze_document_health("old_document.doc")

📊 Excel Spreadsheet Tools

Advanced spreadsheet processing (.xlsx, .xls, .csv):

# Extract data from all worksheets
data = await extract_text("spreadsheet.xlsx", preserve_formatting=True)

# Process CSV files
csv_data = await extract_text("data.csv")

# Legacy Excel support
legacy_data = await extract_text("old_data.xls")
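
For CSV input, text extraction can be approximated with the standard library alone. The sketch below is not the package's implementation; `csv_to_text`, the tab-separated output, and the UTF-8 default are assumptions made for illustration.

```python
import csv
import io


def csv_to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Render CSV bytes as tab-separated lines, one line per record."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.reader(io.StringIO(raw.decode(encoding), newline=""))
    return "\n".join("\t".join(row) for row in reader)


sample = b'name,qty\r\n"Widget, large",3\r\n'
print(csv_to_text(sample))
```

Using `csv.reader` rather than splitting on commas keeps quoted fields such as `"Widget, large"` intact.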

🎯 PowerPoint Tools

Presentation content extraction (.pptx, .ppt):

# Extract slide content
slides = await extract_text("presentation.pptx", preserve_formatting=True)

# Get presentation metadata
info = await extract_metadata("slideshow.pptx")

🔧 Real-World Use Cases

Business Intelligence & Reporting

# Process quarterly reports across formats
word_summary = await extract_text("quarterly-report.docx")
excel_data = await extract_text("financial-data.xlsx", preserve_formatting=True)
ppt_insights = await extract_text("presentation.pptx")

# Cross-format health analysis
health_check = await analyze_document_health("legacy-report.doc")

Document Migration & Modernization

# Legacy document processing
legacy_docs = ["policy.doc", "procedures.xls", "training.ppt"]

for doc in legacy_docs:
    # Format detection
    format_info = await detect_office_format(doc)
    
    # Health assessment
    health = await analyze_document_health(doc)
    
    # Content extraction
    content = await extract_text(doc)
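
When many legacy files are involved, the sequential loop above can be run concurrently. The sketch below shows the general pattern with `asyncio.gather` and a semaphore. `process_all` and `fake_extract_text` are hypothetical: the stand-in coroutine only mimics the shape of an extraction call so that the example runs without the server installed.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def process_all(
    paths: list[str],
    extract: Callable[[str], Awaitable[str]],
    max_concurrent: int = 4,
) -> dict[str, str]:
    """Run `extract` over all paths, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(path: str) -> tuple[str, str]:
        async with semaphore:
            try:
                return path, await extract(path)
            except Exception as exc:  # record the failure, keep the batch going
                return path, f"error: {exc}"

    results = await asyncio.gather(*(run_one(p) for p in paths))
    return dict(results)


async def fake_extract_text(path: str) -> str:
    """Stand-in for a real extraction call (assumption, not the package API)."""
    await asyncio.sleep(0)
    if path.endswith(".ppt"):
        raise ValueError("unsupported in this stand-in")
    return f"text from {path}"


print(asyncio.run(process_all(["policy.doc", "training.ppt"], fake_extract_text)))
```

Catching the exception per document means one unreadable file does not abort the whole migration run.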

Content Analysis & Extraction

# Multi-format content processing
documents = ["research.docx", "data.xlsx", "slides.pptx"]

for doc in documents:
    # Comprehensive analysis
    text = await extract_text(doc, preserve_formatting=True)
    images = await extract_images(doc, min_width=200, min_height=200)
    metadata = await extract_metadata(doc)

🏗️ Architecture

Multi-Library Approach

MCP Office Tools uses multiple libraries with intelligent fallbacks:

Word Documents:

  • python-docx → mammoth → docx2txt → olefile (legacy)

Excel Spreadsheets:

  • openpyxl → pandas → xlrd (legacy)

PowerPoint Presentations:

  • python-pptx → olefile (legacy)
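
The fallback idea can be sketched independently of any particular library: try each extractor in order and return the first result that succeeds. This is an illustration of the pattern, not the package's code. `extract_with_fallback` and the two toy extractors are hypothetical names; in practice each entry would wrap one of the libraries listed above.

```python
from __future__ import annotations

from typing import Callable, Iterable

Extractor = Callable[[bytes], str]


def extract_with_fallback(
    data: bytes, extractors: Iterable[tuple[str, Extractor]]
) -> tuple[str, str]:
    """Try extractors in order; return (method_name, text) from the first success."""
    errors = []
    for name, extractor in extractors:
        try:
            return name, extractor(data)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


def primary(data: bytes) -> str:
    # Toy extractor that always fails, to exercise the fallback path.
    raise ValueError("unsupported structure")


def secondary(data: bytes) -> str:
    # Toy extractor that treats the payload as UTF-8 text.
    return data.decode("utf-8")


print(extract_with_fallback(b"hello", [("primary", primary), ("secondary", secondary)]))
```

Returning the method name alongside the text makes it easy to report which library actually handled a given document.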

Format Support Matrix

Format Text Images Metadata Legacy
.docx N/A
.doc ⚠️ ⚠️
.xlsx N/A
.xls ⚠️ ⚠️
.pptx N/A
.ppt ⚠️ ⚠️ ⚠️
.csv N/A ⚠️ N/A

Full support, ⚠️ Basic support, N/A Not applicable

🔍 Advanced Features

URL Processing

Process Office documents directly from URLs:

# Direct URL processing
url_doc = "https://example.com/document.docx"
content = await extract_text(url_doc)

# Automatic caching (1-hour default)
cached_content = await extract_text(url_doc)  # Uses cache
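
A one-hour cache of this kind can be pictured as a small time-to-live (TTL) store keyed by URL. The sketch below is an assumption about the general technique, not the package's cache. `TTLCache` is a hypothetical class, and the download function and clock are injected so the behavior can be exercised without network access.

```python
from __future__ import annotations

import time
from typing import Callable


class TTLCache:
    """Cache fetched bytes per URL and refetch once an entry is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float = 3600.0,  # one hour, matching the default described above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from cache
        data = self._fetch(url)  # missing or expired: download again
        self._entries[url] = (now, data)
        return data


# Usage with a fake downloader that counts how often it is called.
calls = []
cache = TTLCache(fetch=lambda url: calls.append(url) or b"document bytes")
cache.get("https://example.com/document.docx")
cache.get("https://example.com/document.docx")
print(len(calls))
```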

Format Detection

Intelligent format detection and validation:

# Comprehensive format analysis
format_info = await detect_office_format("unknown_file.office")

# Returns:
# - Format name and category
# - MIME type validation
# - Legacy vs modern classification
# - Processing recommendations
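
Content-based detection typically starts from file signatures. Legacy Office files are OLE2 compound files that begin with the bytes `D0 CF 11 E0 A1 B1 1A E1`, while modern Office files are ZIP archives that begin with `PK\x03\x04`. The sketch below uses only the standard library. `sniff_office_format` and its return shape are hypothetical, and checking for the conventional main part names is a simplification of the full package-relationship lookup.

```python
from __future__ import annotations

import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy compound file signature
ZIP_MAGIC = b"PK\x03\x04"  # local file header of a non-empty ZIP archive

# Conventional main part of each modern format (a simplification).
OOXML_MAIN_PARTS = {
    "word/document.xml": "word",
    "xl/workbook.xml": "excel",
    "ppt/presentation.xml": "powerpoint",
}


def sniff_office_format(data: bytes) -> dict:
    """Classify raw bytes as legacy OLE2, modern OOXML, plain ZIP, or unknown."""
    if data.startswith(OLE2_MAGIC):
        # Telling .doc, .xls, and .ppt apart requires reading the OLE2 streams.
        return {"container": "ole2", "category": None, "legacy": True}
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return {"container": "zip", "category": None, "legacy": False}
        for part, category in OOXML_MAIN_PARTS.items():
            if part in names:
                return {"container": "ooxml", "category": category, "legacy": False}
        return {"container": "zip", "category": None, "legacy": False}
    return {"container": "unknown", "category": None, "legacy": False}


print(sniff_office_format(OLE2_MAGIC + b"\x00" * 8))
```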

Document Health Analysis

Comprehensive document integrity checking:

# Health assessment
health = await analyze_document_health("suspicious_file.docx")

# Returns:
# - Health score (1-10)
# - Validation results
# - Corruption detection
# - Processing recommendations
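
For modern (ZIP-based) documents, a basic integrity check can be sketched with the standard library. This is not how the package computes its health score: `check_ooxml_health` is a hypothetical helper, and the scoring rule (start at 10, subtract 3 per issue, floor at 1) is an arbitrary assumption chosen only to fit the 1-10 range mentioned above.

```python
from __future__ import annotations

import io
import zipfile


def check_ooxml_health(data: bytes) -> dict:
    """Run simple container checks and return an illustrative score with issues."""
    issues: list[str] = []
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            bad_member = archive.testzip()  # first member with a bad CRC, else None
            if bad_member is not None:
                issues.append(f"CRC mismatch in {bad_member}")
            if "[Content_Types].xml" not in archive.namelist():
                issues.append("missing [Content_Types].xml")
    except zipfile.BadZipFile:
        # Nothing else can be checked if the container itself is unreadable.
        return {"score": 1, "issues": ["not a readable ZIP container"]}
    score = max(1, 10 - 3 * len(issues))  # arbitrary illustrative scoring
    return {"score": score, "issues": issues}


print(check_ooxml_health(b"this is not an Office file"))
```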

📈 Performance & Compatibility

System Requirements

  • Python: 3.11+
  • Memory: 512MB+ available RAM
  • Storage: 100MB+ for dependencies

Dependencies

  • Core: FastMCP, python-docx, openpyxl, python-pptx
  • Legacy: olefile, xlrd, msoffcrypto-tool
  • Enhancement: mammoth, pandas, Pillow

Platform Support

  • Linux (Ubuntu 20.04+, RHEL 8+)
  • macOS (10.15+)
  • Windows (10/11)
  • Docker containers

🛠️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/mcp-office-tools/mcp-office-tools.git
cd mcp-office-tools

# Install with development dependencies
uv sync --dev

# Run tests
uv run pytest

# Code quality checks
uv run black src/ tests/
uv run ruff check src/ tests/
uv run mypy src/

Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=mcp_office_tools

# Test specific format
uv run pytest tests/test_word_extraction.py

🤝 Integration with MCP PDF Tools

MCP Office Tools is designed as a companion to MCP PDF Tools, so both can be used in one document processing workflow:

# Unified document processing workflow
pdf_content = await pdf_tools.extract_text("document.pdf")
docx_content = await office_tools.extract_text("document.docx")

# Cross-format analysis
pdf_metadata = await pdf_tools.extract_metadata("document.pdf")
docx_metadata = await office_tools.extract_metadata("document.docx")

📋 Supported Formats

# Get all supported formats
formats = await get_supported_formats()

# Returns comprehensive format information:
# - 15+ file extensions
# - MIME type mappings
# - Category classifications
# - Processing capabilities

🔒 Security & Privacy

  • No data collection: Documents processed locally
  • Temporary files: Automatic cleanup after processing
  • URL validation: Secure HTTPS-only downloads
  • Memory management: Efficient processing of large files
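
The HTTPS-only rule can be illustrated with a small validation helper. This is a sketch of the general check, not the package's validator. `is_allowed_document_url` is a hypothetical name, and a production check would likely add more conditions, such as size limits or host restrictions.

```python
from urllib.parse import urlsplit


def is_allowed_document_url(url: str) -> bool:
    """Accept only URLs with the https scheme and a non-empty host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and bool(parts.hostname)


print(is_allowed_document_url("https://example.com/document.docx"))
print(is_allowed_document_url("http://example.com/document.docx"))
```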

📝 License

MIT License - see LICENSE file for details.

🚀 Coming Soon

  • Advanced Excel Tools: Formula parsing, chart extraction
  • PowerPoint Enhancement: Animation analysis, slide comparison
  • Document Conversion: Cross-format conversion capabilities
  • Batch Processing: Multi-document workflows
  • Cloud Integration: Direct cloud storage support

Built with ❤️ for the MCP ecosystem

MCP Office Tools - Comprehensive Microsoft Office document processing for modern AI workflows.
