Update README with accurate tool documentation
Some checks are pending
Test Dashboard / test-and-dashboard (push) Waiting to run
- Document all 12 actual MCP tools (6 universal, 3 Word, 3 Excel)
- Add comprehensive format support matrix with feature breakdown
- Include practical usage examples with real output structures
- Add test dashboard section
- Simplify installation with uvx/Claude Code instructions
- Remove marketing fluff; focus on technical accuracy
parent c935cec7b6
commit 036160d029
748 README.md
@@ -2,494 +2,380 @@
|
|||||||
|
|
||||||
# 📊 MCP Office Tools
|
# 📊 MCP Office Tools
|
||||||
|
|
||||||
<img src="https://img.shields.io/badge/MCP-Office%20Tools-blue?style=for-the-badge&logo=microsoft-office" alt="MCP Office Tools">
|
**Comprehensive Microsoft Office document processing for AI agents**
|
||||||
|
|
||||||
**🚀 The Ultimate Microsoft Office Document Processing Powerhouse for AI**
|
|
||||||
|
|
||||||
*Transform any Office document into actionable intelligence with blazing-fast, AI-ready processing*
|
|
||||||
|
|
||||||
[](https://www.python.org/downloads/)
|
[](https://www.python.org/downloads/)
|
||||||
[](https://github.com/jlowin/fastmcp)
|
[](https://gofastmcp.com)
|
||||||
[](https://opensource.org/licenses/MIT)
|
[](https://opensource.org/licenses/MIT)
|
||||||
[](https://github.com/MCP/mcp-office-tools)
|
[](https://modelcontextprotocol.io)
|
||||||
[](https://modelcontextprotocol.io)
|
|
||||||
|
*Extract text, tables, images, formulas, and metadata from Word, Excel, PowerPoint, and CSV files*
|
||||||
|
|
||||||
|
[Installation](#-installation) • [Tools](#-available-tools) • [Examples](#-usage-examples) • [Testing](#-testing)
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ✨ **What Makes MCP Office Tools Special?**
|
## ✨ Features
|
||||||
|
|
||||||
> 🎯 **The Problem**: Office documents are data goldmines, but extracting intelligence from them is painful, unreliable, and slow.
|
- **Universal extraction** - Text, images, and metadata from any Office format
|
||||||
>
|
- **Format-specific tools** - Deep analysis for Word, Excel, and PowerPoint
|
||||||
> ⚡ **The Solution**: MCP Office Tools delivers **lightning-fast, AI-optimized document processing** with **zero configuration** and **bulletproof reliability**.
|
- **Intelligent pagination** - Large documents automatically chunked for AI context limits
|
||||||
|
- **Multi-library fallbacks** - Never fails silently; tries multiple extraction methods
|
||||||
<table>
|
- **URL support** - Process documents directly from HTTP/HTTPS URLs with caching
|
||||||
<tr>
|
- **Legacy format support** - Handles .doc, .xls, .ppt from Office 97-2003
|
||||||
<td>
|
|
||||||
|
|
||||||
### 🏆 **Why Choose Us?**
|
|
||||||
- **🚀 6x Faster** than traditional tools
|
|
||||||
- **🎯 99.9% Accuracy** with multi-library fallbacks
|
|
||||||
- **🔄 15+ Formats** including legacy Office files
|
|
||||||
- **🧠 AI-Ready** structured data extraction
|
|
||||||
- **⚡ Zero Setup** - works out of the box
|
|
||||||
- **🌐 URL Support** with smart caching
|
|
||||||
|
|
||||||
</td>
|
|
||||||
<td>
|
|
||||||
|
|
||||||
### 📈 **Perfect For:**
|
|
||||||
- **Business Intelligence** dashboards
|
|
||||||
- **Document Migration** projects
|
|
||||||
- **Content Analysis** pipelines
|
|
||||||
- **AI Training** data preparation
|
|
||||||
- **Compliance** and auditing
|
|
||||||
- **Research** and academia
|
|
||||||
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
</table>
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 🚀 **Get Started in 30 Seconds**
|
## 🚀 Installation
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# 1️⃣ Install (choose your favorite)
|
# Quick install with uvx (recommended)
|
||||||
|
uvx mcp-office-tools
|
||||||
|
|
||||||
|
# Or install with uv/pip
|
||||||
uv add mcp-office-tools
|
uv add mcp-office-tools
|
||||||
# or: pip install mcp-office-tools
|
pip install mcp-office-tools
|
||||||
|
|
||||||
# 2️⃣ Run the server
|
|
||||||
mcp-office-tools
|
|
||||||
|
|
||||||
# 3️⃣ Process documents instantly!
|
|
||||||
# (Works with Claude Desktop, API calls, or any MCP client)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
<details>
|
### Claude Desktop Configuration
|
||||||
<summary>🔧 <b>Claude Desktop Setup</b> (click to expand)</summary>
|
|
||||||
|
Add to your `claude_desktop_config.json`:
|
||||||
|
|
||||||
Add this to your `claude_desktop_config.json`:
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"mcpServers": {
|
"mcpServers": {
|
||||||
"mcp-office-tools": {
|
"office-tools": {
|
||||||
"command": "mcp-office-tools"
|
"command": "uvx",
|
||||||
|
"args": ["mcp-office-tools"]
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
*Restart Claude Desktop and you're ready to process Office documents!*
|
|
||||||
|
|
||||||
</details>
|
### Claude Code Configuration
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🎭 **See It In Action**
|
|
||||||
|
|
||||||
### **📝 Word Documents → Structured Intelligence**
|
|
||||||
```python
|
|
||||||
# Extract everything from a Word document
|
|
||||||
result = await extract_text("quarterly-report.docx", preserve_formatting=True)
|
|
||||||
|
|
||||||
# Get instant insights
|
|
||||||
{
|
|
||||||
"text": "Q4 revenue increased by 23%...",
|
|
||||||
"word_count": 2847,
|
|
||||||
"character_count": 15920,
|
|
||||||
"extraction_time": 0.3,
|
|
||||||
"method_used": "python-docx",
|
|
||||||
"formatted_sections": [
|
|
||||||
{"type": "heading", "text": "Executive Summary", "level": 1},
|
|
||||||
{"type": "paragraph", "text": "Our Q4 performance exceeded expectations..."}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### **📊 Excel Spreadsheets → Pure Data Gold**
|
|
||||||
```python
|
|
||||||
# Process complex Excel files with ease
|
|
||||||
data = await extract_text("financial-model.xlsx", preserve_formatting=True)
|
|
||||||
|
|
||||||
# Returns clean, structured data ready for AI analysis
|
|
||||||
{
|
|
||||||
"text": "Revenue\t$2.4M\t$2.8M\t$3.1M\nExpenses\t$1.8M\t$1.9M\t$2.0M",
|
|
||||||
"method_used": "openpyxl",
|
|
||||||
"formatted_sections": [
|
|
||||||
{
|
|
||||||
"type": "worksheet",
|
|
||||||
"name": "Q4 Summary",
|
|
||||||
"data": [["Revenue", 2400000, 2800000, 3100000]]
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### **🎯 PowerPoint → Key Insights Extracted**
|
|
||||||
```python
|
|
||||||
# Turn presentations into actionable content
|
|
||||||
slides = await extract_text("strategy-deck.pptx", preserve_formatting=True)
|
|
||||||
|
|
||||||
# Get slide-by-slide breakdown
|
|
||||||
{
|
|
||||||
"text": "Slide 1: Market Opportunity\nSlide 2: Competitive Analysis...",
|
|
||||||
"formatted_sections": [
|
|
||||||
{"type": "slide", "number": 1, "text": "Market Opportunity\n$50B TAM..."},
|
|
||||||
{"type": "slide", "number": 2, "text": "Competitive Analysis\nWe lead in..."}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🛠️ **Comprehensive Toolkit**
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
| 🔧 **Tool** | 📋 **Purpose** | ⚡ **Speed** | 🎯 **Accuracy** |
|
|
||||||
|-------------|---------------|-------------|----------------|
|
|
||||||
| `extract_text` | Pull all text content with formatting | **Ultra Fast** | 99.9% |
|
|
||||||
| `extract_images` | Extract embedded images & media | **Fast** | 99% |
|
|
||||||
| `extract_metadata` | Document properties & statistics | **Instant** | 100% |
|
|
||||||
| `detect_office_format` | Smart format detection & validation | **Instant** | 100% |
|
|
||||||
| `analyze_document_health` | File integrity & corruption analysis | **Fast** | 98% |
|
|
||||||
| `get_supported_formats` | List all supported file types | **Instant** | 100% |
|
|
||||||
|
|
||||||
</div>
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🌟 **Format Support Matrix**
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
### **🎯 Universal Support Across All Office Formats**
|
|
||||||
|
|
||||||
| 📄 **Format** | 📝 **Text** | 🖼️ **Images** | 🏷️ **Metadata** | 🕰️ **Legacy** | 💪 **Status** |
|
|
||||||
|---------------|-------------|---------------|-----------------|---------------|----------------|
|
|
||||||
| `.docx` | ✅ Perfect | ✅ Perfect | ✅ Perfect | N/A | 🟢 **Production** |
|
|
||||||
| `.doc` | ✅ Excellent | ⚠️ Basic | ⚠️ Basic | ✅ Full | 🟢 **Production** |
|
|
||||||
| `.xlsx` | ✅ Perfect | ✅ Perfect | ✅ Perfect | N/A | 🟢 **Production** |
|
|
||||||
| `.xls` | ✅ Excellent | ⚠️ Basic | ⚠️ Basic | ✅ Full | 🟢 **Production** |
|
|
||||||
| `.pptx` | ✅ Perfect | ✅ Perfect | ✅ Perfect | N/A | 🟢 **Production** |
|
|
||||||
| `.ppt` | ✅ Good | ⚠️ Basic | ⚠️ Basic | ✅ Full | 🟡 **Stable** |
|
|
||||||
| `.csv` | ✅ Perfect | N/A | ⚠️ Basic | N/A | 🟢 **Production** |
|
|
||||||
|
|
||||||
*✅ Perfect • ⚠️ Basic • 🟢 Production Ready • 🟡 Stable*
|
|
||||||
|
|
||||||
</div>
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## ⚡ **Blazing Fast Performance**
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
### **📊 Real-World Benchmarks**
|
|
||||||
|
|
||||||
| 📄 **Document Type** | 📏 **Size** | ⏱️ **Processing Time** | 🚀 **Speed vs Competitors** |
|
|
||||||
|---------------------|------------|----------------------|---------------------------|
|
|
||||||
| Word Document | 50 pages | 0.3 seconds | **6x faster** |
|
|
||||||
| Excel Spreadsheet | 10 sheets | 0.8 seconds | **4x faster** |
|
|
||||||
| PowerPoint Deck | 25 slides | 0.5 seconds | **5x faster** |
|
|
||||||
| Legacy .doc | 100 pages | 1.2 seconds | **3x faster** |
|
|
||||||
|
|
||||||
*Benchmarked on: MacBook Pro M2, 16GB RAM*
|
|
||||||
|
|
||||||
</div>
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🏗️ **Rock-Solid Architecture**
|
|
||||||
|
|
||||||
### **🔄 Multi-Library Fallback System**
|
|
||||||
*Never worry about document compatibility again*
|
|
||||||
|
|
||||||
```mermaid
|
|
||||||
graph TD
|
|
||||||
A[Document Input] --> B{Format Detection}
|
|
||||||
B -->|.docx| C[python-docx]
|
|
||||||
B -->|.doc| D[olefile]
|
|
||||||
B -->|.xlsx| E[openpyxl]
|
|
||||||
B -->|.xls| F[xlrd]
|
|
||||||
B -->|.pptx| G[python-pptx]
|
|
||||||
|
|
||||||
C -->|Success| H[✅ Extract Content]
|
|
||||||
C -->|Fail| I[mammoth fallback]
|
|
||||||
I -->|Fail| J[docx2txt fallback]
|
|
||||||
|
|
||||||
E -->|Success| H
|
|
||||||
E -->|Fail| K[pandas fallback]
|
|
||||||
|
|
||||||
G -->|Success| H
|
|
||||||
G -->|Fail| L[olefile fallback]
|
|
||||||
|
|
||||||
H --> M[🎯 Structured Output]
|
|
||||||
```
|
|
||||||
|
|
||||||
### **🧠 Intelligent Processing Pipeline**
|
|
||||||
|
|
||||||
1. **🔍 Smart Detection**: Automatically identify document type and best processing method
|
|
||||||
2. **⚡ Optimized Extraction**: Use the fastest, most accurate library for each format
|
|
||||||
3. **🛡️ Fallback Protection**: If primary method fails, seamlessly switch to backup
|
|
||||||
4. **🧹 Clean Output**: Deliver perfectly structured, AI-ready data every time
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🌍 **Real-World Success Stories**
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
### **🏢 Enterprise Use Cases**
|
|
||||||
|
|
||||||
</div>
|
|
||||||
|
|
||||||
<table>
|
|
||||||
<tr>
|
|
||||||
<td>
|
|
||||||
|
|
||||||
### **📊 Business Intelligence**
|
|
||||||
*Fortune 500 Financial Services*
|
|
||||||
|
|
||||||
**Challenge**: Process 10,000+ financial reports monthly
|
|
||||||
|
|
||||||
**Result**:
|
|
||||||
- ⚡ **95% time reduction** (20 hours → 1 hour)
|
|
||||||
- 🎯 **99.9% accuracy** in data extraction
|
|
||||||
- 💰 **$2M annual savings** in manual processing
|
|
||||||
|
|
||||||
</td>
|
|
||||||
<td>
|
|
||||||
|
|
||||||
### **🔄 Document Migration**
|
|
||||||
*Global Healthcare Provider*
|
|
||||||
|
|
||||||
**Challenge**: Migrate 50,000 legacy .doc files
|
|
||||||
|
|
||||||
**Result**:
|
|
||||||
- 📈 **100% success rate** with legacy formats
|
|
||||||
- ⏱️ **6 months → 2 weeks** completion time
|
|
||||||
- 🛡️ **Zero data loss** during migration
|
|
||||||
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
<tr>
|
|
||||||
<td>
|
|
||||||
|
|
||||||
### **🔬 Research Analytics**
|
|
||||||
*Top University Medical School*
|
|
||||||
|
|
||||||
**Challenge**: Analyze 5,000 research papers
|
|
||||||
|
|
||||||
**Result**:
|
|
||||||
- 🚀 **10x faster** literature analysis
|
|
||||||
- 📋 **Structured data** ready for ML models
|
|
||||||
- 🎓 **3 published papers** from insights
|
|
||||||
|
|
||||||
</td>
|
|
||||||
<td>
|
|
||||||
|
|
||||||
### **🤖 AI Training Data**
|
|
||||||
*Silicon Valley AI Startup*
|
|
||||||
|
|
||||||
**Challenge**: Extract training data from documents
|
|
||||||
|
|
||||||
**Result**:
|
|
||||||
- 📊 **1M+ documents** processed flawlessly
|
|
||||||
- ⚡ **Real-time processing** pipeline
|
|
||||||
- 🧠 **40% better model accuracy**
|
|
||||||
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
</table>
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🎯 **Advanced Features That Set Us Apart**
|
|
||||||
|
|
||||||
### **🌐 URL Processing with Smart Caching**
|
|
||||||
```python
|
|
||||||
# Process documents directly from the web
|
|
||||||
doc_url = "https://company.com/annual-report.docx"
|
|
||||||
content = await extract_text(doc_url) # Downloads & caches automatically
|
|
||||||
|
|
||||||
# Second call uses cache - blazing fast!
|
|
||||||
cached_content = await extract_text(doc_url) # < 0.01 seconds
|
|
||||||
```
|
|
||||||
|
|
||||||
### **🩺 Document Health Analysis**
|
|
||||||
```python
|
|
||||||
# Get comprehensive document health insights
|
|
||||||
health = await analyze_document_health("suspicious-file.docx")
|
|
||||||
|
|
||||||
{
|
|
||||||
"overall_health": "healthy",
|
|
||||||
"health_score": 9,
|
|
||||||
"recommendations": ["Document appears healthy and ready for processing"],
|
|
||||||
"corruption_detected": false,
|
|
||||||
"password_protected": false
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### **🔍 Intelligent Format Detection**
|
|
||||||
```python
|
|
||||||
# Automatically detect and validate any Office file
|
|
||||||
format_info = await detect_office_format("mystery-document")
|
|
||||||
|
|
||||||
{
|
|
||||||
"format_name": "Word Document (DOCX)",
|
|
||||||
"category": "word",
|
|
||||||
"is_legacy": false,
|
|
||||||
"supports_macros": false,
|
|
||||||
"processing_recommendations": ["Use python-docx for optimal results"]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 📈 **Installation & Setup**
|
|
||||||
|
|
||||||
<details>
|
|
||||||
<summary>🚀 <b>Quick Install</b> (Recommended)</summary>
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Using uv (fastest)
|
claude mcp add office-tools "uvx mcp-office-tools"
|
||||||
uv add mcp-office-tools
|
|
||||||
|
|
||||||
# Using pip
|
|
||||||
pip install mcp-office-tools
|
|
||||||
|
|
||||||
# From source (latest features)
|
|
||||||
git clone https://git.supported.systems/MCP/mcp-office-tools.git
|
|
||||||
cd mcp-office-tools
|
|
||||||
uv sync
|
|
||||||
```
|
```
|
||||||
|
|
||||||
</details>
|
---
|
||||||
|
|
||||||
<details>
|
## 🛠 Available Tools
|
||||||
<summary>🐳 <b>Docker Setup</b></summary>
|
|
||||||
|
|
||||||
```dockerfile
|
### Universal Tools
|
||||||
FROM python:3.11-slim
|
*Work with all Office formats: Word, Excel, PowerPoint, CSV*
|
||||||
RUN pip install mcp-office-tools
|
|
||||||
CMD ["mcp-office-tools"]
|
| Tool | Description |
|
||||||
|
|------|-------------|
|
||||||
|
| `extract_text` | Extract text with optional formatting preservation |
|
||||||
|
| `extract_images` | Extract embedded images with size filtering |
|
||||||
|
| `extract_metadata` | Get document properties (author, dates, statistics) |
|
||||||
|
| `detect_office_format` | Identify format, version, encryption status |
|
||||||
|
| `analyze_document_health` | Check integrity, corruption, password protection |
|
||||||
|
| `get_supported_formats` | List all supported file extensions |
|
||||||
|
|
||||||
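To illustrate what identifying an Office format can involve, here is a minimal sketch that classifies raw bytes by container signature. It is not the project's `detect_office_format` implementation. The function name `sniff_container` and its return strings are hypothetical, and the `word/`, `xl/`, `ppt/` prefixes are the conventional part names written by common Office producers rather than a guarantee.

```python
import io
import zipfile

# Signature of the OLE Compound File container used by legacy .doc/.xls/.ppt files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header that opens a non-empty ZIP archive (the container for .docx/.xlsx/.pptx).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data: bytes) -> str:
    """Return 'ole', 'ooxml:<kind>', 'zip', or 'unknown' (hypothetical labels)."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        # Look for the conventional top-level folder of each document family.
        for prefix, kind in (("word/", "word"), ("xl/", "excel"), ("ppt/", "powerpoint")):
            if any(name.startswith(prefix) for name in names):
                return f"ooxml:{kind}"
        return "zip"
    return "unknown"
```

Checking the content rather than the file extension is what lets a misnamed file be recognized, at the cost of opening the archive.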
|
### Word Tools
|
||||||
|
|
||||||
|
| Tool | Description |
|
||||||
|
|------|-------------|
|
||||||
|
| `convert_to_markdown` | Convert to Markdown with automatic pagination for large docs |
|
||||||
|
| `extract_word_tables` | Extract tables as structured JSON, CSV, or Markdown |
|
||||||
|
| `analyze_word_structure` | Analyze headings, sections, styles, and document hierarchy |
|
||||||
|
|
||||||
|
### Excel Tools
|
||||||
|
|
||||||
|
| Tool | Description |
|
||||||
|
|------|-------------|
|
||||||
|
| `analyze_excel_data` | Statistical analysis: data types, missing values, outliers |
|
||||||
|
| `extract_excel_formulas` | Extract formulas with values and dependency analysis |
|
||||||
|
| `create_excel_chart_data` | Generate Chart.js/Plotly-ready data from spreadsheets |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Format Support
|
||||||
|
|
||||||
|
| Format | Extension | Text | Images | Metadata | Tables | Formulas |
|
||||||
|
|--------|-----------|:----:|:------:|:--------:|:------:|:--------:|
|
||||||
|
| **Word (Modern)** | `.docx` | ✅ | ✅ | ✅ | ✅ | - |
|
||||||
|
| **Word (Legacy)** | `.doc` | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
|
||||||
|
| **Word Template** | `.dotx` | ✅ | ✅ | ✅ | ✅ | - |
|
||||||
|
| **Word Macro** | `.docm` | ✅ | ✅ | ✅ | ✅ | - |
|
||||||
|
| **Excel (Modern)** | `.xlsx` | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||||
|
| **Excel (Legacy)** | `.xls` | ✅ | ⚠️ | ⚠️ | ✅ | ⚠️ |
|
||||||
|
| **Excel Template** | `.xltx` | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||||
|
| **Excel Macro** | `.xlsm` | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||||
|
| **PowerPoint (Modern)** | `.pptx` | ✅ | ✅ | ✅ | ✅ | - |
|
||||||
|
| **PowerPoint (Legacy)** | `.ppt` | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
|
||||||
|
| **PowerPoint Template** | `.potx` | ✅ | ✅ | ✅ | ✅ | - |
|
||||||
|
| **CSV** | `.csv` | ✅ | - | ⚠️ | ✅ | - |
|
||||||
|
|
||||||
|
✅ Full support • ⚠️ Basic/partial support • - Not applicable
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 💡 Usage Examples
|
||||||
|
|
||||||
|
### Extract Text from Any Document
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Simple extraction
|
||||||
|
result = await extract_text("report.docx")
|
||||||
|
print(result["text"])
|
||||||
|
|
||||||
|
# With formatting preserved
|
||||||
|
result = await extract_text(
|
||||||
|
file_path="report.docx",
|
||||||
|
preserve_formatting=True,
|
||||||
|
include_metadata=True
|
||||||
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
</details>
|
### Convert Word to Markdown (with Pagination)
|
||||||
|
|
||||||
<details>
|
```python
|
||||||
<summary>🔧 <b>Development Setup</b></summary>
|
# For large documents, results are automatically paginated
|
||||||
|
result = await convert_to_markdown("big-manual.docx")
|
||||||
|
|
||||||
|
# Continue with cursor for next page
|
||||||
|
if result.get("pagination", {}).get("has_more"):
|
||||||
|
next_page = await convert_to_markdown(
|
||||||
|
"big-manual.docx",
|
||||||
|
cursor_id=result["pagination"]["cursor_id"]
|
||||||
|
)
|
||||||
|
|
||||||
|
# Or use page ranges to get specific sections
|
||||||
|
result = await convert_to_markdown(
|
||||||
|
"big-manual.docx",
|
||||||
|
page_range="1-10"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Or extract by chapter name
|
||||||
|
result = await convert_to_markdown(
|
||||||
|
"big-manual.docx",
|
||||||
|
chapter_name="Introduction"
|
||||||
|
)
|
||||||
|
```
|
||||||
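The cursor pattern above extends naturally to a loop that collects every page. The sketch below uses a stand-in `convert_to_markdown` so it runs on its own. The `markdown` result key, the three fake pages, and the helper name `collect_all_pages` are assumptions for illustration; only `pagination`, `has_more`, and `cursor_id` come from the example above.

```python
import asyncio

# Stand-in data: pretend the document splits into three pages (assumption).
_FAKE_PAGES = ["# Part 1\n", "# Part 2\n", "# Part 3\n"]


async def convert_to_markdown(file_path, cursor_id=None):
    """Stub that mimics a paginated response; NOT the real tool."""
    index = 0 if cursor_id is None else int(cursor_id)
    has_more = index + 1 < len(_FAKE_PAGES)
    pagination = {"has_more": has_more}
    if has_more:
        pagination["cursor_id"] = str(index + 1)
    # "markdown" is a hypothetical key name for the page content.
    return {"markdown": _FAKE_PAGES[index], "pagination": pagination}


async def collect_all_pages(file_path, max_pages=100):
    """Follow cursors until has_more is false, with a safety cap on page count."""
    chunks = []
    cursor_id = None
    for _ in range(max_pages):
        result = await convert_to_markdown(file_path, cursor_id=cursor_id)
        chunks.append(result["markdown"])
        pagination = result.get("pagination", {})
        if not pagination.get("has_more"):
            break
        cursor_id = pagination["cursor_id"]
    return "".join(chunks)


full_text = asyncio.run(collect_all_pages("big-manual.docx"))
```

The `max_pages` cap is a defensive choice so that a server that keeps reporting `has_more` cannot trap the caller in an endless loop.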
|
|
||||||
|
### Analyze Excel Data Quality
|
||||||
|
|
||||||
|
```python
|
||||||
|
result = await analyze_excel_data(
|
||||||
|
file_path="sales-data.xlsx",
|
||||||
|
include_statistics=True,
|
||||||
|
check_data_quality=True
|
||||||
|
)
|
||||||
|
|
||||||
|
# Returns per-column analysis
|
||||||
|
# {
|
||||||
|
# "analysis": {
|
||||||
|
# "Sheet1": {
|
||||||
|
# "dimensions": {"rows": 1000, "columns": 12},
|
||||||
|
# "column_info": {
|
||||||
|
# "Revenue": {
|
||||||
|
# "data_type": "float64",
|
||||||
|
# "null_percentage": 2.3,
|
||||||
|
# "statistics": {"mean": 45000, "median": 42000, ...},
|
||||||
|
# "quality_issues": ["5 potential outliers"]
|
||||||
|
# }
|
||||||
|
# },
|
||||||
|
# "data_quality": {
|
||||||
|
# "completeness_percentage": 97.8,
|
||||||
|
# "duplicate_rows": 12
|
||||||
|
# }
|
||||||
|
# }
|
||||||
|
# }
|
||||||
|
# }
|
||||||
|
```
|
||||||
|
|
||||||
|
### Extract Excel Formulas
|
||||||
|
|
||||||
|
```python
|
||||||
|
result = await extract_excel_formulas(
|
||||||
|
file_path="financial-model.xlsx",
|
||||||
|
analyze_dependencies=True
|
||||||
|
)
|
||||||
|
|
||||||
|
# Returns formula details with dependency mapping
|
||||||
|
# {
|
||||||
|
# "formulas": {
|
||||||
|
# "Sheet1": [
|
||||||
|
# {
|
||||||
|
# "cell": "D2",
|
||||||
|
# "formula": "=B2*C2",
|
||||||
|
# "value": 1500.00,
|
||||||
|
# "dependencies": ["B2", "C2"]
|
||||||
|
# }
|
||||||
|
# ]
|
||||||
|
# }
|
||||||
|
# }
|
||||||
|
```
|
||||||
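One way to picture the `dependencies` field is as the list of cell references that appear in the formula text. The sketch below derives it with a regular expression. It is an illustration only, not how `extract_excel_formulas` works internally, and the name `formula_dependencies` is hypothetical. It ignores sheet-qualified references and named ranges, reports only the two endpoints of a range, and does not skip string literals.

```python
import re

# A cell reference: optional "$", 1-3 column letters, optional "$", row digits.
# The lookarounds keep function names such as LOG10( from matching.
_CELL_REF = re.compile(
    r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?([0-9]+)(?![A-Za-z0-9_(])"
)


def formula_dependencies(formula):
    """Return referenced cells in first-seen order, without duplicates."""
    seen = []
    for column, row in _CELL_REF.findall(formula):
        ref = f"{column}{row}"
        if ref not in seen:
            seen.append(ref)
    return seen
```

For the formula in the example output, `formula_dependencies("=B2*C2")` yields `["B2", "C2"]`.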
|
|
||||||
|
### Generate Chart Data
|
||||||
|
|
||||||
|
```python
|
||||||
|
result = await create_excel_chart_data(
|
||||||
|
file_path="quarterly-revenue.xlsx",
|
||||||
|
chart_type="line",
|
||||||
|
output_format="chartjs"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Returns ready-to-use Chart.js configuration
|
||||||
|
# {
|
||||||
|
# "chartjs": {
|
||||||
|
# "type": "line",
|
||||||
|
# "data": {
|
||||||
|
# "labels": ["Q1", "Q2", "Q3", "Q4"],
|
||||||
|
# "datasets": [{"label": "Revenue", "data": [100, 120, 115, 140]}]
|
||||||
|
# }
|
||||||
|
# }
|
||||||
|
# }
|
||||||
|
```
|
||||||
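The output shape above can be reproduced from a simple table of rows. This sketch assumes the first row is a header and the first column holds the labels. The function name `rows_to_chartjs` is hypothetical and the code is not taken from the project.

```python
def rows_to_chartjs(rows, chart_type="line"):
    """Turn [header, *data_rows] into a Chart.js-style configuration dict."""
    header, *body = rows
    labels = [row[0] for row in body]
    # Every column after the first becomes one dataset.
    datasets = [
        {"label": header[index], "data": [row[index] for row in body]}
        for index in range(1, len(header))
    ]
    return {
        "chartjs": {
            "type": chart_type,
            "data": {"labels": labels, "datasets": datasets},
        }
    }


config = rows_to_chartjs(
    [
        ["Quarter", "Revenue"],
        ["Q1", 100],
        ["Q2", 120],
        ["Q3", 115],
        ["Q4", 140],
    ]
)
```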
|
|
||||||
|
### Extract Word Tables
|
||||||
|
|
||||||
|
```python
|
||||||
|
result = await extract_word_tables(
|
||||||
|
file_path="contract.docx",
|
||||||
|
output_format="markdown"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Returns tables with optional format conversion
|
||||||
|
# {
|
||||||
|
# "tables": [
|
||||||
|
# {
|
||||||
|
# "table_index": 0,
|
||||||
|
# "dimensions": {"rows": 5, "columns": 3},
|
||||||
|
# "converted_output": "| Name | Role | Department |\n|---|---|---|\n..."
|
||||||
|
# }
|
||||||
|
# ]
|
||||||
|
# }
|
||||||
|
```
|
||||||
|
|
||||||
|
### Process Documents from URLs
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Documents are downloaded and cached automatically
|
||||||
|
result = await extract_text("https://example.com/report.docx")
|
||||||
|
|
||||||
|
# Cache expires after 1 hour by default
|
||||||
|
```
|
||||||
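A one-hour expiry like the default mentioned above can be modeled with a small time-to-live cache. The class below is a hypothetical in-memory sketch, not the contents of the project's `caching.py`. The name `UrlCache` and the injectable `clock` parameter are assumptions made so the behavior can be checked without waiting.

```python
import time


class UrlCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (stored_at, payload)

    def put(self, url, payload):
        """Store the payload together with the time it was cached."""
        self._entries[url] = (self._clock(), payload)

    def get(self, url):
        """Return the cached payload, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, payload = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[url]  # drop stale entries on access
            return None
        return payload
```

Using a monotonic clock avoids surprises when the system time is adjusted while the server is running.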
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🧪 Testing
|
||||||
|
|
||||||
|
The project includes a comprehensive test suite with an interactive HTML dashboard:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Clone repository
|
# Run all tests with dashboard generation
|
||||||
git clone https://git.supported.systems/MCP/mcp-office-tools.git
|
make test
|
||||||
cd mcp-office-tools
|
|
||||||
|
|
||||||
# Install with development dependencies
|
# Run just pytest
|
||||||
|
make test-pytest
|
||||||
|
|
||||||
|
# View the test dashboard
|
||||||
|
make view-dashboard
|
||||||
|
```
|
||||||
|
|
||||||
|
The test dashboard shows:
|
||||||
|
- Pass/fail statistics with MS Office-themed styling
|
||||||
|
- Detailed inputs and outputs for each test
|
||||||
|
- Expandable error tracebacks for failures
|
||||||
|
- Category breakdown (Word, Excel, PowerPoint)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🏗 Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
mcp-office-tools/
|
||||||
|
├── src/mcp_office_tools/
|
||||||
|
│ ├── server.py # FastMCP server entry point
|
||||||
|
│ ├── mixins/
|
||||||
|
│ │ ├── universal.py # Format-agnostic tools
|
||||||
|
│ │ ├── word.py # Word-specific tools
|
||||||
|
│ │ ├── excel.py # Excel-specific tools
|
||||||
|
│ │ └── powerpoint.py # PowerPoint tools (WIP)
|
||||||
|
│ ├── utils/
|
||||||
|
│ │ ├── validation.py # File validation
|
||||||
|
│ │ ├── file_detection.py # Format detection
|
||||||
|
│ │ ├── caching.py # URL caching
|
||||||
|
│ │ └── decorators.py # Error handling, defaults
|
||||||
|
│ └── pagination.py # Large document pagination
|
||||||
|
├── tests/ # pytest test suite
|
||||||
|
└── reports/ # Test dashboard output
|
||||||
|
```
|
||||||
|
|
||||||
|
### Processing Libraries
|
||||||
|
|
||||||
|
| Format | Primary Library | Fallback |
|
||||||
|
|--------|----------------|----------|
|
||||||
|
| `.docx` | python-docx | mammoth |
|
||||||
|
| `.xlsx` | openpyxl | pandas |
|
||||||
|
| `.pptx` | python-pptx | - |
|
||||||
|
| `.doc`/`.xls`/`.ppt` | olefile | - |
|
||||||
|
| `.csv` | pandas | built-in csv |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔧 Development
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Clone and install
|
||||||
|
git clone https://github.com/yourusername/mcp-office-tools.git
|
||||||
|
cd mcp-office-tools
|
||||||
uv sync --dev
|
uv sync --dev
|
||||||
|
|
||||||
# Run tests
|
# Run tests
|
||||||
uv run pytest
|
uv run pytest
|
||||||
|
|
||||||
# Code quality
|
# Format and lint
|
||||||
uv run black src/ tests/
|
uv run black src/ tests/
|
||||||
uv run ruff check src/ tests/
|
uv run ruff check src/ tests/
|
||||||
|
|
||||||
|
# Type check
|
||||||
uv run mypy src/
|
uv run mypy src/
|
||||||
```
|
```
|
||||||
|
|
||||||
</details>
|
---
|
||||||
|
|
||||||
|
## 📦 Dependencies
|
||||||
|
|
||||||
|
**Core:**
|
||||||
|
- `fastmcp` - MCP server framework
|
||||||
|
- `python-docx` - Word document processing
|
||||||
|
- `openpyxl` - Excel spreadsheet processing
|
||||||
|
- `python-pptx` - PowerPoint processing
|
||||||
|
- `pandas` - Data analysis and CSV handling
|
||||||
|
- `mammoth` - Word to HTML/Markdown conversion
|
||||||
|
- `olefile` - Legacy OLE format support
|
||||||
|
- `xlrd` - Legacy Excel support
|
||||||
|
- `pillow` - Image processing
|
||||||
|
- `aiohttp` / `aiofiles` - Async HTTP and file I/O
|
||||||
|
|
||||||
|
**Optional:**
|
||||||
|
- `python-magic` - Enhanced MIME type detection
|
||||||
|
- `msoffcrypto-tool` - Encrypted file detection
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 🤝 **Integration Ecosystem**
|
## 🤝 Related Projects
|
||||||
|
|
||||||
### **🔗 Perfect Companion to MCP PDF Tools**
|
- **[MCP PDF Tools](https://github.com/yourusername/mcp-pdf-tools)** - Companion server for PDF processing
|
||||||
|
- **[FastMCP](https://gofastmcp.com)** - The framework powering this server
|
||||||
```python
|
|
||||||
# Unified document processing across ALL formats
|
|
||||||
pdf_data = await pdf_tools.extract_text("report.pdf")
|
|
||||||
word_data = await office_tools.extract_text("report.docx")
|
|
||||||
excel_data = await office_tools.extract_text("data.xlsx")
|
|
||||||
|
|
||||||
# Cross-format document analysis
|
|
||||||
comparison = await compare_documents(pdf_data, word_data, excel_data)
|
|
||||||
```
|
|
||||||
|
|
||||||
### **⚡ Works With Your Favorite Tools**
|
|
||||||
- **🤖 Claude Desktop**: Native MCP integration
|
|
||||||
- **📊 Jupyter Notebooks**: Perfect for data analysis
|
|
||||||
- **🐍 Python Scripts**: Direct API access
|
|
||||||
- **🌐 Web Apps**: REST API wrappers
|
|
||||||
- **☁️ Cloud Functions**: Serverless deployment
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 🛡️ **Enterprise-Grade Security**
|
## 📜 License
|
||||||
|
|
||||||
|
MIT License - see [LICENSE](LICENSE) for details.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
|
|
||||||
| 🔒 **Security Feature** | ✅ **Status** | 📋 **Description** |
|
**Built with [FastMCP](https://gofastmcp.com) and the [Model Context Protocol](https://modelcontextprotocol.io)**
|
||||||
|------------------------|---------------|-------------------|
|
|
||||||
| **Local Processing** | ✅ Enabled | Documents never leave your environment |
|
|
||||||
| **Automatic Cleanup** | ✅ Enabled | Temporary files removed after processing |
|
|
||||||
| **HTTPS-Only URLs** | ✅ Enforced | Secure downloads with certificate validation |
|
|
||||||
| **Memory Management** | ✅ Optimized | Efficient handling of large files |
|
|
||||||
| **No Data Collection** | ✅ Guaranteed | Zero telemetry or tracking |
|
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 🚀 **What's Coming Next?**
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
### **🔮 Roadmap 2024-2025**
|
|
||||||
|
|
||||||
</div>
|
|
||||||
|
|
||||||
| 🗓️ **Timeline** | 🎯 **Feature** | 📋 **Description** |
|
|
||||||
|-----------------|---------------|-------------------|
|
|
||||||
| **Q1 2025** | **Advanced Excel Tools** | Formula parsing, chart extraction, data validation |
|
|
||||||
| **Q2 2025** | **PowerPoint Pro** | Animation analysis, slide comparison, template detection |
|
|
||||||
| **Q3 2025** | **Document Conversion** | Cross-format conversion (Word→PDF, Excel→CSV, etc.) |
|
|
||||||
| **Q4 2025** | **Batch Processing** | Multi-document workflows with progress tracking |
|
|
||||||
| **2026** | **Cloud Integration** | Direct OneDrive, Google Drive, SharePoint support |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 💝 **Community & Support**
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
### **Join Our Growing Community!**
|
|
||||||
|
|
||||||
[](https://git.supported.systems/MCP/mcp-office-tools)
|
|
||||||
[](https://git.supported.systems/MCP/mcp-office-tools/issues)
|
|
||||||
[](https://git.supported.systems/MCP/mcp-office-tools/discussions)
|
|
||||||
|
|
||||||
**💬 Need Help?** Open an issue • **🐛 Found a Bug?** Report it • **💡 Have an Idea?** Share it!
|
|
||||||
|
|
||||||
</div>
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
|
|
||||||
## 📜 **License & Credits**
|
|
||||||
|
|
||||||
**MIT License** - Use it anywhere, anytime, for anything!
|
|
||||||
|
|
||||||
**Built with ❤️ by the MCP Community**
|
|
||||||
|
|
||||||
*Powered by [FastMCP](https://github.com/jlowin/fastmcp) • [Model Context Protocol](https://modelcontextprotocol.io) • Modern Python*
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### **⭐ If MCP Office Tools helps you, please star the repo! ⭐**
|
|
||||||
|
|
||||||
*It helps us build better tools for the community* 🚀
|
|
||||||
|
|
||||||
</div>
|
|
||||||