Improve README tone and clarity

- Replace generic opener with direct description
- Make feature bullets more conversational (less "feature list" mode)
- Add context before format support table
- Clarify pagination example with "three ways" structure
- Lead testing section with the dashboard hook
- Add architecture design rationale
- Remove "comprehensive" and "intelligent" buzzwords
Ryan Malloy 2026-01-11 00:49:34 -07:00
parent 036160d029
commit f159efab2c

@@ -2,14 +2,14 @@
# 📊 MCP Office Tools
**Comprehensive Microsoft Office document processing for AI agents**
**MCP server for extracting text, tables, images, and data from Microsoft Office files**
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
[![FastMCP](https://img.shields.io/badge/FastMCP-0.5+-green.svg?style=flat-square)](https://gofastmcp.com)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
[![MCP Protocol](https://img.shields.io/badge/MCP-Protocol-purple?style=flat-square)](https://modelcontextprotocol.io)
*Extract text, tables, images, formulas, and metadata from Word, Excel, PowerPoint, and CSV files*
*Word, Excel, PowerPoint, CSV — all the formats your AI agent needs to read but can't*
[Installation](#-installation) • [Tools](#-available-tools) • [Examples](#-usage-examples) • [Testing](#-testing)
@@ -19,12 +19,12 @@
## ✨ Features
- **Universal extraction** - Text, images, and metadata from any Office format
- **Format-specific tools** - Deep analysis for Word, Excel, and PowerPoint
- **Intelligent pagination** - Large documents automatically chunked for AI context limits
- **Multi-library fallbacks** - Never fails silently; tries multiple extraction methods
- **URL support** - Process documents directly from HTTP/HTTPS URLs with caching
- **Legacy format support** - Handles .doc, .xls, .ppt from Office 97-2003
- **Universal extraction** — Pull text, images, and metadata from any Office format
- **Format-specific tools** — Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
- **Automatic pagination** — Large documents get chunked so they don't blow up your context window
- **Fallback processing** — When one library chokes on a weird file, we try another. No silent failures.
- **URL support** — Pass a URL instead of a file path; we'll download and cache it
- **Legacy formats** — Yes, even those .doc and .xls files from 2003 still work
---
@@ -96,6 +96,8 @@ claude mcp add office-tools "uvx mcp-office-tools"
## 📋 Format Support
Here's what works and what's "good enough" — legacy formats from Office 97-2003 have more limited extraction, but they still work:
| Format | Extension | Text | Images | Metadata | Tables | Formulas |
|--------|-----------|:----:|:------:|:--------:|:------:|:--------:|
| **Word (Modern)** | `.docx` | ✅ | ✅ | ✅ | ✅ | - |
@@ -134,28 +136,22 @@ result = await extract_text(
### Convert Word to Markdown (with Pagination)
```python
# For large documents, results are automatically paginated
result = await convert_to_markdown("big-manual.docx")
Large documents get paginated automatically. Three ways to handle it:
# Continue with cursor for next page
```python
# Option 1: Follow the cursor for each chunk
result = await convert_to_markdown("big-manual.docx")
if result.get("pagination", {}).get("has_more"):
    next_page = await convert_to_markdown(
        "big-manual.docx",
        cursor_id=result["pagination"]["cursor_id"]
    )
# Or use page ranges to get specific sections
result = await convert_to_markdown(
    "big-manual.docx",
    page_range="1-10"
)
# Option 2: Grab specific pages
result = await convert_to_markdown("big-manual.docx", page_range="1-10")
# Or extract by chapter name
result = await convert_to_markdown(
    "big-manual.docx",
    chapter_name="Introduction"
)
# Option 3: Extract by chapter heading
result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
```
### Analyze Excel Data Quality
@ -266,29 +262,27 @@ result = await extract_text("https://example.com/report.docx")
## 🧪 Testing
The project includes a comprehensive test suite with an interactive HTML dashboard:
We built a visual test dashboard because staring at pytest output gets old. Run `make test` and you get an HTML report with pass/fail stats, detailed I/O for each test, and expandable tracebacks when things break.
```bash
# Run all tests with dashboard generation
# Run tests and generate the dashboard
make test
# Run just pytest
# Just pytest, no dashboard
make test-pytest
# View the test dashboard
# Open existing dashboard
make view-dashboard
```
The test dashboard shows:
- Pass/fail statistics with MS Office-themed styling
- Detailed inputs and outputs for each test
- Expandable error tracebacks for failures
- Category breakdown (Word, Excel, PowerPoint)
The dashboard has an MS Office-inspired theme (Word blue, Excel green, PowerPoint orange) and groups tests by category so you can see what's working at a glance.
---
## 🏗 Architecture
The mixin pattern keeps things modular — universal tools work on everything, format-specific tools go deeper. When the primary library can't handle something (corrupted files, weird formatting), we fall back to alternatives.
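The fallback behavior described above can be sketched as a chain of extractors tried in order. This is a minimal illustration under stated assumptions, not the project's actual code: `extract_with_fallback`, `ExtractionError`, and both extractor functions are hypothetical names invented for this example.

```python
from typing import Callable, Sequence


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed (hypothetical)."""


def extract_with_fallback(path: str, extractors: Sequence[Callable[[str], str]]) -> dict:
    """Try each extractor in order and return the first successful result."""
    errors = []
    for extractor in extractors:
        try:
            return {"text": extractor(path), "method": extractor.__name__}
        except Exception as exc:
            # Record the failure and move on to the next library.
            errors.append(f"{extractor.__name__}: {exc}")
    # Nothing worked: surface every failure instead of failing silently.
    raise ExtractionError("; ".join(errors))


def primary_extractor(path: str) -> str:
    # Simulates a primary library choking on a corrupted file.
    raise ValueError("corrupted file")


def backup_extractor(path: str) -> str:
    # Simulates an alternative library that succeeds.
    return f"text from {path}"


result = extract_with_fallback("report.docx", [primary_extractor, backup_extractor])
```

Reporting which method succeeded, and raising with the collected errors when none do, is one way to keep the "no silent failures" promise visible to the caller.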
```
mcp-office-tools/
├── src/mcp_office_tools/