Improve README tone and clarity
Some checks are pending
Test Dashboard / test-and-dashboard (push) Waiting to run
- Replace generic opener with direct description
- Make feature bullets more conversational (less "feature list" mode)
- Add context before format support table
- Clarify pagination example with "three ways" structure
- Lead testing section with the dashboard hook
- Add architecture design rationale
- Remove "comprehensive" and "intelligent" buzzwords
parent 036160d029
commit f159efab2c
56 README.md
@@ -2,14 +2,14 @@

 # 📊 MCP Office Tools

-**Comprehensive Microsoft Office document processing for AI agents**
+**MCP server for extracting text, tables, images, and data from Microsoft Office files**

 [](https://www.python.org/downloads/)
 [](https://gofastmcp.com)
 [](https://opensource.org/licenses/MIT)
 [](https://modelcontextprotocol.io)

-*Extract text, tables, images, formulas, and metadata from Word, Excel, PowerPoint, and CSV files*
+*Word, Excel, PowerPoint, CSV — all the formats your AI agent needs to read but can't*

 [Installation](#-installation) • [Tools](#-available-tools) • [Examples](#-usage-examples) • [Testing](#-testing)

@@ -19,12 +19,12 @@

 ## ✨ Features

-- **Universal extraction** - Text, images, and metadata from any Office format
-- **Format-specific tools** - Deep analysis for Word, Excel, and PowerPoint
-- **Intelligent pagination** - Large documents automatically chunked for AI context limits
-- **Multi-library fallbacks** - Never fails silently; tries multiple extraction methods
-- **URL support** - Process documents directly from HTTP/HTTPS URLs with caching
-- **Legacy format support** - Handles .doc, .xls, .ppt from Office 97-2003
+- **Universal extraction** — Pull text, images, and metadata from any Office format
+- **Format-specific tools** — Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
+- **Automatic pagination** — Large documents get chunked so they don't blow up your context window
+- **Fallback processing** — When one library chokes on a weird file, we try another. No silent failures.
+- **URL support** — Pass a URL instead of a file path; we'll download and cache it
+- **Legacy formats** — Yes, even those .doc and .xls files from 2003 still work

 ---

@@ -96,6 +96,8 @@ claude mcp add office-tools "uvx mcp-office-tools"

 ## 📋 Format Support

+Here's what works and what's "good enough" — legacy formats from Office 97-2003 have more limited extraction, but they still work:
+
 | Format | Extension | Text | Images | Metadata | Tables | Formulas |
 |--------|-----------|:----:|:------:|:--------:|:------:|:--------:|
 | **Word (Modern)** | `.docx` | ✅ | ✅ | ✅ | ✅ | - |
@@ -134,28 +136,22 @@ result = await extract_text(

 ### Convert Word to Markdown (with Pagination)

-```python
-# For large documents, results are automatically paginated
-result = await convert_to_markdown("big-manual.docx")
+Large documents get paginated automatically. Three ways to handle it:

-# Continue with cursor for next page
+```python
+# Option 1: Follow the cursor for each chunk
+result = await convert_to_markdown("big-manual.docx")
 if result.get("pagination", {}).get("has_more"):
     next_page = await convert_to_markdown(
         "big-manual.docx",
         cursor_id=result["pagination"]["cursor_id"]
     )

-# Or use page ranges to get specific sections
-result = await convert_to_markdown(
-    "big-manual.docx",
-    page_range="1-10"
-)
+# Option 2: Grab specific pages
+result = await convert_to_markdown("big-manual.docx", page_range="1-10")

-# Or extract by chapter name
-result = await convert_to_markdown(
-    "big-manual.docx",
-    chapter_name="Introduction"
-)
+# Option 3: Extract by chapter heading
+result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
 ```

 ### Analyze Excel Data Quality
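Note on the pagination hunk: Option 1 fetches only one follow-up page. To read a whole document you would keep following cursors until `has_more` is false. The sketch below shows that loop. It assumes only the response shape visible in the README example (`pagination.has_more`, `pagination.cursor_id`); the `markdown` result key, the integer-like cursor values, and the stand-in `convert_to_markdown` are hypothetical and exist only so the sketch runs on its own.

```python
import asyncio

# Hypothetical stand-in for the real tool. It serves three fake chunks and
# mimics only the pagination fields shown in the README example.
_FAKE_PAGES = ["# Part 1\n", "# Part 2\n", "# Part 3\n"]


async def convert_to_markdown(path, cursor_id=None):
    index = 0 if cursor_id is None else int(cursor_id)
    has_more = index + 1 < len(_FAKE_PAGES)
    pagination = {"has_more": has_more}
    if has_more:
        pagination["cursor_id"] = str(index + 1)
    # "markdown" is an assumed key name, not confirmed by the README.
    return {"markdown": _FAKE_PAGES[index], "pagination": pagination}


async def collect_all_pages(path):
    """Follow cursors until has_more is false; return every chunk in order."""
    result = await convert_to_markdown(path)
    chunks = [result["markdown"]]
    while result.get("pagination", {}).get("has_more"):
        result = await convert_to_markdown(
            path, cursor_id=result["pagination"]["cursor_id"]
        )
        chunks.append(result["markdown"])
    return chunks


pages = asyncio.run(collect_all_pages("big-manual.docx"))
```

Collecting every chunk defeats the purpose of pagination for very large files, so in practice a caller would likely process each chunk as it arrives rather than accumulate them.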
@@ -266,29 +262,27 @@ result = await extract_text("https://example.com/report.docx")

 ## 🧪 Testing

-The project includes a comprehensive test suite with an interactive HTML dashboard:
+We built a visual test dashboard because staring at pytest output gets old. Run `make test` and you get an HTML report with pass/fail stats, detailed I/O for each test, and expandable tracebacks when things break.

 ```bash
-# Run all tests with dashboard generation
+# Run tests and generate the dashboard
 make test

-# Run just pytest
+# Just pytest, no dashboard
 make test-pytest

-# View the test dashboard
+# Open existing dashboard
 make view-dashboard
 ```

-The test dashboard shows:
-- Pass/fail statistics with MS Office-themed styling
-- Detailed inputs and outputs for each test
-- Expandable error tracebacks for failures
-- Category breakdown (Word, Excel, PowerPoint)
+The dashboard has an MS Office-inspired theme (Word blue, Excel green, PowerPoint orange) and groups tests by category so you can see what's working at a glance.

 ---

 ## 🏗 Architecture

+The mixin pattern keeps things modular — universal tools work on everything, format-specific tools go deeper. When the primary library can't handle something (corrupted files, weird formatting), we fall back to alternatives.
+
 ```
 mcp-office-tools/
 ├── src/mcp_office_tools/
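Note on the architecture rationale: the "fall back to alternatives" behaviour described in the new paragraph could look roughly like the sketch below. This is an illustration under stated assumptions, not the project's actual code. The function `extract_with_fallback`, the `ExtractionError` class, and both extractor functions are hypothetical names invented for this example.

```python
class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed."""


def extract_with_fallback(path, extractors):
    """Try each extractor in order and return the first successful result.

    Failures are collected rather than swallowed, so if nothing works the
    caller sees every reason (no silent failures).
    """
    errors = []
    for extract in extractors:
        try:
            return {"text": extract(path), "method": extract.__name__}
        except Exception as exc:
            errors.append(f"{extract.__name__}: {exc}")
    raise ExtractionError(
        f"all extractors failed for {path}: " + "; ".join(errors)
    )


# Hypothetical extractors standing in for real parsing libraries.
def primary_extractor(path):
    raise ValueError("corrupted file")


def backup_extractor(path):
    return f"text recovered from {path}"


result = extract_with_fallback(
    "weird.doc", [primary_extractor, backup_extractor]
)
```

Recording which method succeeded lets a caller tell a clean extraction from a degraded one, which matters for legacy formats where fallbacks may recover less.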