Improve README tone and clarity

- Replace generic opener with direct description
- Make feature bullets more conversational (less "feature list" mode)
- Add context before format support table
- Clarify pagination example with "three ways" structure
- Lead testing section with the dashboard hook
- Add architecture design rationale
- Remove "comprehensive" and "intelligent" buzzwords
2026-01-11 00:49:34 -07:00
commit f159efab2c
parent 036160d029
1 changed file with 25 additions and 31 deletions
--- a/README.md
+++ b/README.md
@@ -2,14 +2,14 @@

 # 📊 MCP Office Tools

-**Comprehensive Microsoft Office document processing for AI agents**
+**MCP server for extracting text, tables, images, and data from Microsoft Office files**

 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
 [![FastMCP](https://img.shields.io/badge/FastMCP-0.5+-green.svg?style=flat-square)](https://gofastmcp.com)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
 [![MCP Protocol](https://img.shields.io/badge/MCP-Protocol-purple?style=flat-square)](https://modelcontextprotocol.io)

-*Extract text, tables, images, formulas, and metadata from Word, Excel, PowerPoint, and CSV files*
+*Word, Excel, PowerPoint, CSV — all the formats your AI agent needs to read but can't*

 [Installation](#-installation) • [Tools](#-available-tools) • [Examples](#-usage-examples) • [Testing](#-testing)

@@ -19,12 +19,12 @@

 ## ✨ Features

-- **Universal extraction** - Text, images, and metadata from any Office format
-- **Format-specific tools** - Deep analysis for Word, Excel, and PowerPoint
-- **Intelligent pagination** - Large documents automatically chunked for AI context limits
-- **Multi-library fallbacks** - Never fails silently; tries multiple extraction methods
-- **URL support** - Process documents directly from HTTP/HTTPS URLs with caching
-- **Legacy format support** - Handles .doc, .xls, .ppt from Office 97-2003
+- **Universal extraction** — Pull text, images, and metadata from any Office format
+- **Format-specific tools** — Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
+- **Automatic pagination** — Large documents get chunked so they don't blow up your context window
+- **Fallback processing** — When one library chokes on a weird file, we try another. No silent failures.
+- **URL support** — Pass a URL instead of a file path; we'll download and cache it
+- **Legacy formats** — Yes, even those .doc and .xls files from 2003 still work

 ---

@@ -96,6 +96,8 @@ claude mcp add office-tools "uvx mcp-office-tools"

 ## 📋 Format Support

+Here's what works and what's "good enough" — legacy formats from Office 97-2003 have more limited extraction, but they still work:
+
 | Format | Extension | Text | Images | Metadata | Tables | Formulas |
 |--------|-----------|:----:|:------:|:--------:|:------:|:--------:|
 | **Word (Modern)** | `.docx` | ✅ | ✅ | ✅ | ✅ | - |
@@ -134,28 +136,22 @@ result = await extract_text(

 ### Convert Word to Markdown (with Pagination)

-```python
-# For large documents, results are automatically paginated
-result = await convert_to_markdown("big-manual.docx")
+Large documents get paginated automatically. Three ways to handle it:

-# Continue with cursor for next page
+```python
+# Option 1: Follow the cursor for each chunk
+result = await convert_to_markdown("big-manual.docx")
 if result.get("pagination", {}).get("has_more"):
    next_page = await convert_to_markdown(
        "big-manual.docx",
        cursor_id=result["pagination"]["cursor_id"]
    )

-# Or use page ranges to get specific sections
-result = await convert_to_markdown(
-    "big-manual.docx",
-    page_range="1-10"
-)
+# Option 2: Grab specific pages
+result = await convert_to_markdown("big-manual.docx", page_range="1-10")

-# Or extract by chapter name
-result = await convert_to_markdown(
-    "big-manual.docx",
-    chapter_name="Introduction"
-)
+# Option 3: Extract by chapter heading
+result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
 ```
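Option 1 in the example fetches only one follow-up page. Below is a minimal sketch of draining every chunk, assuming each response carries the `pagination` dict shown in the example (`has_more`, `cursor_id`). The `fake_convert_to_markdown` stub, the `collect_all_pages` helper, and the `markdown` result key are hypothetical stand-ins so the loop runs without the server; the real response shape may differ.

```python
import asyncio

# Hypothetical stand-in for convert_to_markdown: serves three fixed chunks.
# Only the "pagination" keys (has_more, cursor_id) come from the README example;
# the "markdown" key is an assumption made for this sketch.
_CHUNKS = ["# Part 1\n", "# Part 2\n", "# Part 3\n"]


async def fake_convert_to_markdown(path, cursor_id=None):
    index = 0 if cursor_id is None else int(cursor_id)
    has_more = index + 1 < len(_CHUNKS)
    return {
        "markdown": _CHUNKS[index],
        "pagination": {
            "has_more": has_more,
            "cursor_id": str(index + 1) if has_more else None,
        },
    }


async def collect_all_pages(convert, path, max_pages=100):
    """Follow cursors until has_more is false; max_pages guards against endless loops."""
    pages = []
    cursor_id = None
    for _ in range(max_pages):
        # Pass cursor_id only once the server has handed one back.
        kwargs = {"cursor_id": cursor_id} if cursor_id is not None else {}
        result = await convert(path, **kwargs)
        pages.append(result["markdown"])
        pagination = result.get("pagination", {})
        if not pagination.get("has_more"):
            break
        cursor_id = pagination["cursor_id"]
    return pages


pages = asyncio.run(collect_all_pages(fake_convert_to_markdown, "big-manual.docx"))
```

Passing the converter in as an argument keeps the loop testable; in real use you would hand it the actual `convert_to_markdown` tool call instead of the stub.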

 ### Analyze Excel Data Quality
@@ -266,29 +262,27 @@ result = await extract_text("https://example.com/report.docx")

 ## 🧪 Testing

-The project includes a comprehensive test suite with an interactive HTML dashboard:
+We built a visual test dashboard because staring at pytest output gets old. Run `make test` and you get an HTML report with pass/fail stats, detailed I/O for each test, and expandable tracebacks when things break.

 ```bash
-# Run all tests with dashboard generation
+# Run tests and generate the dashboard
 make test

-# Run just pytest
+# Just pytest, no dashboard
 make test-pytest

-# View the test dashboard
+# Open existing dashboard
 make view-dashboard
 ```

-The test dashboard shows:
-- Pass/fail statistics with MS Office-themed styling
-- Detailed inputs and outputs for each test
-- Expandable error tracebacks for failures
-- Category breakdown (Word, Excel, PowerPoint)
+The dashboard has an MS Office-inspired theme (Word blue, Excel green, PowerPoint orange) and groups tests by category so you can see what's working at a glance.

 ---

 ## 🏗 Architecture

+The mixin pattern keeps things modular — universal tools work on everything, format-specific tools go deeper. When the primary library can't handle something (corrupted files, weird formatting), we fall back to alternatives.
+
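The fallback behavior described in this paragraph can be sketched roughly as a chain of extractors tried in order. This is an illustration only, not the project's actual implementation: `extract_with_fallback`, `ExtractionError`, and both extractor functions are hypothetical names invented for the example.

```python
from typing import Callable, Sequence


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed."""


def extract_with_fallback(
    path: str,
    extractors: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, extractor) pair in order and return the first success.

    Failures are collected rather than swallowed, so if nothing works the
    caller sees why each method failed (no silent failures).
    """
    errors = []
    for name, extractor in extractors:
        try:
            return name, extractor(path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise ExtractionError(f"all extractors failed for {path}: " + "; ".join(errors))


# Hypothetical extractors: the first simulates a library choking on a file,
# the second simulates a more forgiving alternative.
def primary_extractor(path: str) -> str:
    raise ValueError("corrupted document structure")


def secondary_extractor(path: str) -> str:
    return f"text recovered from {path}"


method, text = extract_with_fallback(
    "weird-file.docx",
    [("primary", primary_extractor), ("secondary", secondary_extractor)],
)
```

Returning the name of the method that succeeded lets callers report which path produced the result, which is useful when a fallback yields lower-fidelity output.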
 ```
 mcp-office-tools/
 ├── src/mcp_office_tools/