
📊 MCP Office Tools

MCP server for extracting text, tables, images, and data from Microsoft Office files

Word, Excel, PowerPoint, CSV — all the formats your AI agent needs to read but can't

Installation • Tools • Examples • Testing

✨ Features

Universal extraction — Pull text, images, and metadata from any Office format
Format-specific tools — Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
Automatic pagination — Large documents get chunked so they don't blow up your context window
Fallback processing — When one library chokes on a weird file, we try another. No silent failures.
URL support — Pass a URL instead of a file path; we'll download and cache it
Legacy formats — Yes, even those .doc and .xls files from 2003 still work

🚀 Installation

# Quick install with uvx (recommended)
uvx mcp-office-tools

# Or install with uv/pip
uv add mcp-office-tools
pip install mcp-office-tools

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "office-tools": {
      "command": "uvx",
      "args": ["mcp-office-tools"]
    }
  }
}

Claude Code Configuration

claude mcp add office-tools "uvx mcp-office-tools"

🛠 Available Tools

Universal Tools

Work with all Office formats: Word, Excel, PowerPoint, CSV

Tool	Description
`extract_text`	Extract text with optional formatting preservation
`extract_images`	Extract embedded images with size filtering
`extract_metadata`	Get document properties (author, dates, statistics)
`detect_office_format`	Identify format, version, encryption status
`analyze_document_health`	Check integrity, corruption, password protection
`get_supported_formats`	List all supported file extensions
`index_document`	Scan document and create resource URIs for on-demand fetching

Word Tools

Tool	Description
`convert_to_markdown`	Convert to Markdown with automatic pagination for large docs
`extract_word_tables`	Extract tables as structured JSON, CSV, or Markdown
`analyze_word_structure`	Analyze headings, sections, styles, and document hierarchy
`get_document_outline`	Get structured outline with chapter detection and word counts
`check_style_consistency`	Find formatting issues, missing chapters, style problems
`search_document`	Search text with context and chapter location
`extract_entities`	Extract people, places, organizations using pattern recognition
`get_chapter_summaries`	Generate chapter previews with opening sentences
`save_reading_progress`	Bookmark your reading position for later
`get_reading_progress`	Resume reading from saved position

Excel Tools

Tool	Description
`analyze_excel_data`	Statistical analysis: data types, missing values, outliers
`extract_excel_formulas`	Extract formulas with values and dependency analysis
`create_excel_chart_data`	Generate Chart.js/Plotly-ready data from spreadsheets

📋 Format Support

Here's what works and what's "good enough" — legacy formats from Office 97-2003 have more limited extraction, but they still work:

Format	Extension	Text	Images	Metadata	Tables	Formulas
Word (Modern)	`.docx`	✅	✅	✅	✅	-
Word (Legacy)	`.doc`	✅	⚠️	⚠️	⚠️	-
Word Template	`.dotx`	✅	✅	✅	✅	-
Word Macro	`.docm`	✅	✅	✅	✅	-
Excel (Modern)	`.xlsx`	✅	✅	✅	✅	✅
Excel (Legacy)	`.xls`	✅	⚠️	⚠️	✅	⚠️
Excel Template	`.xltx`	✅	✅	✅	✅	✅
Excel Macro	`.xlsm`	✅	✅	✅	✅	✅
PowerPoint (Modern)	`.pptx`	✅	✅	✅	✅	-
PowerPoint (Legacy)	`.ppt`	✅	⚠️	⚠️	⚠️	-
PowerPoint Template	`.potx`	✅	✅	✅	✅	-
CSV	`.csv`	✅	-	⚠️	✅	-

✅ Full support • ⚠️ Basic/partial support • - Not applicable

🔗 MCP Resources

Instead of returning entire documents in tool responses, you can index a document once and then fetch content on demand through URI-based resources. This keeps context windows manageable when working with large files.

How It Works

Index the document — index_document scans the file and returns URIs
Fetch what you need — Request specific chapters, sheets, slides, or images by URI
Format on demand — Append .txt or .html to get different output formats

Resource URI Patterns

URI Pattern	Description	Example
`chapter://{doc_id}/{n}`	Single chapter/section	`chapter://abc123/3`
`chapters://{doc_id}/{range}`	Multiple chapters	`chapters://abc123/1-5`
`section://{doc_id}/{n}`	Section by heading style	`section://abc123/2`
`paragraph://{doc_id}/{ch}/{p}`	Specific paragraph	`paragraph://abc123/3/7`
`sheet://{doc_id}/{name}`	Excel sheet as markdown table	`sheet://abc123/Revenue`
`slide://{doc_id}/{n}`	PowerPoint slide	`slide://abc123/5`
`slides://{doc_id}/{range}`	Multiple slides	`slides://abc123/1,3,5`
`image://{doc_id}/{n}`	Embedded image	`image://abc123/0`

Format Suffixes

Append a format suffix to convert on the fly:

Suffix	Output
`.md` (default)	Markdown
`.txt`	Plain text (no formatting)
`.html`	Basic HTML

Examples:

chapter://abc123/3 → Markdown (default)
chapter://abc123/3.txt → Plain text
chapter://abc123/3.html → HTML
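
To make the suffix rule concrete, here is a minimal sketch of how a client might split a resource URI into its parts. The helper name split_resource_uri and the KNOWN_SUFFIXES set are assumptions for illustration; the server's own parsing in resources.py may work differently.

```python
from urllib.parse import urlsplit

# Suffixes listed in the Format Suffixes table; anything else stays part of the item.
KNOWN_SUFFIXES = {"md", "txt", "html"}


def split_resource_uri(uri: str) -> tuple[str, str, str, str]:
    """Return (scheme, doc_id, item, fmt) for a resource URI; fmt defaults to "md"."""
    parts = urlsplit(uri)
    scheme, doc_id = parts.scheme, parts.netloc
    item = parts.path.lstrip("/")
    fmt = "md"
    # Only strip the suffix when it is one of the known formats,
    # so a sheet name such as "Q1.2024" is left untouched.
    stem, dot, suffix = item.rpartition(".")
    if dot and suffix in KNOWN_SUFFIXES:
        item, fmt = stem, suffix
    return scheme, doc_id, item, fmt
```

Checking the suffix against a fixed set, rather than splitting on every dot, is a deliberate choice here so that names containing dots survive intact.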

Range Syntax

Fetch multiple items at once:

1-5 → Items 1 through 5
1,3,5 → Specific items
1-3,7,9-10 → Mixed ranges
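
The range syntax above can be expanded with a few lines of Python. This is a sketch under assumptions: parse_range is a hypothetical helper, and the choice to deduplicate and sort the result is ours; the server may preserve request order instead.

```python
def parse_range(spec: str) -> list[int]:
    """Expand a range spec such as "1-3,7,9-10" into a sorted list of item numbers."""
    items: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            items.update(range(start, end + 1))  # ranges are inclusive
        else:
            items.add(int(part))
    return sorted(items)
```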

Section Detection

The indexer detects document structure automatically:

Heading 1 styles (primary) — Business docs, manuals, technical documents
"Chapter X" text patterns (fallback) — Books, manuscripts, narratives

Use text_patterns_only=True to skip heading style detection for documents with messy formatting.
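
The primary-then-fallback strategy can be sketched on plain data. Everything below is illustrative: detect_sections, the (style_name, text) pair representation, and the regular expression are assumptions, not the indexer's actual code. Only the text_patterns_only flag name comes from this README.

```python
import re

# Matches "Chapter 3" or "Chapter IV" at the start of a paragraph (simplified).
CHAPTER_PATTERN = re.compile(r"^\s*chapter\s+(\d+|[ivxlcdm]+)\b", re.IGNORECASE)


def detect_sections(
    paragraphs: list[tuple[str, str]], text_patterns_only: bool = False
) -> list[int]:
    """Return the indexes of paragraphs that start a section.

    Each paragraph is a (style_name, text) pair. Heading 1 styles win;
    "Chapter X" text is used only when no Heading 1 is found or when
    text_patterns_only is set.
    """
    if not text_patterns_only:
        by_style = [i for i, (style, _) in enumerate(paragraphs) if style == "Heading 1"]
        if by_style:
            return by_style
    return [i for i, (_, text) in enumerate(paragraphs) if CHAPTER_PATTERN.match(text)]
```

A document that mixes real Heading 1 styles with stray "Chapter" text in the body is exactly the case where passing text_patterns_only=True changes the outcome.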

🎯 MCP Prompts

Pre-built workflows that chain multiple tools together. Use these as starting points:

Prompt	Level	Description
`explore-document`	Basic	Start with any new document - get structure and identify issues
`find-character`	Basic	Track all mentions of a person/character with context
`chapter-preview`	Basic	Quick overview of each chapter without full read
`resume-reading`	Intermediate	Check saved position and continue reading
`document-analysis`	Intermediate	Comprehensive multi-tool analysis
`character-journey`	Advanced	Track character arc through entire narrative
`document-comparison`	Advanced	Compare entities and themes between chapters
`full-reading-session`	Advanced	Guided reading with bookmarking
`manuscript-review`	Advanced	Complete editorial workflow for editors

💡 Usage Examples

Extract Text from Any Document

# Simple extraction
result = await extract_text("report.docx")
print(result["text"])

# With formatting preserved
result = await extract_text(
    file_path="report.docx",
    preserve_formatting=True,
    include_metadata=True
)

Convert Word to Markdown (with Pagination)

Large documents get paginated automatically. Three ways to handle it:

# Option 1: Follow the cursor for each chunk
result = await convert_to_markdown("big-manual.docx")
if result.get("pagination", {}).get("has_more"):
    next_page = await convert_to_markdown(
        "big-manual.docx",
        cursor_id=result["pagination"]["cursor_id"]
    )

# Option 2: Grab specific pages
result = await convert_to_markdown("big-manual.docx", page_range="1-10")

# Option 3: Extract by chapter heading
result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
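
If you want every chunk rather than just the next one, Option 1 generalizes to a loop. The sketch below takes the tool as a parameter so it can be tried without a running server. The helper name read_all_pages and the "markdown" result key are assumptions; only the pagination.has_more and pagination.cursor_id fields appear in the examples above.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def read_all_pages(
    tool: Callable[..., Awaitable[dict[str, Any]]], file_path: str
) -> list[str]:
    """Call a paginated tool until has_more is false and return each chunk's text."""
    chunks: list[str] = []
    result = await tool(file_path)
    while True:
        # "markdown" is an assumed key name for the chunk content.
        chunks.append(result.get("markdown", ""))
        pagination = result.get("pagination", {})
        if not pagination.get("has_more"):
            return chunks
        result = await tool(file_path, cursor_id=pagination["cursor_id"])
```

Usage would look like chunks = await read_all_pages(convert_to_markdown, "big-manual.docx"), keeping in mind that collecting every chunk at once defeats the purpose of pagination for very large files.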

Analyze Excel Data Quality

result = await analyze_excel_data(
    file_path="sales-data.xlsx",
    include_statistics=True,
    check_data_quality=True
)

# Returns per-column analysis
# {
#   "analysis": {
#     "Sheet1": {
#       "dimensions": {"rows": 1000, "columns": 12},
#       "column_info": {
#         "Revenue": {
#           "data_type": "float64",
#           "null_percentage": 2.3,
#           "statistics": {"mean": 45000, "median": 42000, ...},
#           "quality_issues": ["5 potential outliers"]
#         }
#       },
#       "data_quality": {
#         "completeness_percentage": 97.8,
#         "duplicate_rows": 12
#       }
#     }
#   }
# }
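
Once you have a result shaped like the one above, a small helper can pull out the columns worth a second look. This is a sketch: columns_missing_data and its threshold are hypothetical, and it assumes the result keeps the analysis / column_info / null_percentage layout shown in the sample.

```python
def columns_missing_data(
    result: dict, max_null_percentage: float = 1.0
) -> dict[str, list[str]]:
    """Map each sheet to the columns whose null_percentage exceeds the threshold."""
    flagged: dict[str, list[str]] = {}
    for sheet, info in result.get("analysis", {}).items():
        columns = [
            name
            for name, column in info.get("column_info", {}).items()
            if column.get("null_percentage", 0.0) > max_null_percentage
        ]
        if columns:
            flagged[sheet] = columns
    return flagged
```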

Extract Excel Formulas

result = await extract_excel_formulas(
    file_path="financial-model.xlsx",
    analyze_dependencies=True
)

# Returns formula details with dependency mapping
# {
#   "formulas": {
#     "Sheet1": [
#       {
#         "cell": "D2",
#         "formula": "=B2*C2",
#         "value": 1500.00,
#         "dependencies": ["B2", "C2"]
#       }
#     ]
#   }
# }
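
To show what dependency mapping means for a formula like =B2*C2, here is a deliberately simplified sketch that pulls same-sheet cell references out of a formula string. The helper formula_dependencies and its regular expression are assumptions for illustration; they do not expand ranges, follow cross-sheet references, or handle named ranges, and the tool's real analysis is likely more thorough.

```python
import re

# A column of 1-3 letters followed by a row number, with optional "$" anchors.
# The lookarounds skip sheet-qualified references and names such as LOG10(.
CELL_REF = re.compile(r"(?<![A-Za-z0-9_!])\$?([A-Z]{1,3})\$?(\d+)(?![A-Za-z0-9_(])")


def formula_dependencies(formula: str) -> list[str]:
    """List distinct same-sheet cell references in first-seen order."""
    seen: list[str] = []
    for column, row in CELL_REF.findall(formula):
        reference = f"{column}{row}"
        if reference not in seen:
            seen.append(reference)
    return seen
```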

Generate Chart Data

result = await create_excel_chart_data(
    file_path="quarterly-revenue.xlsx",
    chart_type="line",
    output_format="chartjs"
)

# Returns ready-to-use Chart.js configuration
# {
#   "chartjs": {
#     "type": "line",
#     "data": {
#       "labels": ["Q1", "Q2", "Q3", "Q4"],
#       "datasets": [{"label": "Revenue", "data": [100, 120, 115, 140]}]
#     }
#   }
# }
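
The output shape above is easy to reproduce by hand, which helps when you want to check or post-process what the tool returns. The function to_chartjs is a hypothetical helper, not part of this package; it only mirrors the type / data / labels / datasets layout from the sample.

```python
def to_chartjs(
    labels: list[str], series: dict[str, list[float]], chart_type: str = "line"
) -> dict:
    """Build a Chart.js-style config dict from labels and named data series."""
    for name, values in series.items():
        if len(values) != len(labels):
            raise ValueError(
                f"series {name!r} has {len(values)} points, expected {len(labels)}"
            )
    return {
        "type": chart_type,
        "data": {
            "labels": list(labels),
            "datasets": [
                {"label": name, "data": list(values)} for name, values in series.items()
            ],
        },
    }
```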

Extract Word Tables

result = await extract_word_tables(
    file_path="contract.docx",
    output_format="markdown"
)

# Returns tables with optional format conversion
# {
#   "tables": [
#     {
#       "table_index": 0,
#       "dimensions": {"rows": 5, "columns": 3},
#       "converted_output": "| Name | Role | Department |\n|---|---|---|\n..."
#     }
#   ]
# }
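
For reference, the converted_output string above follows the usual Markdown table layout: a header row, a separator row, then the body. A minimal sketch of that conversion, assuming the first row is the header (rows_to_markdown is a hypothetical helper, and cells containing a pipe character are not escaped):

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),  # one separator cell per column
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```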

Process Documents from URLs

# Documents are downloaded and cached automatically
result = await extract_text("https://example.com/report.docx")

# Cache expires after 1 hour by default

Index Document for On-Demand Resource Fetching

# Index the document - returns URIs for all content
result = await index_document("novel.docx")

# Returns:
# {
#   "doc_id": "56036b0f171a",
#   "resources": {
#     "chapter": [
#       {"id": "1", "title": "Chapter 1: The Beginning", "uri": "chapter://56036b0f171a/1"},
#       {"id": "2", "title": "Chapter 2: Rising Action", "uri": "chapter://56036b0f171a/2"},
#       ...
#     ],
#     "image": [
#       {"id": "0", "uri": "image://56036b0f171a/0"},
#       ...
#     ]
#   }
# }

# Now fetch specific content via MCP resources:
# - chapter://56036b0f171a/1      → Chapter 1 as markdown
# - chapter://56036b0f171a/1.txt  → Chapter 1 as plain text
# - chapters://56036b0f171a/1-3   → Chapters 1-3 combined
# - image://56036b0f171a/0        → First embedded image

# Works with Excel and PowerPoint too:
await index_document("data.xlsx")
# → sheet://abc123/Revenue, sheet://abc123/Expenses, ...

await index_document("presentation.pptx")
# → slide://def456/1, slide://def456/2, ...
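
A client will usually want a flat list of URIs out of the index result. The sketch below assumes the resources layout shown in the sample output (a mapping from resource kind to a list of entries with a "uri" key); list_resource_uris is a hypothetical helper.

```python
from typing import Optional


def list_resource_uris(index_result: dict, kind: Optional[str] = None) -> list[str]:
    """Collect resource URIs from an index result, optionally for one kind only."""
    resources = index_result.get("resources", {})
    kinds = [kind] if kind is not None else list(resources)
    return [entry["uri"] for k in kinds for entry in resources.get(k, [])]
```

For example, list_resource_uris(result, kind="chapter") would give you just the chapter URIs to fetch one at a time.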

🧪 Testing

We built a visual test dashboard because staring at pytest output gets old. Run make test and you get an HTML report with pass/fail stats, detailed I/O for each test, and expandable tracebacks when things break.

# Run tests and generate the dashboard
make test

# Just pytest, no dashboard
make test-pytest

# Open existing dashboard
make view-dashboard

The dashboard has an MS Office-inspired theme (Word blue, Excel green, PowerPoint orange) and groups tests by category so you can see what's working at a glance.

🏗 Architecture

The mixin pattern keeps things modular — universal tools work on everything, format-specific tools go deeper. When the primary library can't handle something (corrupted files, weird formatting), we fall back to alternatives.

mcp-office-tools/
├── src/mcp_office_tools/
│   ├── server.py              # FastMCP server + resource templates
│   ├── resources.py           # Resource store for on-demand content
│   ├── mixins/
│   │   ├── universal.py       # Format-agnostic tools (incl. index_document)
│   │   ├── word.py            # Word-specific tools
│   │   ├── excel.py           # Excel-specific tools
│   │   └── powerpoint.py      # PowerPoint tools (WIP)
│   ├── utils/
│   │   ├── validation.py      # File validation
│   │   ├── file_detection.py  # Format detection
│   │   ├── caching.py         # URL caching
│   │   └── decorators.py      # Error handling, defaults
│   └── pagination.py          # Large document pagination
├── tests/                     # pytest test suite
└── reports/                   # Test dashboard output

Processing Libraries

Format	Primary Library	Fallback
`.docx`	python-docx	mammoth
`.xlsx`	openpyxl	pandas
`.pptx`	python-pptx	-
`.doc`/`.xls`/`.ppt`	olefile	-
`.csv`	pandas	built-in csv
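
The primary/fallback pairing in the table can be pictured as an ordered list of extractors that are tried in turn. This is a sketch of the general pattern, not the code in this repository: extract_with_fallback and the (name, callable) pairs are assumptions, and the extractors are passed in so no particular library call is implied.

```python
from typing import Callable, Sequence


def extract_with_fallback(
    path: str, extractors: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, extractor) in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, extractor in extractors:
        try:
            return name, extractor(path)
        except Exception as exc:  # broad on purpose: any library failure triggers the fallback
            errors.append(f"{name}: {exc}")
    # Nothing worked: report every failure instead of failing silently.
    raise RuntimeError(f"all extractors failed for {path}: " + "; ".join(errors))
```

Returning the name of the extractor that succeeded makes it easy to tell the caller when a fallback was used.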

🔧 Development

# Clone and install
git clone https://github.com/yourusername/mcp-office-tools.git
cd mcp-office-tools
uv sync --dev

# Run tests
uv run pytest

# Format and lint
uv run black src/ tests/
uv run ruff check src/ tests/

# Type check
uv run mypy src/

📦 Dependencies

Core:

fastmcp - MCP server framework
python-docx - Word document processing
openpyxl - Excel spreadsheet processing
python-pptx - PowerPoint processing
pandas - Data analysis and CSV handling
mammoth - Word to HTML/Markdown conversion
olefile - Legacy OLE format support
xlrd - Legacy Excel support
pillow - Image processing
aiohttp / aiofiles - Async HTTP and file I/O

Optional:

python-magic - Enhanced MIME type detection
msoffcrypto-tool - Encrypted file detection

🤝 Related Projects

MCP PDF Tools - Companion server for PDF processing
FastMCP - The framework powering this server

📝 Behind the Scenes

This README was rewritten during a human-AI collaboration session. The process raised questions about discernment, voice, and what makes documentation actually land:

AI Isn't New. Your Discernment Is What Matters. — Ryan's take on 40 years of writing code and why discernment matters more than the tools

📜 License

MIT License - see LICENSE for details.

Built with FastMCP and the Model Context Protocol
