- Add Dockerfile with multi-stage build using uv
- Add docker-compose.yml with caddy-docker-proxy labels for /mcp endpoint
- Add .env.example for deployment configuration
- Update Makefile with docker-* targets
- Update server.py to support MCP_TRANSPORT env var:
  - 'stdio' (default): Local CLI usage with Claude Code
  - 'streamable-http': Hosted HTTP mode behind reverse proxy
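A minimal sketch of resolving the env var before starting the server. It assumes only the two values listed above are valid. `resolve_transport` is a hypothetical helper, not the actual server.py code.

```python
import os

# The two transports named in this commit; anything else is rejected (assumption).
VALID_TRANSPORTS = ("stdio", "streamable-http")


def resolve_transport(env=None):
    """Return the transport selected by MCP_TRANSPORT, defaulting to 'stdio'."""
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio").strip().lower()
    if transport not in VALID_TRANSPORTS:
        raise ValueError(
            f"Unsupported MCP_TRANSPORT {transport!r}; expected one of {VALID_TRANSPORTS}"
        )
    return transport
```

In server.py the result would presumably be passed to FastMCP as `app.run(transport=resolve_transport())`. Host and port settings for the HTTP mode are not shown here.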
Hosted server will be available at:
https://mcwaddams.supported.systems/mcp
Named for Milton Waddams, who was relocated to the basement with
boxes of legacy documents. He handles the .doc and .xls files from
1997 that nobody else wants to touch.
- Rename package from mcp-office-tools to mcwaddams
- Update author to Ryan Malloy
- Update all imports and references
- Add Office Space themed README narrative
- All 53 tests passing
- Document MCP resource system in README with URI patterns, format
suffixes, range syntax, and section detection strategies
- Add index_document to Universal Tools table
- Update architecture section to include resources.py
- Fix section:// resource to support .md/.txt/.html format suffixes
(matching chapter:// behavior)
- Primary: Detect sections via Heading 1 styles (structured)
- Fallback: Detect chapters via "Chapter X" text patterns
- Add text_patterns_only flag to skip heading styles (for messy docs)
This handles both well-structured business documents (manuals, PRDs)
and narrative content (books with explicit chapter headings).
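The two-tier strategy can be sketched over a plain list of `(style_name, text)` pairs, which roughly matches what python-docx exposes as `paragraph.style.name` and `paragraph.text`. The function name and the chapter regex are assumptions, not the project's code.

```python
import re

# Matches "Chapter 3", "CHAPTER 12: The Basement", etc. (assumed pattern).
CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+)\b", re.IGNORECASE)


def detect_sections(paragraphs, text_patterns_only=False):
    """Return (index, title) pairs marking where each section starts.

    paragraphs: sequence of (style_name, text) tuples in document order.
    """
    if not text_patterns_only:
        # Primary strategy: trust explicit Heading 1 styles.
        headings = [
            (i, text.strip())
            for i, (style, text) in enumerate(paragraphs)
            if style == "Heading 1" and text.strip()
        ]
        if headings:
            return headings
    # Fallback (or forced via text_patterns_only): "Chapter X" text, any style.
    return [
        (i, text.strip())
        for i, (_style, text) in enumerate(paragraphs)
        if CHAPTER_RE.match(text)
    ]
```

With `text_patterns_only=True`, a messy document whose only Heading 1 is something like front matter still gets its chapters found by text.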
- Add 7 new Word tools to README (outline, search, entities, etc.)
- Add 9 MCP prompts section with workflow descriptions
- Gitignore reading progress bookmark files (.*.reading_progress.json)
- Gitignore local .mcp.json and test documents
New tools for Word document analysis:
- extract_entities: Pattern-based extraction of people, places, organizations
- get_chapter_summaries: Chapter previews with opening sentences and word counts
- save_reading_progress: Bookmark reading position to JSON file
- get_reading_progress: Resume reading from saved position
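A minimal sketch of the bookmark round-trip. It assumes the progress file sits next to the document as a hidden `.<document name>.reading_progress.json`, consistent with the `.*.reading_progress.json` gitignore pattern above, and that it stores a chapter and paragraph position. The file layout, field names, and helper signatures are all hypothetical.

```python
import json
import time
from pathlib import Path


def progress_path(document):
    """Hidden sidecar file: manuscript.docx -> .manuscript.docx.reading_progress.json."""
    doc = Path(document)
    return doc.with_name(f".{doc.name}.reading_progress.json")


def save_reading_progress(document, chapter, paragraph, notes=""):
    """Write the current position to the sidecar JSON file and return its path."""
    data = {
        "document": Path(document).name,
        "chapter": chapter,
        "paragraph": paragraph,
        "notes": notes,
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    path = progress_path(document)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def get_reading_progress(document):
    """Return the saved bookmark dict, or None if no bookmark exists yet."""
    path = progress_path(document)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```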
New MCP prompts (basic to advanced workflows):
- explore-document: Get started with a new document
- find-character: Track character mentions
- chapter-preview: Quick chapter overviews
- resume-reading: Continue where you left off
- document-analysis: Comprehensive multi-tool analysis
- character-journey: Track character arc through narrative
- document-comparison: Compare entities between chapters
- full-reading-session: Guided reading with bookmarking
- manuscript-review: Complete editorial workflow
Update test counts for 19 total tools (6 universal + 10 Word + 3 Excel)
New tools for easier document navigation:
- get_document_outline: Structured view of headings with chapter detection
- check_style_consistency: Find formatting issues and missing chapters
- search_document: Search with context and chapter location
All tools tested with a 200+ page manuscript. Detects issues like
Chapter 3 being styled as "Normal" instead of "Heading 1".
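A simplified sketch of that kind of check over `(style_name, text)` pairs. It flags "Chapter N" paragraphs that are not styled Heading 1 and reports gaps in the chapter numbering. The function name, regex, and report shape are assumptions, not the shipped `check_style_consistency` tool.

```python
import re

CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+)\b", re.IGNORECASE)


def check_style_consistency(paragraphs, heading_style="Heading 1"):
    """Report chapter headings with the wrong style and missing chapter numbers."""
    misstyled, found = [], []
    for index, (style, text) in enumerate(paragraphs):
        match = CHAPTER_RE.match(text)
        if not match:
            continue
        number = int(match.group(1))
        found.append(number)
        if style != heading_style:
            misstyled.append({"index": index, "chapter": number, "style": style})
    missing = sorted(set(range(1, max(found) + 1)) - set(found)) if found else []
    return {"misstyled": misstyled, "missing_chapters": missing}
```

A real checker would probably also cap the text length, so that a body sentence starting with "Chapter 3 was..." is not reported as a mis-styled heading.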
Documents 6 critical bugs discovered while processing a 200+ page
manuscript, including the root cause: an xpath API mismatch between
python-docx and lxml that caused silent failures in chapter search.
python-docx elements don't support xpath() with the namespaces kwarg.
The resulting errors were swallowed by try/except blocks, so chapter
search and heading detection silently never found matches.
Fixed by replacing xpath(..., namespaces={...}) with:
- findall('.//' + qn('w:t')) for text elements
- find(qn('w:pPr')), then .find(qn('w:pStyle')) on the result, for style detection
- get(qn('w:val')) for attribute values
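The replacement pattern can be sketched with the standard library's ElementTree, which supports the same `find`/`findall`/`get` calls on Clark-notation tag names. python-docx's real helper is `docx.oxml.ns.qn`. The local `qn` below is a stand-in that only maps the `w:` prefix, so the snippet runs without python-docx installed. The same calls should work on the lxml elements python-docx wraps (for example `paragraph._p`).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def qn(tag):
    """Stand-in for docx.oxml.ns.qn: 'w:t' -> '{<w namespace>}t'."""
    prefix, local = tag.split(":")
    if prefix != "w":
        raise ValueError("this sketch only maps the w: prefix")
    return f"{{{W_NS}}}{local}"


def paragraph_text(p):
    # findall('.//' + qn('w:t')) instead of p.xpath('.//w:t', namespaces=...)
    return "".join(t.text or "" for t in p.findall(".//" + qn("w:t")))


def paragraph_style(p):
    # find(qn('w:pPr')), then find(qn('w:pStyle')), then read w:val
    ppr = p.find(qn("w:pPr"))
    if ppr is None:
        return None
    pstyle = ppr.find(qn("w:pStyle"))
    return pstyle.get(qn("w:val")) if pstyle is not None else None
```

Note that `w:val` on `w:pStyle` holds the style ID (such as `Heading1`), not the display name shown in Word.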
Also fixed a logic bug where an elif prevented the short-text fallback
from running when a paragraph had a non-heading style.
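The shape of that bug can be shown with a hypothetical `is_heading` check. The 100-character threshold and the chapter regex are illustrative assumptions.

```python
import re

CHAPTER_RE = re.compile(r"^\s*chapter\s+\d+\b", re.IGNORECASE)
SHORT_TEXT_LIMIT = 100  # assumed cut-off for "short enough to be a heading"


def is_heading_buggy(style, text):
    if style:
        return style.startswith("Heading")
    elif len(text) < SHORT_TEXT_LIMIT:
        # Only reached when the paragraph has no style at all.
        return bool(CHAPTER_RE.match(text))
    return False


def is_heading_fixed(style, text):
    if style and style.startswith("Heading"):
        return True
    # Fallback now runs for unstyled and non-heading-styled paragraphs alike.
    return len(text) < SHORT_TEXT_LIMIT and bool(CHAPTER_RE.match(text))
```

Most paragraphs in a real document carry at least the "Normal" style, so the buggy branch almost never reached the text fallback.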
The pyproject.toml script entry point (mcp-office-tools) uses server.py,
not server_monolithic.py. Applied the same show_banner=False fix and
simplified to use app.run() instead of asyncio.run(app.run_stdio_async()).
Bug fixes:
- Remove 100-paragraph cap that prevented extracting content past ~page 4
  Now calculates the limit from the number of pages requested (300 paras/page)
- Add fallback page estimation when docs lack explicit page breaks
  Uses ~25 paragraphs per page for navigation in non-paginated docs
- Fix _get_available_headings to scan the full document (previously only the first 100 elements)
  Headings like Chapter 10 at element 1524 were invisible
- Fix MCP connection by disabling the FastMCP banner (show_banner=False)
  The ASCII art banner was corrupting the JSON-RPC stream on stdout
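A sketch of how the two numbers above might fit together. It assumes 1-based pages and a `page_range` already parsed into start and end pages. Both function names are hypothetical.

```python
PARAS_PER_ESTIMATED_PAGE = 25   # fallback when the doc has no page breaks
PARAS_PER_REQUESTED_PAGE = 300  # extraction budget per requested page


def estimated_page_slice(start_page, end_page, total_paragraphs):
    """Map an inclusive 1-based page range to a paragraph slice for unpaginated docs."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    start = (start_page - 1) * PARAS_PER_ESTIMATED_PAGE
    stop = min(end_page * PARAS_PER_ESTIMATED_PAGE, total_paragraphs)
    return slice(min(start, total_paragraphs), stop)


def paragraph_limit(start_page, end_page):
    """Scale the paragraph cap with the request instead of a fixed 100."""
    return (end_page - start_page + 1) * PARAS_PER_REQUESTED_PAGE
```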
Changes:
- Default image_mode changed from 'base64' to 'files' to avoid huge responses
- Add proper .mcp.json config with command/args format
- Add test document to .gitignore for privacy
Links README to Ryan's AI discernment article, which discusses
the documentation rewrite process and connects to the model's
perspective in the collaborations archive.
- Replace generic opener with direct description
- Make feature bullets more conversational (less "feature list" mode)
- Add context before format support table
- Clarify pagination example with "three ways" structure
- Lead testing section with the dashboard hook
- Add architecture design rationale
- Remove "comprehensive" and "intelligent" buzzwords
- Document all 12 actual MCP tools (6 universal, 3 Word, 3 Excel)
- Add comprehensive format support matrix with feature breakdown
- Include practical usage examples with real output structures
- Add test dashboard section
- Simplify installation with uvx/Claude Code instructions
- Remove marketing fluff; focus on technical accuracy
- Self-contained HTML dashboard with MS Office 365 design
- pytest plugin captures inputs, outputs, and errors per test
- Unified orchestrator runs pytest + torture tests together
- Test files persisted in reports/test_files/ with relative links
- GitHub Actions workflow with PR comments and job summaries
- Makefile with convenient commands (test, view-dashboard, etc.)
- Works offline with embedded JSON data (no CORS issues)
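The offline behavior generally comes from inlining the results into the HTML rather than fetching a separate JSON file, which browsers typically block for pages opened via `file://`. This is a minimal sketch; the `render_dashboard` helper and the `test-data` element id are hypothetical.

```python
import json
from html import escape

TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<div id="app"></div>
<script id="test-data" type="application/json">{payload}</script>
<script>
  const results = JSON.parse(document.getElementById("test-data").textContent);
  document.getElementById("app").textContent = results.length + " tests recorded";
</script>
</body>
</html>
"""


def render_dashboard(results, title="Test Dashboard"):
    """Inline results as JSON so the page works when opened straight from disk."""
    # Escape "<" so captured output containing "</script>" cannot close the tag early.
    payload = json.dumps(results).replace("<", "\\u003c")
    return TEMPLATE.format(title=escape(title), payload=payload)
```

The `\u003c` escape is still valid JSON, so `JSON.parse` restores the original `<` characters.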
- Create @resolve_field_defaults decorator to handle Pydantic FieldInfo
objects when tools are called directly (outside MCP framework)
- Create @handle_office_errors decorator for consistent error wrapping
- Apply decorators to Excel and Word mixins, removing ~100 lines of
boilerplate code
- Fix Excel formula extraction performance: load workbooks once before
loop instead of per-cell (100x faster with calculated values)
- Update test suite to use correct mock patch paths (patch where names
are looked up, not where defined)
- Add torture_test.py for real document validation
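When a tool parameter is declared as `param: int = Field(default=..., description=...)`, calling the method directly (as the tests do) passes the `FieldInfo` object itself instead of its default. The sketch below shows one way such a decorator could work. It assumes Pydantic is installed and that every `Field` has an explicit `default`; `default_factory` is not handled. It is illustrative, not the project's implementation.

```python
import functools
import inspect

from pydantic import Field
from pydantic.fields import FieldInfo


def resolve_field_defaults(func):
    """Replace any FieldInfo argument with its declared default before calling func."""
    sig = inspect.signature(func)

    def resolve(args, kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # unfilled params now hold their FieldInfo defaults
        for name, value in list(bound.arguments.items()):
            if isinstance(value, FieldInfo):
                bound.arguments[name] = value.default
        return bound.args, bound.kwargs

    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            args, kwargs = resolve(args, kwargs)
            return await func(*args, **kwargs)
        return async_wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args, kwargs = resolve(args, kwargs)
        return func(*args, **kwargs)
    return wrapper


# Hypothetical tool, used only to exercise the decorator.
@resolve_field_defaults
async def extract_formulas(path: str, max_cells: int = Field(default=500, description="Cell cap")):
    return path, max_cells
```

`functools.wraps` keeps the original signature reachable through `__wrapped__`, so frameworks that inspect the tool's parameters still see the `Field` metadata.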
- Use app.run_stdio_async() instead of deprecated stdio_server import
- Aligns with FastMCP 2.11.3 API
- Server now starts correctly with uv run mcp-office-tools
- Maintains all MCPMixin functionality and tool registration
- Add main() function back to server.py for CLI script entry point
- Maintains FastMCP MCPMixin pattern while fixing uvx execution
- Server now starts properly with 'uvx --from . mcp-office-tools'
- Preserves all 7 tools with official mixin registration
- Split monolithic 2209-line server.py into organized mixin classes
- UniversalMixin: Format-agnostic tools (extract_text, extract_images, etc.)
- WordMixin: Word-specific tools (convert_to_markdown with chapter_name support)
- ExcelMixin: Placeholder for future Excel-specific tools
- PowerPointMixin: Placeholder for future PowerPoint-specific tools
Benefits:
• Improved maintainability and separation of concerns
• Better testability with isolated mixins
• Easier team collaboration on different file types
• Reduced cognitive load per module
• Preserved all 7 existing tools with full functionality
Architecture now supports clean expansion for format-specific tools
while maintaining backward compatibility through legacy server backup.
- Add chapter_name parameter to convert_to_markdown tool
- Implement _find_chapter_content_range() for heading-based navigation
- Add _get_available_headings() to help users find chapter names
- Include chapter extraction metadata in results
- Enhanced ultra-fast summary with available headings
- Provides alternative to bookmark extraction when bookmarks unavailable
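A simplified, standalone stand-in for heading-based range finding. It works over `(heading_level, text)` blocks, and a chapter runs from its heading to the next heading at the same or a higher level. The data shape and the case-insensitive substring match are assumptions. The actual `_find_chapter_content_range()` operates on the Word document itself.

```python
def find_chapter_content_range(blocks, chapter_name):
    """Return (start, end) block indices for a chapter, end exclusive, or None.

    blocks: list of (heading_level, text); heading_level is None for body text.
    """
    wanted = chapter_name.strip().lower()
    start = level = None
    for i, (lvl, text) in enumerate(blocks):
        if lvl is not None and wanted in text.lower():
            start, level = i, lvl
            break
    if start is None:
        return None
    # The chapter ends at the next heading that is not nested inside it.
    for j in range(start + 1, len(blocks)):
        lvl = blocks[j][0]
        if lvl is not None and lvl <= level:
            return start, j
    return start, len(blocks)
```

Substring matching means "Chapter 1" would also match "Chapter 10" if that heading came first. Preferring exact matches first would avoid this.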
- Add bookmark_name parameter for extracting specific chapters/sections
- Implement bookmark boundary detection using Word XML structure
- Extract content between bookmark start/end markers with smart extension
- More reliable than page ranges - bookmarks are anchored to exact locations
- Support chapter extraction like bookmark_name='Chapter1_Start'
- Include bookmark metadata in response with element ranges
- Perfect for extracting individual chapters from large documents
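Word marks bookmarks with `<w:bookmarkStart w:id=".." w:name=".."/>` and a matching `<w:bookmarkEnd w:id=".."/>`. A sketch of boundary detection over the children of `<w:body>`, using stdlib ElementTree, is shown below. The "extension" rule for collapsed bookmarks (start and end in the same element) is one assumed heuristic: run until just before the next bookmark start. It is not necessarily the project's "smart extension".

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _matches(element, tag, **attrs):
    """Yield element-or-descendants with the given w: tag whose w: attributes match."""
    for node in element.iter(W + tag):
        if all(node.get(W + key) == value for key, value in attrs.items()):
            yield node


def bookmark_range(body, name):
    """Return (first, last) indices of body children spanned by bookmark `name`."""
    children = list(body)
    start = bookmark_id = None
    for i, child in enumerate(children):
        hit = next(_matches(child, "bookmarkStart", name=name), None)
        if hit is not None:
            start, bookmark_id = i, hit.get(W + "id")
            break
    if start is None:
        return None
    end = start
    for j in range(start, len(children)):
        if next(_matches(children[j], "bookmarkEnd", id=bookmark_id), None) is not None:
            end = j
            break
    if end == start:
        # Collapsed "point" bookmark: extend up to the next bookmark start (assumed heuristic).
        end = len(children) - 1
        for k in range(start + 1, len(children)):
            if next(_matches(children[k], "bookmarkStart"), None) is not None:
                end = k - 1
                break
    return start, end
```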
- Replace unreliable Word page detection with element-based limiting
- Cap extraction at 25 paragraphs per 'page' requested (max 100 total)
- Cap extraction at 8k chars per 'page' requested (max 40k total)
- Add early termination when limits reached
- Add processing_limits metadata to show actual extraction stats
- Prevent 1.28M token responses by stopping at reasonable content limits
- Single page (page_range='1') now limited to ~25 paragraphs/8k chars
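Those limits can be sketched as one loop with early termination. The paragraph source here is just a list of strings. `limited_extract` and the metadata keys are hypothetical names modeled on the processing_limits idea.

```python
PARAS_PER_PAGE, MAX_PARAS = 25, 100
CHARS_PER_PAGE, MAX_CHARS = 8_000, 40_000


def limited_extract(paragraphs, pages_requested):
    """Collect paragraphs until either the paragraph or the character budget runs out."""
    para_limit = min(PARAS_PER_PAGE * pages_requested, MAX_PARAS)
    char_limit = min(CHARS_PER_PAGE * pages_requested, MAX_CHARS)
    kept, chars = [], 0
    for text in paragraphs:
        if len(kept) >= para_limit or chars + len(text) > char_limit:
            break  # early termination: the rest of the document is never read
        kept.append(text)
        chars += len(text)
    return kept, {
        "paragraph_limit": para_limit,
        "char_limit": char_limit,
        "paragraphs_returned": len(kept),
        "chars_returned": chars,
        "truncated": len(kept) < len(paragraphs),
    }
```

One consequence of this simple version is that a single paragraph longer than the character budget gets dropped entirely instead of being cut.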
- Bypass all complex processing in summary_only mode
- Extract only first 50 paragraphs, max 10 headings, 5 content paragraphs
- Add bookmark detection for chapter navigation hints
- Limit summary content to 2000 chars max
- Prevent 1,282,370 token responses with surgical precision
- Show bookmark names as chapter start indicators
- Extract headings with page numbers during document processing
- Generate optimized page ranges for each section/chapter
- Provide intelligent chunking suggestions (15-page optimal chunks)
- Classify section types (chapter, section, subsection, etc.)
- Calculate actual section lengths based on heading positions
- Include suggested_chunking with ready-to-use page ranges
- Perfect for extracting 200+ page documents section by section
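A sketch of the chunking idea. It assumes sections arrive as `(title, start_page, end_page)` tuples with inclusive 1-based pages. Long sections are split into consecutive runs of at most 15 pages and formatted as `"start-end"` strings. The tuple shape, the `suggest_chunks` name, and the range-string format are assumptions.

```python
def suggest_chunks(sections, max_pages=15):
    """Turn sections into page_range strings no longer than max_pages each."""
    suggestions = []
    for title, start, end in sections:
        page = start
        while page <= end:
            chunk_end = min(page + max_pages - 1, end)
            page_range = str(page) if page == chunk_end else f"{page}-{chunk_end}"
            suggestions.append({"section": title, "page_range": page_range})
            page = chunk_end + 1
    return suggestions
```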
- Analyze document size and complexity before processing
- Provide clear workflow recommendations in response metadata
- Strongly recommend summary_only + page_range for large documents (>10 pages)
- Add warning system for suboptimal usage patterns
- Update parameter descriptions with best practice guidance
- Help users avoid 25k token response limits proactively
- Add page break detection using Word XML structure
- Process only specified pages instead of full document + truncation
- Route page-range requests to python-docx for granular control
- Skip mammoth for page-specific processing (mammoth processes full doc)
- Add page metadata to results when filtering is used
- Significantly reduce memory usage and response size for large documents
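A sketch of page filtering from Word XML. It counts only explicit `<w:br w:type="page"/>` breaks and assigns each paragraph to the page it starts on. `w:lastRenderedPageBreak`, `w:pageBreakBefore`, and section breaks are ignored, which is a simplifying assumption. Paragraphs nested in tables are included because `iter()` walks all descendants.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_for_pages(body, first_page, last_page):
    """Return the <w:p> elements that start on pages first_page..last_page (1-based)."""
    selected, page = [], 1
    for p in body.iter(W + "p"):
        if page > last_page:
            break  # stop early instead of walking the rest of the document
        if page >= first_page:
            selected.append(p)
        # Explicit page breaks inside this paragraph push later paragraphs forward.
        page += sum(1 for br in p.iter(W + "br") if br.get(W + "type") == "page")
    return selected
```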
- Add eye-catching visual design with emojis and badges
- Create compelling hero section with value proposition
- Include real-world benchmarks and performance metrics
- Add enterprise success stories and use cases
- Implement collapsible sections for better organization
- Include Mermaid architecture diagram
- Add comprehensive feature matrix with visual indicators
- Create roadmap and community sections
- Enhance installation and setup instructions
- Make it GitHub-ready with proper formatting
🚀 Now ready to wow potential users and contributors!
- Comprehensive Microsoft Office document processing server
- Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV
- 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats
- Multi-library fallback system for robust processing
- URL support with intelligent caching
- Legacy Office format support (97-2003)
- FastMCP integration with async architecture
- Production ready with comprehensive documentation
🤖 Generated with Claude Code (claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>