Bug fixes:
- Remove 100-paragraph cap that prevented extracting content past ~page 4
Now calculates limit based on number of pages requested (300 paras/page)
- Add fallback page estimation when docs lack explicit page breaks
Uses ~25 paragraphs per page for navigation in non-paginated docs
- Fix _get_available_headings to scan full document (was only first 100 elements)
Headings like Chapter 10 at element 1524 were invisible
- Fix MCP connection by disabling FastMCP banner (show_banner=False)
ASCII art banner was corrupting stdout JSON-RPC protocol
Changes:
- Default image_mode changed from 'base64' to 'files' to avoid huge responses
- Add proper .mcp.json config with command/args format
- Add test document to .gitignore for privacy
- Self-contained HTML dashboard with MS Office 365 design
- pytest plugin captures inputs, outputs, and errors per test
- Unified orchestrator runs pytest + torture tests together
- Test files persisted in reports/test_files/ with relative links
- GitHub Actions workflow with PR comments and job summaries
- Makefile with convenient commands (test, view-dashboard, etc.)
- Works offline with embedded JSON data (no CORS issues)
- Create @resolve_field_defaults decorator to handle Pydantic FieldInfo
objects when tools are called directly (outside MCP framework)
- Create @handle_office_errors decorator for consistent error wrapping
- Apply decorators to Excel and Word mixins, removing ~100 lines of
boilerplate code
- Fix Excel formula extraction performance: load workbooks once before
loop instead of per-cell (100x faster with calculated values)
- Update test suite to use correct mock patch paths (patch where names
are looked up, not where defined)
- Add torture_test.py for real document validation
- Use app.run_stdio_async() instead of deprecated stdio_server import
- Aligns with FastMCP 2.11.3 API
- Server now starts correctly with uv run mcp-office-tools
- Maintains all MCPMixin functionality and tool registration
- Comprehensive Microsoft Office document processing server
- Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV
- 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats
- Multi-library fallback system for robust processing
- URL support with intelligent caching
- Legacy Office format support (97-2003)
- FastMCP integration with async architecture
- Production ready with comprehensive documentation
🤖 Generated with Claude Code (claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>