mcp-office-tools

Author	SHA1	Message	Date
Ryan Malloy	4b38f6455c	Add document navigation tools and MCP prompts. New tools for Word document analysis: - extract_entities: Pattern-based extraction of people, places, organizations - get_chapter_summaries: Chapter previews with opening sentences and word counts - save_reading_progress: Bookmark reading position to JSON file - get_reading_progress: Resume reading from saved position. New MCP prompts (basic to advanced workflows): - explore-document: Get started with a new document - find-character: Track character mentions - chapter-preview: Quick chapter overviews - resume-reading: Continue where you left off - document-analysis: Comprehensive multi-tool analysis - character-journey: Track character arc through narrative - document-comparison: Compare entities between chapters - full-reading-session: Guided reading with bookmarking - manuscript-review: Complete editorial workflow. Updated test counts for 19 total tools (6 universal + 10 word + 3 excel)	2026-01-11 07:23:15 -07:00
Ryan Malloy	1abce7f26d	Add document navigation tools: outline, style check, search. New tools for easier document navigation: - get_document_outline: Structured view of headings with chapter detection - check_style_consistency: Find formatting issues and missing chapters - search_document: Search with context and chapter location. All tools tested with 200+ page manuscript. Detects issues like Chapter 3 being styled as "normal" instead of "Heading 1".	2026-01-11 07:15:43 -07:00
Ryan Malloy	af6aadf559	Refactor: Extract processing logic into utility modules. Complete architecture cleanup - eliminated duplicate server files: - Deleted server_monolithic.py (2249 lines) - Deleted server_legacy.py (2209 lines). New utility modules created: - utils/word_processing.py - Word extraction/conversion (preserves page range fixes) - utils/excel_processing.py - Excel extraction - utils/powerpoint_processing.py - PowerPoint extraction - utils/processing.py - Universal helpers (parse_page_range, health checks, etc.). Updated mixins to import from utils instead of server_monolithic. Entry point remains server.py (48 lines) using mixin architecture. All 53 tests pass. Coverage improved from 11% to 22% by removing duplicate code.	2026-01-11 05:08:18 -07:00
Ryan Malloy	210aa99e0b	Fix page range extraction for large documents and MCP connection. Bug fixes: - Remove 100-paragraph cap that prevented extracting content past ~page 4; now calculates limit based on number of pages requested (300 paras/page) - Add fallback page estimation when docs lack explicit page breaks; uses ~25 paragraphs per page for navigation in non-paginated docs - Fix _get_available_headings to scan full document (was only first 100 elements); headings like Chapter 10 at element 1524 were invisible - Fix MCP connection by disabling FastMCP banner (show_banner=False); ASCII art banner was corrupting stdout JSON-RPC protocol. Changes: - Default image_mode changed from 'base64' to 'files' to avoid huge responses - Add proper .mcp.json config with command/args format - Add test document to .gitignore for privacy	2026-01-11 04:27:56 -07:00
Ryan Malloy	76c7a0b2d0	Add decorators for field defaults and error handling, fix Excel performance - Create @resolve_field_defaults decorator to handle Pydantic FieldInfo objects when tools are called directly (outside MCP framework) - Create @handle_office_errors decorator for consistent error wrapping - Apply decorators to Excel and Word mixins, removing ~100 lines of boilerplate code - Fix Excel formula extraction performance: load workbooks once before loop instead of per-cell (100x faster with calculated values) - Update test suite to use correct mock patch paths (patch where names are looked up, not where defined) - Add torture_test.py for real document validation	2026-01-10 23:51:30 -07:00
Ryan Malloy	1ad2abb617	Implement cursor-based pagination system for large document processing - Add comprehensive pagination infrastructure based on MCP Playwright patterns - Integrate automatic pagination into convert_to_markdown tool for documents >25k tokens - Support cursor-based navigation with session isolation and security - Prevent MCP token limit errors for massive documents (200+ pages) - Maintain document structure and context across paginated sections - Add configurable page sizes, return_all bypass, and intelligent token estimation - Enable seamless navigation through extremely dense documents that exceed limits by 100x	2025-09-26 19:06:05 -06:00
Ryan Malloy	0748eec48d	Fix FastMCP stdio server import - Use app.run_stdio_async() instead of deprecated stdio_server import - Aligns with FastMCP 2.11.3 API - Server now starts correctly with uv run mcp-office-tools - Maintains all MCPMixin functionality and tool registration	2025-09-26 15:49:00 -06:00
Ryan Malloy	9d6a9fc24c	Refactor server architecture using mcpmixin pattern - Split monolithic 2209-line server.py into organized mixin classes - UniversalMixin: Format-agnostic tools (extract_text, extract_images, etc.) - WordMixin: Word-specific tools (convert_to_markdown with chapter_name support) - ExcelMixin: Placeholder for future Excel-specific tools - PowerPointMixin: Placeholder for future PowerPoint-specific tools Benefits: • Improved maintainability and separation of concerns • Better testability with isolated mixins • Easier team collaboration on different file types • Reduced cognitive load per module • Preserved all 7 existing tools with full functionality Architecture now supports clean expansion for format-specific tools while maintaining backward compatibility through legacy server backup.	2025-09-26 13:08:53 -06:00

8 Commits
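
Commit 210aa99e0b describes two page-range heuristics: a paragraph limit scaled by the number of pages requested (300 paragraphs per page) and a fallback estimate of about 25 paragraphs per page when a document has no explicit page breaks. The sketch below shows one way those two rules could fit together. It is not the repository's code: the function and constant names are hypothetical, and only the two numeric values come from the commit message.

```python
from typing import List

# Values quoted in the commit message; the names are illustrative assumptions.
PARAGRAPHS_PER_PAGE_ESTIMATE = 25   # fallback for documents without page breaks
MAX_PARAGRAPHS_PER_PAGE = 300       # per-page allowance used to size the scan limit


def paragraph_limit(pages_requested: int) -> int:
    """Upper bound on paragraphs to return, scaled by the pages requested."""
    return max(1, pages_requested) * MAX_PARAGRAPHS_PER_PAGE


def estimate_page(paragraph_index: int) -> int:
    """Estimated 1-based page number for a 0-based paragraph index."""
    return paragraph_index // PARAGRAPHS_PER_PAGE_ESTIMATE + 1


def paragraphs_for_pages(
    paragraphs: List[str], first_page: int, last_page: int
) -> List[str]:
    """Select paragraphs whose estimated page falls in [first_page, last_page]."""
    selected = [
        text
        for index, text in enumerate(paragraphs)
        if first_page <= estimate_page(index) <= last_page
    ]
    # Cap the result using the page-scaled limit rather than a fixed constant.
    return selected[: paragraph_limit(last_page - first_page + 1)]
```

Under this estimate, a heading at element 1524 (the example in the commit message) would be treated as sitting on page 61, far beyond what a fixed 100-element scan could reach.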