mcp-office-tools

Author	SHA1	Message	Date
Ryan Malloy	322ed78427	Add Docker deployment with streamable-http transport for hosted MCP - Add Dockerfile with multi-stage build using uv - Add docker-compose.yml with caddy-docker-proxy labels for /mcp endpoint - Add .env.example for deployment configuration - Update Makefile with docker-* targets - Update server.py to support MCP_TRANSPORT env var: - 'stdio' (default): Local CLI usage with Claude Code - 'streamable-http': Hosted HTTP mode behind reverse proxy Hosted server will be available at: https://mcwaddams.supported.systems/mcp	2026-01-11 14:27:50 -07:00
Ryan Malloy	3d469e5696	Add author section with ryanmalloy.com link	2026-01-11 11:54:17 -07:00
Ryan Malloy	31948d6ffc	Rename package to mcwaddams Named for Milton Waddams, who was relocated to the basement with boxes of legacy documents. He handles the .doc and .xls files from 1997 that nobody else wants to touch. - Rename package from mcp-office-tools to mcwaddams - Update author to Ryan Malloy - Update all imports and references - Add Office Space themed README narrative - All 53 tests passing	2026-01-11 11:35:35 -07:00
Ryan Malloy	6fb76d8760	Add MCP resources documentation and fix section format suffix - Document MCP resource system in README with URI patterns, format suffixes, range syntax, and section detection strategies - Add index_document to Universal Tools table - Update architecture section to include resources.py - Fix section:// resource to support .md/.txt/.html format suffixes (matching chapter:// behavior)	2026-01-11 10:23:47 -07:00
Ryan Malloy	89ad0c849d	Improve section detection with heading styles + fallback - Primary: Detect sections via Heading 1 styles (structured) - Fallback: Detect chapters via "Chapter X" text patterns - Add text_patterns_only flag to skip heading styles (for messy docs) This handles both well-structured business documents (manuals, PRDs) and narrative content (books with explicit chapter headings).	2026-01-11 09:40:38 -07:00
Ryan Malloy	d569034fa3	Add MCP resource system for embedded document content Implements URI-based access to document content with: - ResourceStore for caching extracted images, chapters, sheets, slides - Content-based document IDs (SHA256 hash) for stable URIs across sessions - 11 resource templates with flexible URI patterns: - Binary: image://, chart://, media://, embed:// - Text: chapter://, section://, sheet://, slide:// - Ranges: chapters://doc/1-5, slides://doc/1,3,5 - Hierarchical: paragraph://doc/3/5 - Format suffixes for output control: - chapter://doc/3.md (default markdown) - chapter://doc/3.txt (plain text) - chapter://doc/3.html (basic HTML) - index_document tool scans and populates resources: - Word: chapters as markdown, embedded images - Excel: sheets as markdown tables - PowerPoint: slides as markdown Tool responses return URIs instead of blobs - clients fetch only what they need.	2026-01-11 09:04:29 -07:00
Ryan Malloy	11defb4eae	Update README and gitignore for new document tools - Add 7 new Word tools to README (outline, search, entities, etc.) - Add 9 MCP prompts section with workflow descriptions - Gitignore reading progress bookmark files (.*.reading_progress.json) - Gitignore local .mcp.json and test documents	2026-01-11 07:41:49 -07:00
Ryan Malloy	4b38f6455c	Add document navigation tools and MCP prompts New tools for Word document analysis: - extract_entities: Pattern-based extraction of people, places, organizations - get_chapter_summaries: Chapter previews with opening sentences and word counts - save_reading_progress: Bookmark reading position to JSON file - get_reading_progress: Resume reading from saved position New MCP prompts (basic to advanced workflows): - explore-document: Get started with a new document - find-character: Track character mentions - chapter-preview: Quick chapter overviews - resume-reading: Continue where you left off - document-analysis: Comprehensive multi-tool analysis - character-journey: Track character arc through narrative - document-comparison: Compare entities between chapters - full-reading-session: Guided reading with bookmarking - manuscript-review: Complete editorial workflow Updated test counts for 19 total tools (6 universal + 10 word + 3 excel)	2026-01-11 07:23:15 -07:00
Ryan Malloy	1abce7f26d	Add document navigation tools: outline, style check, search New tools for easier document navigation: - get_document_outline: Structured view of headings with chapter detection - check_style_consistency: Find formatting issues and missing chapters - search_document: Search with context and chapter location All tools tested with 200+ page manuscript. Detects issues like Chapter 3 being styled as "normal" instead of "Heading 1".	2026-01-11 07:15:43 -07:00
Ryan Malloy	34e636e782	Add documentation for DOCX processing fixes Documents 6 critical bugs discovered while processing a 200+ page manuscript, including the root cause xpath API mismatch between python-docx and lxml that caused silent failures in chapter search.	2026-01-11 06:47:39 -07:00
Ryan Malloy	2f39c4ec5b	Fix critical xpath API bug breaking chapter/heading detection python-docx elements don't support xpath() with namespaces kwarg. The calls silently failed in try/except blocks, causing chapter search and heading detection to never find matches. Fixed by replacing xpath(..., namespaces={...}) with: - findall('.//' + qn('w:t')) for text elements - find(qn('w:pPr')) + find(qn('w:pStyle')) for style detection - get(qn('w:val')) for attribute values Also fixed logic bug where elif prevented short-text fallback from running when a non-heading style existed on the paragraph.	2026-01-11 05:20:05 -07:00
Ryan Malloy	af6aadf559	Refactor: Extract processing logic into utility modules Complete architecture cleanup - eliminated duplicate server files: - Deleted server_monolithic.py (2249 lines) - Deleted server_legacy.py (2209 lines) New utility modules created: - utils/word_processing.py - Word extraction/conversion (preserves page range fixes) - utils/excel_processing.py - Excel extraction - utils/powerpoint_processing.py - PowerPoint extraction - utils/processing.py - Universal helpers (parse_page_range, health checks, etc.) Updated mixins to import from utils instead of server_monolithic. Entry point remains server.py (48 lines) using mixin architecture. All 53 tests pass. Coverage improved from 11% to 22% by removing duplicate code.	2026-01-11 05:08:18 -07:00
Ryan Malloy	8249afb763	Fix banner issue in server.py entry point The pyproject.toml script entry point (mcp-office-tools) uses server.py, not server_monolithic.py. Applied same show_banner=False fix and simplified to use app.run() instead of asyncio.run(app.run_stdio_async()).	2026-01-11 04:32:46 -07:00
Ryan Malloy	210aa99e0b	Fix page range extraction for large documents and MCP connection Bug fixes: - Remove 100-paragraph cap that prevented extracting content past ~page 4 Now calculates limit based on number of pages requested (300 paras/page) - Add fallback page estimation when docs lack explicit page breaks Uses ~25 paragraphs per page for navigation in non-paginated docs - Fix _get_available_headings to scan full document (was only first 100 elements) Headings like Chapter 10 at element 1524 were invisible - Fix MCP connection by disabling FastMCP banner (show_banner=False) ASCII art banner was corrupting stdout JSON-RPC protocol Changes: - Default image_mode changed from 'base64' to 'files' to avoid huge responses - Add proper .mcp.json config with command/args format - Add test document to .gitignore for privacy	2026-01-11 04:27:56 -07:00
Ryan Malloy	35869b6099	Add behind-the-scenes link to discernment blog post Links README to Ryan's AI discernment article, which discusses the documentation rewrite process and connects to the model's perspective in the collaborations archive.	2026-01-11 02:02:34 -07:00
Ryan Malloy	f159efab2c	Improve README tone and clarity - Replace generic opener with direct description - Make feature bullets more conversational (less "feature list" mode) - Add context before format support table - Clarify pagination example with "three ways" structure - Lead testing section with the dashboard hook - Add architecture design rationale - Remove "comprehensive" and "intelligent" buzzwords	2026-01-11 00:49:34 -07:00
Ryan Malloy	036160d029	Update README with accurate tool documentation - Document all 12 actual MCP tools (6 universal, 3 Word, 3 Excel) - Add comprehensive format support matrix with feature breakdown - Include practical usage examples with real output structures - Add test dashboard section - Simplify installation with uvx/Claude Code instructions - Remove marketing fluff; focus on technical accuracy	2026-01-11 00:45:00 -07:00
Ryan Malloy	c935cec7b6	Add MS Office-themed test dashboard with interactive reporting - Self-contained HTML dashboard with MS Office 365 design - pytest plugin captures inputs, outputs, and errors per test - Unified orchestrator runs pytest + torture tests together - Test files persisted in reports/test_files/ with relative links - GitHub Actions workflow with PR comments and job summaries - Makefile with convenient commands (test, view-dashboard, etc.) - Works offline with embedded JSON data (no CORS issues)	2026-01-11 00:28:12 -07:00
Ryan Malloy	76c7a0b2d0	Add decorators for field defaults and error handling, fix Excel performance - Create @resolve_field_defaults decorator to handle Pydantic FieldInfo objects when tools are called directly (outside MCP framework) - Create @handle_office_errors decorator for consistent error wrapping - Apply decorators to Excel and Word mixins, removing ~100 lines of boilerplate code - Fix Excel formula extraction performance: load workbooks once before loop instead of per-cell (100x faster with calculated values) - Update test suite to use correct mock patch paths (patch where names are looked up, not where defined) - Add torture_test.py for real document validation	2026-01-10 23:51:30 -07:00
Ryan Malloy	1ad2abb617	Implement cursor-based pagination system for large document processing - Add comprehensive pagination infrastructure based on MCP Playwright patterns - Integrate automatic pagination into convert_to_markdown tool for documents >25k tokens - Support cursor-based navigation with session isolation and security - Prevent MCP token limit errors for massive documents (200+ pages) - Maintain document structure and context across paginated sections - Add configurable page sizes, return_all bypass, and intelligent token estimation - Enable seamless navigation through extremely dense documents that exceed limits by 100x	2025-09-26 19:06:05 -06:00
Ryan Malloy	0748eec48d	Fix FastMCP stdio server import - Use app.run_stdio_async() instead of deprecated stdio_server import - Aligns with FastMCP 2.11.3 API - Server now starts correctly with uv run mcp-office-tools - Maintains all MCPMixin functionality and tool registration	2025-09-26 15:49:00 -06:00
Ryan Malloy	22f657b32b	Fix server entry point for pyproject.toml script - Add main() function back to server.py for CLI script entry point - Maintains FastMCP MCPMixin pattern while fixing uvx execution - Server now starts properly with 'uvx --from . mcp-office-tools' - Preserves all 7 tools with official mixin registration	2025-09-26 14:15:25 -06:00
Ryan Malloy	9d6a9fc24c	Refactor server architecture using mcpmixin pattern - Split monolithic 2209-line server.py into organized mixin classes - UniversalMixin: Format-agnostic tools (extract_text, extract_images, etc.) - WordMixin: Word-specific tools (convert_to_markdown with chapter_name support) - ExcelMixin: Placeholder for future Excel-specific tools - PowerPointMixin: Placeholder for future PowerPoint-specific tools Benefits: • Improved maintainability and separation of concerns • Better testability with isolated mixins • Easier team collaboration on different file types • Reduced cognitive load per module • Preserved all 7 existing tools with full functionality Architecture now supports clean expansion for format-specific tools while maintaining backward compatibility through legacy server backup.	2025-09-26 13:08:53 -06:00
Ryan Malloy	778ef3a2d4	Add chapter-based extraction for documents without bookmarks - Add chapter_name parameter to convert_to_markdown tool - Implement _find_chapter_content_range() for heading-based navigation - Add _get_available_headings() to help users find chapter names - Include chapter extraction metadata in results - Enhanced ultra-fast summary with available headings - Provides alternative to bookmark extraction when bookmarks unavailable	2025-08-22 08:14:23 -06:00
Ryan Malloy	6484036b69	📖 Add bookmark-based chapter extraction for precise content targeting - Add bookmark_name parameter for extracting specific chapters/sections - Implement bookmark boundary detection using Word XML structure - Extract content between bookmark start/end markers with smart extension - More reliable than page ranges - bookmarks are anchored to exact locations - Support chapter extraction like bookmark_name='Chapter1_Start' - Include bookmark metadata in response with element ranges - Perfect for extracting individual chapters from large documents	2025-08-22 08:02:50 -06:00
Ryan Malloy	b2033fc239	🔥 Fix critical issue: page_range was processing entire document - Replace unreliable Word page detection with element-based limiting - Cap extraction at 25 paragraphs per 'page' requested (max 100 total) - Cap extraction at 8k chars per 'page' requested (max 40k total) - Add early termination when limits reached - Add processing_limits metadata to show actual extraction stats - Prevent 1.28M token responses by stopping at reasonable content limits - Single page (page_range='1') now limited to ~25 paragraphs/8k chars	2025-08-22 08:00:02 -06:00
Ryan Malloy	431022e113	🚀 Add ultra-fast summary mode to prevent massive 1M+ token responses - Bypass all complex processing in summary_only mode - Extract only first 50 paragraphs, max 10 headings, 5 content paragraphs - Add bookmark detection for chapter navigation hints - Limit summary content to 2000 chars max - Prevent 1,282,370 token responses with surgical precision - Show bookmark names as chapter start indicators	2025-08-22 07:56:19 -06:00
Ryan Malloy	3dffce6904	⚡ Add aggressive content limiting to prevent MCP 25k token errors - Implement smart content truncation at ~80k chars (~20k tokens) - Preserve document structure when truncating (stop at natural breaks) - Add clear truncation notices with guidance for smaller ranges - Update chunking suggestions to use safer 8-page chunks - Enhance recommendations to suggest 3-8 page ranges - Prevent 29,869 > 25,000 token errors while maintaining usability	2025-08-21 02:50:04 -06:00
Ryan Malloy	9c2f299d49	📋 Add comprehensive Table of Contents extraction with smart chunking - Extract headings with page numbers during document processing - Generate optimized page ranges for each section/chapter - Provide intelligent chunking suggestions (15-page optimal chunks) - Classify section types (chapter, section, subsection, etc.) - Calculate actual section lengths based on heading positions - Include suggested_chunking with ready-to-use page ranges - Perfect for extracting 200+ page documents section by section	2025-08-21 02:47:01 -06:00
Ryan Malloy	d94bd39da6	🧠 Add intelligent processing recommendations for optimal workflow - Analyze document size and complexity before processing - Provide clear workflow recommendations in response metadata - Strongly recommend summary_only + page_range for large documents (>10 pages) - Add warning system for suboptimal usage patterns - Update parameter descriptions with best practice guidance - Help users avoid 25k token response limits proactively	2025-08-19 13:16:48 -06:00
Ryan Malloy	a485e05759	⚡ Implement true page-range filtering for efficient processing - Add page break detection using Word XML structure - Process only specified pages instead of full document + truncation - Route page-range requests to python-docx for granular control - Skip mammoth for page-specific processing (mammoth processes full doc) - Add page metadata to results when filtering is used - Significantly reduce memory usage and response size for large documents	2025-08-19 13:12:19 -06:00
Ryan Malloy	f884c99bbd	🎯 Add page-range chunking and summary mode for large documents - Replace character-based chunking with page-range support (e.g., '1-5', '1,3,5-10') - Add summary_only mode to prevent large response errors (>25k tokens) - Implement response size limiting with 5000 char truncation in summary mode - Support selective page processing for better memory efficiency - Maintain backward compatibility with existing parameters	2025-08-18 23:32:00 -06:00
Ryan Malloy	b3caed78d3	✨ Add comprehensive Markdown conversion with image support - Add convert_to_markdown tool for .docx/.doc files - Support multiple image handling modes (base64, files, references) - Implement large document chunking for performance - Preserve document structure (headings, lists, tables) - Smart fallback methods (mammoth → python-docx → custom) - Handle both modern and legacy Word formats	2025-08-18 23:23:59 -06:00
Ryan Malloy	1b359c4c7c	✨ Transform README into a stunning showcase - Add eye-catching visual design with emojis and badges - Create compelling hero section with value proposition - Include real-world benchmarks and performance metrics - Add enterprise success stories and use cases - Implement collapsible sections for better organization - Include Mermaid architecture diagram - Add comprehensive feature matrix with visual indicators - Create roadmap and community sections - Enhance installation and setup instructions - Make it GitHub-ready with proper formatting 🚀 Now ready to wow potential users and contributors!	2025-08-18 01:05:03 -06:00
Ryan Malloy	b681cb030b	Initial commit: MCP Office Tools v0.1.0 - Comprehensive Microsoft Office document processing server - Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV - 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats - Multi-library fallback system for robust processing - URL support with intelligent caching - Legacy Office format support (97-2003) - FastMCP integration with async architecture - Production ready with comprehensive documentation 🤖 Generated with Claude Code (claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-18 01:01:48 -06:00

35 Commits
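
Several commits above refer to a page-range syntax such as '1-5' and '1,3,5-10' (commit f884c99bbd) and to range URIs such as chapters://doc/1-5 and slides://doc/1,3,5 (commit d569034fa3). The sketch below shows one way such a string could be expanded into page numbers. It is an illustration only: the function name expand_page_range, its validation rules, and its return type are assumptions, and it is not the repository's parse_page_range helper, whose signature is not shown in this log.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a string like '1,3,5-10' into a sorted list of unique page numbers.

    Hypothetical helper for illustration; not taken from the repository.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            # Tolerate stray commas such as '1,,3'.
            continue
        if "-" in part:
            # A span like '5-10' covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page span: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


if __name__ == "__main__":
    print(expand_page_range("1-5"))
    print(expand_page_range("1,3,5-10"))
```

Returning a sorted, de-duplicated list means overlapping input such as '1-3,2-4' yields each page once. That is a design choice of this sketch, not something the commit messages specify.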