mcp-office-tools

Author	SHA1	Message	Date
Ryan Malloy	b2033fc239	🔥 Fix critical issue: page_range was processing entire document - Replace unreliable Word page detection with element-based limiting - Cap extraction at 25 paragraphs per 'page' requested (max 100 total) - Cap extraction at 8k chars per 'page' requested (max 40k total) - Add early termination when limits reached - Add processing_limits metadata to show actual extraction stats - Prevent 1.28M token responses by stopping at reasonable content limits - Single page (page_range='1') now limited to ~25 paragraphs/8k chars	2025-08-22 08:00:02 -06:00
Ryan Malloy	431022e113	🚀 Add ultra-fast summary mode to prevent massive 1M+ token responses - Bypass all complex processing in summary_only mode - Extract only first 50 paragraphs, max 10 headings, 5 content paragraphs - Add bookmark detection for chapter navigation hints - Limit summary content to 2000 chars max - Prevent 1,282,370 token responses with surgical precision - Show bookmark names as chapter start indicators	2025-08-22 07:56:19 -06:00
Ryan Malloy	3dffce6904	⚡ Add aggressive content limiting to prevent MCP 25k token errors - Implement smart content truncation at ~80k chars (~20k tokens) - Preserve document structure when truncating (stop at natural breaks) - Add clear truncation notices with guidance for smaller ranges - Update chunking suggestions to use safer 8-page chunks - Enhance recommendations to suggest 3-8 page ranges - Prevent 29,869 > 25,000 token errors while maintaining usability	2025-08-21 02:50:04 -06:00
Ryan Malloy	9c2f299d49	📋 Add comprehensive Table of Contents extraction with smart chunking - Extract headings with page numbers during document processing - Generate optimized page ranges for each section/chapter - Provide intelligent chunking suggestions (15-page optimal chunks) - Classify section types (chapter, section, subsection, etc.) - Calculate actual section lengths based on heading positions - Include suggested_chunking with ready-to-use page ranges - Perfect for extracting 200+ page documents section by section	2025-08-21 02:47:01 -06:00
Ryan Malloy	d94bd39da6	🧠 Add intelligent processing recommendations for optimal workflow - Analyze document size and complexity before processing - Provide clear workflow recommendations in response metadata - Strongly recommend summary_only + page_range for large documents (>10 pages) - Add warning system for suboptimal usage patterns - Update parameter descriptions with best practice guidance - Help users avoid 25k token response limits proactively	2025-08-19 13:16:48 -06:00
Ryan Malloy	a485e05759	⚡ Implement true page-range filtering for efficient processing - Add page break detection using Word XML structure - Process only specified pages instead of full document + truncation - Route page-range requests to python-docx for granular control - Skip mammoth for page-specific processing (mammoth processes full doc) - Add page metadata to results when filtering is used - Significantly reduce memory usage and response size for large documents	2025-08-19 13:12:19 -06:00
Ryan Malloy	f884c99bbd	🎯 Add page-range chunking and summary mode for large documents - Replace character-based chunking with page-range support (e.g., '1-5', '1,3,5-10') - Add summary_only mode to prevent large response errors (>25k tokens) - Implement response size limiting with 5000 char truncation in summary mode - Support selective page processing for better memory efficiency - Maintain backward compatibility with existing parameters	2025-08-18 23:32:00 -06:00
Ryan Malloy	b3caed78d3	✨ Add comprehensive Markdown conversion with image support - Add convert_to_markdown tool for .docx/.doc files - Support multiple image handling modes (base64, files, references) - Implement large document chunking for performance - Preserve document structure (headings, lists, tables) - Smart fallback methods (mammoth → python-docx → custom) - Handle both modern and legacy Word formats	2025-08-18 23:23:59 -06:00
Ryan Malloy	b681cb030b	Initial commit: MCP Office Tools v0.1.0 - Comprehensive Microsoft Office document processing server - Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV - 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats - Multi-library fallback system for robust processing - URL support with intelligent caching - Legacy Office format support (97-2003) - FastMCP integration with async architecture - Production ready with comprehensive documentation 🤖 Generated with Claude Code (claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-18 01:01:48 -06:00

9 Commits
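
The page-range syntax from commit f884c99bbd ('1-5', '1,3,5-10') and the per-page caps from commit b2033fc239 (25 paragraphs and 8k chars per requested page, at most 100 paragraphs and 40k chars in total) could be sketched roughly as below. This is an illustration only, not the repository's code: the function names `parse_page_range`, `extraction_limits`, and `limit_paragraphs` are hypothetical, and paragraphs are assumed to be plain strings already extracted from the document.

```python
from typing import Iterable, List, Tuple


def parse_page_range(spec: str) -> List[int]:
    """Expand a spec such as '1,3,5-10' into a sorted list of unique pages.

    Hypothetical helper; the real tool's parsing rules may differ.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '1,,3'
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


def extraction_limits(pages_requested: int) -> Tuple[int, int]:
    """Return (max_paragraphs, max_chars) for a request.

    Uses the figures quoted in the commit message: 25 paragraphs and
    8k chars per requested page, capped at 100 paragraphs and 40k chars.
    """
    max_paragraphs = min(25 * pages_requested, 100)
    max_chars = min(8_000 * pages_requested, 40_000)
    return max_paragraphs, max_chars


def limit_paragraphs(paragraphs: Iterable[str], pages_requested: int) -> List[str]:
    """Keep paragraphs until either limit is reached, then stop early."""
    max_paragraphs, max_chars = extraction_limits(pages_requested)
    kept: List[str] = []
    total_chars = 0
    for text in paragraphs:
        # Early termination: never read past the first limit that is hit.
        if len(kept) >= max_paragraphs or total_chars + len(text) > max_chars:
            break
        kept.append(text)
        total_chars += len(text)
    return kept
```

Under these assumptions, a request for `page_range='1'` resolves to one page and stops after 25 paragraphs or 8,000 characters, whichever comes first, which matches the single-page behaviour described in the latest commit.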