- Add bookmark_name parameter for extracting specific chapters/sections
- Implement bookmark boundary detection using Word XML structure
- Extract content between bookmark start/end markers with smart extension
- More reliable than page ranges, since bookmarks are anchored to exact locations
- Support chapter extraction, e.g. bookmark_name='Chapter1_Start'
- Include bookmark metadata and element ranges in the response
- Useful for extracting individual chapters from large documents
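
A minimal sketch of bookmark boundary detection, for illustration only. It assumes the raw word/document.xml is available as a string and uses the standard-library XML parser rather than whatever the server actually uses. The function name find_bookmark_range and the sample XML are hypothetical, and the "smart extension" step is not covered.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Preface</w:t></w:r></w:p>
<w:p><w:bookmarkStart w:id="0" w:name="Chapter1_Start"/><w:r><w:t>Chapter 1</w:t></w:r></w:p>
<w:p><w:r><w:t>Body text</w:t></w:r></w:p>
<w:p><w:r><w:t>More</w:t></w:r><w:bookmarkEnd w:id="0"/></w:p>
<w:p><w:r><w:t>Chapter 2</w:t></w:r></w:p>
</w:body></w:document>"""


def find_bookmark_range(document_xml, bookmark_name):
    """Return (start, end) indices of the body-level elements that hold the
    bookmark's start and end markers; either value is None if not found."""
    body = ET.fromstring(document_xml).find(f"{{{W}}}body")
    start_idx = bookmark_id = None
    for idx, block in enumerate(body):
        # block.iter() also yields the block itself, so markers placed
        # directly under <w:body> are handled too.
        for node in block.iter():
            if (node.tag == f"{{{W}}}bookmarkStart"
                    and node.get(f"{{{W}}}name") == bookmark_name):
                start_idx, bookmark_id = idx, node.get(f"{{{W}}}id")
            elif (node.tag == f"{{{W}}}bookmarkEnd"
                    and bookmark_id is not None
                    and node.get(f"{{{W}}}id") == bookmark_id):
                # bookmarkEnd is matched to bookmarkStart by w:id, not by name
                return start_idx, idx
    return start_idx, None


print(find_bookmark_range(SAMPLE, "Chapter1_Start"))
```

The returned indices are inclusive, so in the sample the bookmarked chapter spans body elements 1 through 3.
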
- Replace unreliable Word page detection with element-based limiting
- Cap extraction at 25 paragraphs per 'page' requested (max 100 total)
- Cap extraction at 8k chars per 'page' requested (max 40k total)
- Add early termination when limits reached
- Add processing_limits metadata to show actual extraction stats
- Prevent oversized responses (up to 1.28M tokens) by stopping at the limits above
- Single page (page_range='1') now limited to ~25 paragraphs/8k chars
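
The limit arithmetic above can be sketched as follows. This is an illustration under stated assumptions, not the committed code: "8k" is read as 8,000 characters, the function names are hypothetical, and every key inside processing_limits is invented for the example.

```python
from typing import Iterable

PARAGRAPHS_PER_PAGE = 25
CHARS_PER_PAGE = 8_000   # assumption: "8k" means 8,000 characters
MAX_PARAGRAPHS = 100
MAX_CHARS = 40_000


def extraction_limits(pages_requested: int) -> tuple:
    """Scale the per-page caps by the number of pages, then clamp to the totals."""
    pages = max(1, pages_requested)
    return (min(PARAGRAPHS_PER_PAGE * pages, MAX_PARAGRAPHS),
            min(CHARS_PER_PAGE * pages, MAX_CHARS))


def extract_with_limits(paragraphs: Iterable[str], pages_requested: int) -> dict:
    """Collect paragraphs until either cap would be exceeded, then stop early."""
    max_paragraphs, max_chars = extraction_limits(pages_requested)
    kept, chars, limit_reached = [], 0, False
    for text in paragraphs:
        if len(kept) >= max_paragraphs or chars + len(text) > max_chars:
            limit_reached = True
            break  # early termination: the rest of the document is never read
        kept.append(text)
        chars += len(text)
    return {
        "paragraphs": kept,
        "processing_limits": {  # hypothetical key names
            "max_paragraphs": max_paragraphs,
            "max_chars": max_chars,
            "paragraphs_extracted": len(kept),
            "chars_extracted": chars,
            "limit_reached": limit_reached,
        },
    }
```

Because the loop breaks as soon as a cap is hit, passing a generator keeps memory use proportional to the extracted text rather than to the whole document.
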
- Bypass all complex processing in summary_only mode
- Scan only the first 50 paragraphs; keep at most 10 headings and 5 content paragraphs
- Add bookmark detection for chapter navigation hints
- Limit summary content to 2000 chars max
- Prevent 1,282,370-token responses by capping summary output
- Show bookmark names as chapter start indicators
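
A rough sketch of the summary_only fast path described above. The input shape (a list of (style_name, text) pairs), the "Heading" style-name prefix, the function name build_summary, and the choice to apply the 2000-character cap to the joined content paragraphs are all assumptions made for illustration.

```python
MAX_SCANNED = 50         # paragraphs inspected
MAX_HEADINGS = 10
MAX_CONTENT = 5
MAX_SUMMARY_CHARS = 2000


def build_summary(paragraphs):
    """Build a small summary from (style_name, text) pairs without
    touching anything past the first MAX_SCANNED paragraphs."""
    headings, content = [], []
    for style, text in paragraphs[:MAX_SCANNED]:
        text = text.strip()
        if not text:
            continue
        if style.startswith("Heading"):   # assumed heading style naming
            if len(headings) < MAX_HEADINGS:
                headings.append(text)
        elif len(content) < MAX_CONTENT:
            content.append(text)
    return {
        "headings": headings,
        # assumption: the character cap applies to the joined content text
        "content": "\n".join(content)[:MAX_SUMMARY_CHARS],
    }
```
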
- Extract headings with page numbers during document processing
- Generate optimized page ranges for each section/chapter
- Provide intelligent chunking suggestions (15-page optimal chunks)
- Classify section types (chapter, section, subsection, etc.)
- Calculate actual section lengths based on heading positions
- Include suggested_chunking with ready-to-use page ranges
- Suited to extracting 200+ page documents section by section
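
One way the section ranges and 15-page chunks could be derived from heading positions, as a sketch only. The function names, the (title, start_page) input shape, and all dictionary keys other than suggested_chunking are hypothetical.

```python
def chunk_pages(start, end, chunk_size=15):
    """Split an inclusive page span into page_range strings of at most chunk_size pages."""
    chunks = []
    page = start
    while page <= end:
        last = min(page + chunk_size - 1, end)
        chunks.append(str(page) if page == last else f"{page}-{last}")
        page = last + 1
    return chunks


def section_ranges(headings, total_pages):
    """headings: (title, start_page) pairs in document order.
    A section ends on the page before the next heading starts."""
    sections = []
    for i, (title, start) in enumerate(headings):
        if i + 1 < len(headings):
            end = headings[i + 1][1] - 1
        else:
            end = total_pages
        end = max(end, start)  # two headings on the same page
        sections.append({
            "title": title,
            "page_range": f"{start}-{end}",
            "length_pages": end - start + 1,
            "suggested_chunking": chunk_pages(start, end),
        })
    return sections


for section in section_ranges([("Intro", 1), ("Chapter 1", 5), ("Chapter 2", 40)], 60):
    print(section["title"], section["suggested_chunking"])
```

Each string in suggested_chunking is meant to be passed back as a page_range value in a follow-up request.
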
- Analyze document size and complexity before processing
- Provide clear workflow recommendations in response metadata
- Strongly recommend summary_only + page_range for large documents (>10 pages)
- Add warning system for suboptimal usage patterns
- Update parameter descriptions with best practice guidance
- Help users stay under the 25k-token response limit
- Add page break detection using Word XML structure
- Process only specified pages instead of full document + truncation
- Route page-range requests to python-docx for granular control
- Skip mammoth for page-specific processing (mammoth processes full doc)
- Add page metadata to results when filtering is used
- Significantly reduce memory usage and response size for large documents
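
A sketch of page break detection from the Word XML, for illustration. The commit routes page-range requests through python-docx; this example instead parses the raw document.xml string with the standard library so it runs without extra dependencies. It counts explicit breaks (w:br with w:type="page") and the w:lastRenderedPageBreak hints Word writes when saving. Function names and the sample XML are hypothetical, and the page numbers are approximate: rendered-break hints may be absent or stale, and paragraphs nested in text boxes would be counted twice.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P, T, BR = f"{{{W}}}p", f"{{{W}}}t", f"{{{W}}}br"
RENDERED_BREAK = f"{{{W}}}lastRenderedPageBreak"
BREAK_TYPE = f"{{{W}}}type"

SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Page one text</w:t></w:r></w:p>
<w:p><w:r><w:br w:type="page"/></w:r><w:r><w:t>Page two text</w:t></w:r></w:p>
<w:p><w:r><w:lastRenderedPageBreak/><w:t>Page three text</w:t></w:r></w:p>
</w:body></w:document>"""


def text_by_page(document_xml):
    """Map 1-based page number -> list of text fragments found on that page."""
    body = ET.fromstring(document_xml).find(f"{{{W}}}body")
    pages, page = {}, 1
    for para in body.iter(P):
        buffer = []
        for node in para.iter():
            is_break = node.tag == RENDERED_BREAK or (
                node.tag == BR and node.get(BREAK_TYPE) == "page")
            if is_break:
                # Text before the break belongs to the current page.
                if buffer:
                    pages.setdefault(page, []).append("".join(buffer))
                    buffer = []
                page += 1
            elif node.tag == T and node.text:
                buffer.append(node.text)
        if buffer:
            pages.setdefault(page, []).append("".join(buffer))
    return pages


def extract_pages(document_xml, first, last):
    """Return only the text on pages first..last (inclusive)."""
    pages = text_by_page(document_xml)
    return [text for n in range(first, last + 1) for text in pages.get(n, [])]


print(extract_pages(SAMPLE, 2, 3))
```
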
- Add eye-catching visual design with emojis and badges
- Create compelling hero section with value proposition
- Include real-world benchmarks and performance metrics
- Add enterprise success stories and use cases
- Implement collapsible sections for better organization
- Include Mermaid architecture diagram
- Add comprehensive feature matrix with visual indicators
- Create roadmap and community sections
- Enhance installation and setup instructions
- Make it GitHub-ready with proper formatting
🚀 Now ready to wow potential users and contributors!
- Comprehensive Microsoft Office document processing server
- Support for Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), CSV
- 6 universal tools: extract_text, extract_images, extract_metadata, detect_office_format, analyze_document_health, get_supported_formats
- Multi-library fallback system for robust processing
- URL support with intelligent caching
- Legacy Office format support (97-2003)
- FastMCP integration with async architecture
- Production ready with comprehensive documentation
🤖 Generated with Claude Code (claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>