Compare commits


8 Commits

Author SHA1 Message Date
11defb4eae Update README and gitignore for new document tools
- Add 7 new Word tools to README (outline, search, entities, etc.)
- Add 9 MCP prompts section with workflow descriptions
- Gitignore reading progress bookmark files (.*.reading_progress.json)
- Gitignore local .mcp.json and test documents
2026-01-11 07:41:49 -07:00
4b38f6455c Add document navigation tools and MCP prompts
New tools for Word document analysis:
- extract_entities: Pattern-based extraction of people, places, organizations
- get_chapter_summaries: Chapter previews with opening sentences and word counts
- save_reading_progress: Bookmark reading position to JSON file
- get_reading_progress: Resume reading from saved position

New MCP prompts (basic to advanced workflows):
- explore-document: Get started with a new document
- find-character: Track character mentions
- chapter-preview: Quick chapter overviews
- resume-reading: Continue where you left off
- document-analysis: Comprehensive multi-tool analysis
- character-journey: Track character arc through narrative
- document-comparison: Compare entities between chapters
- full-reading-session: Guided reading with bookmarking
- manuscript-review: Complete editorial workflow

Updated test counts for 19 total tools (6 universal + 10 word + 3 excel)
2026-01-11 07:23:15 -07:00
1abce7f26d Add document navigation tools: outline, style check, search
New tools for easier document navigation:
- get_document_outline: Structured view of headings with chapter detection
- check_style_consistency: Find formatting issues and missing chapters
- search_document: Search with context and chapter location

All tools tested with a 200+ page manuscript. Detects issues like
Chapter 3 being styled as "normal" instead of "Heading 1".
2026-01-11 07:15:43 -07:00
34e636e782 Add documentation for DOCX processing fixes
Documents 6 critical bugs discovered while processing a 200+ page
manuscript, including the root cause xpath API mismatch between
python-docx and lxml that caused silent failures in chapter search.
2026-01-11 06:47:39 -07:00
2f39c4ec5b Fix critical xpath API bug breaking chapter/heading detection
python-docx elements don't support xpath() with the namespaces kwarg.
The calls silently failed in try/except blocks, causing chapter search
and heading detection to never find matches.

Fixed by replacing xpath(..., namespaces={...}) with:
- findall('.//' + qn('w:t')) for text elements
- find(qn('w:pPr')) + find(qn('w:pStyle')) for style detection
- get(qn('w:val')) for attribute values

Also fixed logic bug where elif prevented short-text fallback from
running when a non-heading style existed on the paragraph.
2026-01-11 05:20:05 -07:00
af6aadf559 Refactor: Extract processing logic into utility modules
Complete architecture cleanup - eliminated duplicate server files:
- Deleted server_monolithic.py (2249 lines)
- Deleted server_legacy.py (2209 lines)

New utility modules created:
- utils/word_processing.py - Word extraction/conversion (preserves page range fixes)
- utils/excel_processing.py - Excel extraction
- utils/powerpoint_processing.py - PowerPoint extraction
- utils/processing.py - Universal helpers (parse_page_range, health checks, etc.)

Updated mixins to import from utils instead of server_monolithic.
Entry point remains server.py (48 lines) using mixin architecture.

All 53 tests pass. Coverage improved from 11% to 22% by removing duplicate code.
2026-01-11 05:08:18 -07:00
8249afb763 Fix banner issue in server.py entry point
The pyproject.toml script entry point (mcp-office-tools) uses server.py,
not server_monolithic.py. Applied the same show_banner=False fix and
simplified to use app.run() instead of asyncio.run(app.run_stdio_async()).
2026-01-11 04:32:46 -07:00
210aa99e0b Fix page range extraction for large documents and MCP connection
Bug fixes:
- Remove 100-paragraph cap that prevented extracting content past ~page 4
  Now calculates limit based on number of pages requested (300 paras/page)
- Add fallback page estimation when docs lack explicit page breaks
  Uses ~25 paragraphs per page for navigation in non-paginated docs
- Fix _get_available_headings to scan full document (was only first 100 elements)
  Headings like Chapter 10 at element 1524 were invisible
- Fix MCP connection by disabling FastMCP banner (show_banner=False)
  ASCII art banner was corrupting stdout JSON-RPC protocol

Changes:
- Default image_mode changed from 'base64' to 'files' to avoid huge responses
- Add proper .mcp.json config with command/args format
- Add test document to .gitignore for privacy
2026-01-11 04:27:56 -07:00
17 changed files with 3239 additions and 4590 deletions

.gitignore
View File

@@ -78,3 +78,12 @@ tmp/
# Temporary files created during processing
*.tmp
*.temp
# Test documents (personal/private)
ORIGINAL - The Other Side of the Bed*.docx
# Reading progress bookmarks (user-specific)
.*.reading_progress.json
# Local MCP config
.mcp.json

View File

@@ -83,6 +83,13 @@ claude mcp add office-tools "uvx mcp-office-tools"
| `convert_to_markdown` | Convert to Markdown with automatic pagination for large docs |
| `extract_word_tables` | Extract tables as structured JSON, CSV, or Markdown |
| `analyze_word_structure` | Analyze headings, sections, styles, and document hierarchy |
| `get_document_outline` | Get structured outline with chapter detection and word counts |
| `check_style_consistency` | Find formatting issues, missing chapters, style problems |
| `search_document` | Search text with context and chapter location |
| `extract_entities` | Extract people, places, organizations using pattern recognition |
| `get_chapter_summaries` | Generate chapter previews with opening sentences |
| `save_reading_progress` | Bookmark your reading position for later |
| `get_reading_progress` | Resume reading from saved position |
### Excel Tools
@@ -117,6 +124,24 @@ Here's what works and what's "good enough" — legacy formats from Office 97-200
---
## 🎯 MCP Prompts
Pre-built workflows that chain multiple tools together. Use these as starting points:
| Prompt | Level | Description |
|--------|-------|-------------|
| `explore-document` | Basic | Start with any new document - get structure and identify issues |
| `find-character` | Basic | Track all mentions of a person/character with context |
| `chapter-preview` | Basic | Quick overview of each chapter without full read |
| `resume-reading` | Intermediate | Check saved position and continue reading |
| `document-analysis` | Intermediate | Comprehensive multi-tool analysis |
| `character-journey` | Advanced | Track character arc through entire narrative |
| `document-comparison` | Advanced | Compare entities and themes between chapters |
| `full-reading-session` | Advanced | Guided reading with bookmarking |
| `manuscript-review` | Advanced | Complete editorial workflow for editors |
---
## 💡 Usage Examples
### Extract Text from Any Document

View File

@@ -0,0 +1,181 @@
# DOCX Processing Fixes
This document captures critical bugs discovered and fixed while processing complex Word documents (specifically a 200+ page manuscript with 10 chapters).
## Summary
| # | Bug | Impact | Root Cause |
|---|-----|--------|------------|
| 1 | FastMCP banner corruption | MCP connection fails | ASCII art breaks JSON-RPC |
| 2 | Page range cap | Wrong content extracted | Used max page# instead of count |
| 3 | Heading scan limit | Chapters not found | Only scanned first 100 elements |
| 4 | Short-text fallback logic | Chapters not found | `elif` prevented fallback |
| 5 | **xpath API mismatch** | **Complete silent failure** | **python-docx != lxml API** |
| 6 | Image mode default | Response too large | Base64 bloats output |
---
## 1. FastMCP Banner Corruption
**File:** `src/mcp_office_tools/server.py`
**Symptom:** MCP connection fails with `Invalid JSON: EOF while parsing`
**Cause:** FastMCP's default startup banner prints ASCII art to stdout, corrupting the JSON-RPC protocol on stdio transport.
**Fix:**
```python
def main():
    # CRITICAL: show_banner=False is required for stdio transport!
    # FastMCP's banner prints ASCII art to stdout which breaks JSON-RPC protocol
    app.run(show_banner=False)
```
---
## 2. Page Range Cap Bug
**File:** `src/mcp_office_tools/utils/word_processing.py`
**Symptom:** Requesting pages 1-5 returns truncated content, but pages 195-200 returns everything.
**Cause:** The paragraph limit was calculated using the *maximum page number* instead of the *count of pages requested*.
**Before:**
```python
max_paragraphs = max(page_numbers) * 50 # pages 1-5 = 250 max, pages 195-200 = 10,000 max!
```
**After:**
```python
num_pages_requested = len(page_numbers) # pages 1-5 = 5, pages 195-200 = 6
max_paragraphs = num_pages_requested * 300 # Generous limit per page
max_chars = num_pages_requested * 50000
```
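For context, `page_numbers` comes from parsing a user-supplied range string such as `'1-5'` or `'1,3,5-10'`. A minimal sketch of such a parser (illustrative only, not the project's `_parse_page_range`):
```python
# Illustrative sketch - the real _parse_page_range lives in utils/processing.py
def parse_page_range(page_range: str) -> list[int]:
    """Parse a range string like '1-5', '3', or '1,3,5-10' into sorted page numbers."""
    pages: set[int] = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(part))
    return sorted(pages)

# parse_page_range("195-200") -> [195, 196, 197, 198, 199, 200]
# len() of that list (6) is what the fixed code multiplies by 300, not max() (200).
```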
---
## 3. Heading Scan Limit Bug
**File:** `src/mcp_office_tools/utils/word_processing.py`
**Symptom:** `_get_available_headings()` returns empty list for documents with chapters beyond the first few pages.
**Cause:** The function only scanned the first 100 body elements, but Chapter 10 was at element 1524.
**Before:**
```python
for element in doc.element.body[:100]:  # Only first 100 elements!
    # find headings...
```
**After:**
```python
for element in doc.element.body:  # Scan ALL elements
    if len(headings) >= 30:
        break  # Limit output, not search
    # find headings...
```
---
## 4. Short-Text Fallback Logic Bug
**File:** `src/mcp_office_tools/utils/word_processing.py`
**Symptom:** Chapter search fails even when chapter text exists and is under 100 characters.
**Cause:** The `elif` for short-text detection was attached to `if style_elem`, meaning it only ran when NO style existed. Paragraphs with any non-heading style (Normal, BodyText, etc.) skipped the fallback entirely.
**Before:**
```python
if style_elem:
    if 'heading' in style_val.lower():
        chapter_start_idx = elem_idx
        break
elif len(text_content.strip()) < 100:  # Only runs if style_elem is empty!
    chapter_start_idx = elem_idx
    break
```
**After:**
```python
is_heading_style = False
if style_elem:
    style_val = style_elem[0].get(...)
    is_heading_style = 'heading' in style_val.lower()

# Independent check - runs regardless of whether style exists
if is_heading_style or len(text_content.strip()) < 100:
    chapter_start_idx = elem_idx
    break
```
---
## 5. Critical xpath API Mismatch (ROOT CAUSE)
**File:** `src/mcp_office_tools/utils/word_processing.py`
**Symptom:** Chapter search always returns "not found" even for chapters that clearly exist.
**Cause:** python-docx wraps lxml elements with custom classes (`CT_Document`, `CT_Body`, `CT_P`) that override `xpath()` with a **different method signature**. Standard lxml accepts `xpath(expr, namespaces={...})`, but python-docx's version **rejects the `namespaces` keyword argument**.
All 8 xpath calls were wrapped in try/except blocks, so they **silently failed** - the chapter search never actually executed.
**Before (silently fails):**
```python
# These all throw: "BaseOxmlElement.xpath() got an unexpected keyword argument 'namespaces'"
text_elems = para.xpath('.//w:t', namespaces={'w': 'http://...'})
style_elem = para.xpath('.//w:pStyle', namespaces={'w': 'http://...'})
```
**After (works correctly):**
```python
from docx.oxml.ns import qn

# Use findall() with qn() helper for text elements
text_elems = para.findall('.//' + qn('w:t'))
text_content = ''.join(t.text or '' for t in text_elems)

# Use find() chain for nested elements (pStyle is inside pPr)
pPr = para.find(qn('w:pPr'))
if pPr is not None:
    pStyle = pPr.find(qn('w:pStyle'))
    if pStyle is not None:
        style_val = pStyle.get(qn('w:val'), '')
```
**Key Insight:** The `qn()` function from `docx.oxml.ns` converts prefixed names like `'w:t'` to their fully qualified form `'{http://...}t'`, which works with python-docx's element methods.
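A quick way to see what `qn()` produces (the expanded name is shown as a comment):
```python
from docx.oxml.ns import qn

# qn() maps the 'w' prefix to the WordprocessingML main namespace
print(qn('w:t'))
# {http://schemas.openxmlformats.org/wordprocessingml/2006/main}t
```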
---
## 6. Image Mode Default
**File:** `src/mcp_office_tools/mixins/word.py`
**Symptom:** Responses exceed token limits when documents contain images.
**Cause:** Default `image_mode="base64"` embeds full image data inline, bloating responses.
**Fix:**
```python
image_mode: str = Field(
    default="files",  # Changed from "base64"
    description="Image handling mode: 'files' (saves to disk), 'base64' (embeds inline), 'references' (metadata only)"
)
```
---
## Lessons Learned
1. **Silent failures are dangerous.** Wrapping xpath calls in try/except hid the API mismatch for months. Consider logging exceptions even when swallowing them (see the sketch after this list).
2. **Test with real documents.** Unit tests with mocked data passed, but real documents exposed the xpath API issue immediately.
3. **python-docx is not lxml.** Despite being built on lxml, python-docx's element classes have different method signatures. Always use `qn()` and `findall()`/`find()` instead of `xpath()` with namespace dicts.
4. **Check your loop bounds.** Scanning "first 100 elements" seemed reasonable but failed for long documents. Limit the *output*, not the *search*.
5. **Understand your conditionals.** The `if/elif` logic bug is subtle - the fallback was syntactically correct but semantically wrong for the use case.
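A minimal sketch of the "log even when swallowing" pattern from lesson 1 (`para` and `para_idx` are placeholder names, not the project's actual code):
```python
import logging

logger = logging.getLogger(__name__)

def safe_paragraph_text(para, para_idx: int) -> str:
    """Extract paragraph text, swallowing errors but leaving a trace in the logs."""
    try:
        # Hypothetical extraction call that could raise on an API mismatch
        return "".join(t.text or "" for t in para.findall(".//t"))
    except Exception:
        # Still swallowed, but the failure is now visible instead of silent
        logger.exception("Text extraction failed for paragraph %d", para_idx)
        return ""
```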

View File

@@ -1,154 +1,18 @@
{
"metadata": {
"start_time": "2026-01-11T00:28:31.202459",
"end_time": "2026-01-11T00:28:33.718606",
"duration": 1.2442383766174316,
"exit_status": 0,
"start_time": "2026-01-11T07:15:14.417108",
"pytest_version": "9.0.2",
"test_types": [
"pytest",
"torture_test"
]
"end_time": "2026-01-11T07:15:15.173732",
"duration": 0.7566196918487549,
"exit_status": 0
},
"summary": {
"total": 6,
"passed": 5,
"total": 0,
"passed": 0,
"failed": 0,
"skipped": 1,
"pass_rate": 83.33333333333334
"skipped": 0,
"pass_rate": 0
},
"categories": {
"Excel": {
"total": 4,
"passed": 3,
"failed": 0,
"skipped": 1
},
"Word": {
"total": 2,
"passed": 2,
"failed": 0,
"skipped": 0
}
},
"tests": [
{
"name": "Excel Data Analysis",
"nodeid": "torture_test.py::test_excel_data_analysis",
"category": "Excel",
"outcome": "passed",
"duration": 0.17873024940490723,
"timestamp": "2026-01-11T00:28:33.696485",
"module": "torture_test",
"class": null,
"function": "test_excel_data_analysis",
"inputs": {
"file": "test_files/test_data.xlsx"
},
"outputs": {
"sheets_analyzed": [
"Test Data"
]
},
"error": null,
"traceback": null
},
{
"name": "Excel Formula Extraction",
"nodeid": "torture_test.py::test_excel_formula_extraction",
"category": "Excel",
"outcome": "passed",
"duration": 0.0032067298889160156,
"timestamp": "2026-01-11T00:28:33.699697",
"module": "torture_test",
"class": null,
"function": "test_excel_formula_extraction",
"inputs": {
"file": "test_files/test_data.xlsx"
},
"outputs": {
"total_formulas": 8
},
"error": null,
"traceback": null
},
{
"name": "Excel Chart Data Generation",
"nodeid": "torture_test.py::test_excel_chart_generation",
"category": "Excel",
"outcome": "passed",
"duration": 0.0025446414947509766,
"timestamp": "2026-01-11T00:28:33.702246",
"module": "torture_test",
"class": null,
"function": "test_excel_chart_generation",
"inputs": {
"file": "test_files/test_data.xlsx",
"x_column": "Category",
"y_columns": [
"Value"
]
},
"outputs": {
"chart_libraries": 2
},
"error": null,
"traceback": null
},
{
"name": "Word Structure Analysis",
"nodeid": "torture_test.py::test_word_structure_analysis",
"category": "Word",
"outcome": "passed",
"duration": 0.010314226150512695,
"timestamp": "2026-01-11T00:28:33.712565",
"module": "torture_test",
"class": null,
"function": "test_word_structure_analysis",
"inputs": {
"file": "test_files/test_document.docx"
},
"outputs": {
"total_headings": 0
},
"error": null,
"traceback": null
},
{
"name": "Word Table Extraction",
"nodeid": "torture_test.py::test_word_table_extraction",
"category": "Word",
"outcome": "passed",
"duration": 0.005824089050292969,
"timestamp": "2026-01-11T00:28:33.718393",
"module": "torture_test",
"class": null,
"function": "test_word_table_extraction",
"inputs": {
"file": "test_files/test_document.docx"
},
"outputs": {
"total_tables": 0
},
"error": null,
"traceback": null
},
{
"name": "Real Excel File Analysis (FORScan)",
"nodeid": "torture_test.py::test_real_excel_analysis",
"category": "Excel",
"outcome": "skipped",
"duration": 0,
"timestamp": "2026-01-11T00:28:33.718405",
"module": "torture_test",
"class": null,
"function": "test_real_excel_analysis",
"inputs": {
"file": "/home/rpm/FORScan Lite spreadsheets v1.1/FORScan Lite spreadsheet - PIDs.xlsx"
},
"outputs": null,
"error": "File not found: /home/rpm/FORScan Lite spreadsheets v1.1/FORScan Lite spreadsheet - PIDs.xlsx",
"traceback": null
}
]
"categories": {},
"tests": []
}

View File

@@ -293,7 +293,7 @@ class UniversalMixin(MCPMixin):
async def _extract_text_by_category(self, file_path: str, extension: str, category: str, preserve_formatting: bool, method: str) -> dict[str, Any]:
"""Extract text based on document category."""
# Import the appropriate extraction function
from ..server_monolithic import _extract_word_text, _extract_excel_text, _extract_powerpoint_text
from ..utils import _extract_word_text, _extract_excel_text, _extract_powerpoint_text
if category == "word":
return await _extract_word_text(file_path, extension, preserve_formatting, method)
@@ -306,7 +306,7 @@ class UniversalMixin(MCPMixin):
async def _extract_images_by_category(self, file_path: str, extension: str, category: str, output_format: str, min_width: int, min_height: int) -> list[dict[str, Any]]:
"""Extract images based on document category."""
from ..server_monolithic import _extract_word_images, _extract_excel_images, _extract_powerpoint_images
from ..utils import _extract_word_images, _extract_excel_images, _extract_powerpoint_images
if category == "word":
return await _extract_word_images(file_path, extension, output_format, min_width, min_height)
@@ -319,7 +319,7 @@ class UniversalMixin(MCPMixin):
async def _extract_metadata_by_category(self, file_path: str, extension: str, category: str) -> dict[str, Any]:
"""Extract metadata based on document category."""
from ..server_monolithic import _extract_word_metadata, _extract_excel_metadata, _extract_powerpoint_metadata, _extract_basic_metadata
from ..utils import _extract_word_metadata, _extract_excel_metadata, _extract_powerpoint_metadata, _extract_basic_metadata
# Get basic metadata first
metadata = await _extract_basic_metadata(file_path, extension, category)
@@ -339,5 +339,5 @@ class UniversalMixin(MCPMixin):
async def _extract_basic_metadata(self, file_path: str, extension: str, category: str) -> dict[str, Any]:
"""Extract basic metadata common to all documents."""
from ..server_monolithic import _extract_basic_metadata
from ..utils import _extract_basic_metadata
return await _extract_basic_metadata(file_path, extension, category)

View File

@@ -44,15 +44,15 @@ class WordMixin(MCPMixin):
async def convert_to_markdown(
self,
file_path: str = Field(description="Path to Office document or URL"),
include_images: bool = Field(default=True, description="Include images in markdown with base64 encoding or file references"),
image_mode: str = Field(default="base64", description="Image handling mode: 'base64', 'files', or 'references'"),
max_image_size: int = Field(default=1024*1024, description="Maximum image size in bytes for base64 encoding"),
include_images: bool = Field(default=True, description="Include images in markdown output. When True, images are extracted to files and linked in the markdown."),
image_mode: str = Field(default="files", description="Image handling mode: 'files' (default, saves to disk and links), 'base64' (embeds inline - WARNING: can create massive responses), or 'references' (metadata only, no content)"),
max_image_size: int = Field(default=1024*1024, description="Maximum image size in bytes for base64 encoding (only used when image_mode='base64')"),
preserve_structure: bool = Field(default=True, description="Preserve document structure (headings, lists, tables)"),
page_range: str = Field(default="", description="Page range to convert (e.g., '1-5', '3', '1,3,5-10'). RECOMMENDED for large documents. Empty = all pages"),
bookmark_name: str = Field(default="", description="Extract content for a specific bookmark/chapter (e.g., 'Chapter1_Start'). More reliable than page ranges."),
chapter_name: str = Field(default="", description="Extract content for a chapter by heading text (e.g., 'Chapter 1', 'Introduction'). Works when bookmarks aren't available."),
summary_only: bool = Field(default=False, description="Return only metadata and truncated summary. STRONGLY RECOMMENDED for large docs (>10 pages)"),
output_dir: str = Field(default="", description="Output directory for image files (if image_mode='files')"),
output_dir: str = Field(default="", description="Output directory for extracted image files. If empty, uses a temp directory based on document name."),
# Pagination parameters
limit: int = Field(default=50, description="Maximum number of document sections to return per page"),
cursor_id: Optional[str] = Field(default=None, description="Cursor ID for pagination continuation"),
@@ -225,17 +225,17 @@ class WordMixin(MCPMixin):
# Helper methods - import from monolithic server
async def _analyze_document_size(self, file_path: str, extension: str) -> dict[str, Any]:
"""Analyze document size for processing recommendations."""
from ..server_monolithic import _analyze_document_size
from ..utils import _analyze_document_size
return await _analyze_document_size(file_path, extension)
def _get_processing_recommendation(self, doc_analysis: dict[str, Any], page_range: str, summary_only: bool) -> dict[str, Any]:
"""Get processing recommendations based on document analysis."""
from ..server_monolithic import _get_processing_recommendation
from ..utils import _get_processing_recommendation
return _get_processing_recommendation(doc_analysis, page_range, summary_only)
def _parse_page_range(self, page_range: str) -> list[int]:
"""Parse page range string into list of page numbers."""
from ..server_monolithic import _parse_page_range
from ..utils import _parse_page_range
return _parse_page_range(page_range)
async def _convert_docx_to_markdown(
@@ -244,7 +244,7 @@ class WordMixin(MCPMixin):
bookmark_name: str = "", chapter_name: str = ""
) -> dict[str, Any]:
"""Convert .docx to markdown."""
from ..server_monolithic import _convert_docx_to_markdown
from ..utils import _convert_docx_to_markdown
return await _convert_docx_to_markdown(
file_path, include_images, image_mode, max_image_size,
preserve_structure, page_numbers, summary_only, output_dir, bookmark_name, chapter_name
@@ -255,7 +255,7 @@ class WordMixin(MCPMixin):
preserve_structure: bool, page_numbers: list[int], summary_only: bool, output_dir: str
) -> dict[str, Any]:
"""Convert legacy .doc to markdown."""
from ..server_monolithic import _convert_doc_to_markdown
from ..utils import _convert_doc_to_markdown
return await _convert_doc_to_markdown(
file_path, include_images, image_mode, max_image_size,
preserve_structure, page_numbers, summary_only, output_dir
@@ -635,3 +635,802 @@ class WordMixin(MCPMixin):
stack.append(node)
return tree
# ==================== New Document Navigation Tools ====================
@mcp_tool(
name="get_document_outline",
description="Get a clean, structured outline of a Word document showing all headings, sections, and chapters with their locations. Perfect for understanding document structure before reading."
)
@handle_office_errors("Document outline")
async def get_document_outline(
self,
file_path: str = Field(description="Path to Word document or URL"),
include_word_counts: bool = Field(default=True, description="Include estimated word count per section"),
detect_chapters: bool = Field(default=True, description="Detect and flag chapter headings specifically")
) -> dict[str, Any]:
"""Extract structured document outline with chapter detection."""
from docx import Document
from docx.oxml.ns import qn
start_time = time.time()
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
doc = Document(local_path)
outline = []
current_section = None
section_word_count = 0
total_words = 0
chapter_pattern = ["chapter", "section", "part", "introduction", "conclusion", "appendix", "preface", "epilogue"]
for para_idx, para in enumerate(doc.paragraphs):
text = para.text.strip()
word_count = len(text.split()) if text else 0
total_words += word_count
# Check if this is a heading
style_name = para.style.name.lower() if para.style else ""
is_heading = "heading" in style_name or "title" in style_name
# Determine heading level
level = 0
if is_heading:
if "title" in style_name:
level = 0
elif "heading 1" in style_name or style_name == "heading1":
level = 1
elif "heading 2" in style_name or style_name == "heading2":
level = 2
elif "heading 3" in style_name or style_name == "heading3":
level = 3
elif "heading" in style_name:
# Try to extract number from style name
import re
match = re.search(r'heading\s*(\d+)', style_name)
level = int(match.group(1)) if match else 4
if is_heading and text:
# Save previous section's word count
if current_section is not None and include_word_counts:
current_section["word_count"] = section_word_count
# Detect if this is a chapter
is_chapter = False
chapter_number = None
if detect_chapters:
text_lower = text.lower()
for pattern in chapter_pattern:
if pattern in text_lower:
is_chapter = True
# Try to extract chapter number
import re
match = re.search(r'(?:chapter|section|part)\s*(\d+)', text_lower)
if match:
chapter_number = int(match.group(1))
break
current_section = {
"text": text[:150] + ("..." if len(text) > 150 else ""),
"level": level,
"style": para.style.name if para.style else "Unknown",
"paragraph_index": para_idx,
"is_chapter": is_chapter
}
if chapter_number is not None:
current_section["chapter_number"] = chapter_number
outline.append(current_section)
section_word_count = 0
else:
section_word_count += word_count
# Don't forget last section
if current_section is not None and include_word_counts:
current_section["word_count"] = section_word_count
# Build summary statistics
chapters = [item for item in outline if item.get("is_chapter")]
chapter_numbers = [c.get("chapter_number") for c in chapters if c.get("chapter_number")]
# Detect missing chapters
missing_chapters = []
if chapter_numbers:
expected = set(range(1, max(chapter_numbers) + 1))
found = set(chapter_numbers)
missing_chapters = sorted(expected - found)
return {
"outline": outline,
"summary": {
"total_headings": len(outline),
"chapters_found": len(chapters),
"chapter_numbers": chapter_numbers,
"missing_chapters": missing_chapters,
"total_words": total_words,
"total_paragraphs": len(doc.paragraphs)
},
"extraction_time": round(time.time() - start_time, 3)
}
@mcp_tool(
name="check_style_consistency",
description="Analyze a Word document for style inconsistencies, formatting issues, and potential problems like mismatched heading styles or missing chapters."
)
@handle_office_errors("Style consistency check")
async def check_style_consistency(
self,
file_path: str = Field(description="Path to Word document or URL")
) -> dict[str, Any]:
"""Check document for style and formatting consistency issues."""
from docx import Document
start_time = time.time()
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
doc = Document(local_path)
issues = []
warnings = []
# Track heading styles and chapter detection
heading_styles = {}
chapters_by_style = {"heading": [], "other": []}
chapter_numbers_found = []
import re
chapter_pattern = re.compile(r'^chapter\s*(\d+)', re.IGNORECASE)
for para_idx, para in enumerate(doc.paragraphs):
text = para.text.strip()
style_name = para.style.name if para.style else "None"
style_lower = style_name.lower()
# Track style usage
heading_styles[style_name] = heading_styles.get(style_name, 0) + 1
# Check for chapter-like text
chapter_match = chapter_pattern.match(text)
if chapter_match:
chapter_num = int(chapter_match.group(1))
chapter_numbers_found.append(chapter_num)
is_heading_style = "heading" in style_lower
if is_heading_style:
chapters_by_style["heading"].append({
"chapter": chapter_num,
"text": text[:80],
"style": style_name,
"paragraph": para_idx
})
else:
chapters_by_style["other"].append({
"chapter": chapter_num,
"text": text[:80],
"style": style_name,
"paragraph": para_idx
})
issues.append({
"type": "inconsistent_chapter_style",
"severity": "warning",
"message": f"Chapter {chapter_num} uses '{style_name}' instead of a Heading style",
"paragraph": para_idx,
"text": text[:80]
})
# Check for potential headings that aren't styled as headings
if text and len(text) < 100 and not text.endswith('.'):
is_heading_style = "heading" in style_lower or "title" in style_lower
looks_like_heading = any(word in text.lower() for word in
["chapter", "section", "part", "introduction", "conclusion", "appendix"])
if looks_like_heading and not is_heading_style:
warnings.append({
"type": "potential_heading_not_styled",
"message": f"Text looks like a heading but uses '{style_name}' style",
"paragraph": para_idx,
"text": text[:80]
})
# Check for missing chapters in sequence
missing_chapters = []
if chapter_numbers_found:
chapter_numbers_found.sort()
expected = set(range(1, max(chapter_numbers_found) + 1))
found = set(chapter_numbers_found)
missing_chapters = sorted(expected - found)
for missing in missing_chapters:
issues.append({
"type": "missing_chapter",
"severity": "error",
"message": f"Chapter {missing} appears to be missing from sequence",
"expected_between": f"Chapter {missing-1} and Chapter {missing+1}" if missing > 1 else f"Before Chapter {missing+1}"
})
# Check for duplicate chapter numbers
from collections import Counter
chapter_counts = Counter(chapter_numbers_found)
duplicates = {num: count for num, count in chapter_counts.items() if count > 1}
for chapter_num, count in duplicates.items():
issues.append({
"type": "duplicate_chapter",
"severity": "warning",
"message": f"Chapter {chapter_num} appears {count} times"
})
# Summary of heading style usage
heading_summary = {k: v for k, v in heading_styles.items()
if "heading" in k.lower() or "title" in k.lower()}
return {
"issues": issues,
"warnings": warnings,
"chapter_analysis": {
"total_chapters": len(chapter_numbers_found),
"chapters_with_heading_style": len(chapters_by_style["heading"]),
"chapters_without_heading_style": len(chapters_by_style["other"]),
"missing_chapters": missing_chapters,
"duplicate_chapters": list(duplicates.keys()),
"chapter_details": chapters_by_style
},
"style_usage": heading_summary,
"health_score": self._calculate_doc_health_score(issues, warnings),
"analysis_time": round(time.time() - start_time, 3)
}
def _calculate_doc_health_score(self, issues: list, warnings: list) -> dict:
"""Calculate document health score based on issues found."""
score = 100
for issue in issues:
if issue.get("severity") == "error":
score -= 10
elif issue.get("severity") == "warning":
score -= 5
for _ in warnings:
score -= 2
score = max(0, min(100, score))
if score >= 90:
rating = "excellent"
elif score >= 70:
rating = "good"
elif score >= 50:
rating = "fair"
else:
rating = "needs attention"
return {"score": score, "rating": rating}
@mcp_tool(
name="search_document",
description="Search for text within a Word document and return matches with surrounding context and location information."
)
@handle_office_errors("Document search")
async def search_document(
self,
file_path: str = Field(description="Path to Word document or URL"),
query: str = Field(description="Text to search for (case-insensitive)"),
context_chars: int = Field(default=100, description="Number of characters of context before and after match"),
max_results: int = Field(default=20, description="Maximum number of results to return")
) -> dict[str, Any]:
"""Search document for text with context."""
from docx import Document
start_time = time.time()
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
doc = Document(local_path)
query_lower = query.lower()
results = []
current_chapter = None
current_section = None
for para_idx, para in enumerate(doc.paragraphs):
text = para.text
style_name = para.style.name if para.style else ""
style_lower = style_name.lower()
# Track current chapter/section for context
if "heading" in style_lower or "title" in style_lower:
if "1" in style_name or "title" in style_lower:
current_chapter = text.strip()[:80]
current_section = None
else:
current_section = text.strip()[:80]
# Search for matches
text_lower = text.lower()
search_start = 0
while True:
pos = text_lower.find(query_lower, search_start)
if pos == -1:
break
if len(results) >= max_results:
break
# Extract context
context_start = max(0, pos - context_chars)
context_end = min(len(text), pos + len(query) + context_chars)
context = text[context_start:context_end]
if context_start > 0:
context = "..." + context
if context_end < len(text):
context = context + "..."
results.append({
"paragraph_index": para_idx,
"position": pos,
"context": context,
"chapter": current_chapter,
"section": current_section,
"style": style_name
})
search_start = pos + 1
if len(results) >= max_results:
break
return {
"query": query,
"total_matches": len(results),
"results": results,
"search_time": round(time.time() - start_time, 3),
"truncated": len(results) >= max_results
}
@mcp_tool(
name="extract_entities",
description="Extract named entities (people, places, organizations) from a Word document using pattern-based recognition. Great for identifying key characters, locations, and institutions mentioned in the text."
)
@handle_office_errors("Entity extraction")
async def extract_entities(
self,
file_path: str = Field(description="Path to Word document or URL"),
entity_types: str = Field(default="all", description="Entity types to extract: 'all', 'people', 'places', 'organizations', or comma-separated combination"),
min_occurrences: int = Field(default=1, description="Minimum occurrences for an entity to be included"),
include_context: bool = Field(default=True, description="Include sample context for each entity")
) -> dict[str, Any]:
"""Extract named entities from document using pattern-based recognition."""
from docx import Document
from collections import defaultdict
import re
start_time = time.time()
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
doc = Document(local_path)
# Parse entity types to extract
if entity_types == "all":
extract_types = {"people", "places", "organizations"}
else:
extract_types = set(t.strip().lower() for t in entity_types.split(","))
# Entity containers with context tracking
entities = {
"people": defaultdict(lambda: {"count": 0, "contexts": []}),
"places": defaultdict(lambda: {"count": 0, "contexts": []}),
"organizations": defaultdict(lambda: {"count": 0, "contexts": []})
}
# Patterns for entity detection
# Titles indicating people
title_pattern = re.compile(
r'\b(Dr\.?|Mr\.?|Mrs\.?|Ms\.?|Miss|Professor|Prof\.?|Sister|Father|Rev\.?|'
r'President|Director|Nurse|RN|LPN|MD)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)',
re.IGNORECASE
)
# Organization patterns
org_suffixes = re.compile(
r'\b([A-Z][a-zA-Z\s\'\-]+(?:Hospital|Medical Center|Center|Clinic|University|'
r'College|School|Association|Institute|Foundation|Department|Administration|'
r'Committee|Board|Agency|Service|Company|Inc|Corp|LLC|VA|ANA))\b'
)
# Place patterns (cities, states, geographic locations)
place_patterns = re.compile(
r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*),\s*((?:[A-Z]{2}|[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*))\b|'
r'\b((?:North|South|East|West)\s+[A-Z][a-z]+)\b|'
r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:City|County|State|Valley|Mountain|River|Lake|Island)\b'
)
# Known US states for validation
us_states = {
'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho',
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada',
'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
'West Virginia', 'Wisconsin', 'Wyoming', 'DC', 'ID', 'WA', 'NY',
'CA', 'ND', 'MN', 'IA', 'MT', 'OR', 'NV', 'AZ', 'NM', 'CO', 'WY'
}
# Common first names for better people detection
common_titles = {'dr', 'mr', 'mrs', 'ms', 'miss', 'professor', 'prof',
'sister', 'father', 'rev', 'president', 'director', 'nurse'}
current_chapter = "Document Start"
for para_idx, para in enumerate(doc.paragraphs):
text = para.text
style_name = para.style.name if para.style else ""
# Track chapters for context
if "heading" in style_name.lower() and "1" in style_name:
current_chapter = text.strip()[:60]
# Skip very short paragraphs
if len(text) < 10:
continue
# Extract people
if "people" in extract_types:
for match in title_pattern.finditer(text):
title = match.group(1)
name = match.group(2).strip()
full_name = f"{title} {name}".strip()
# Clean up the name
if len(name) >= 2:
entities["people"][full_name]["count"] += 1
if include_context and len(entities["people"][full_name]["contexts"]) < 3:
# Get surrounding context
start = max(0, match.start() - 30)
end = min(len(text), match.end() + 50)
context = text[start:end].strip()
entities["people"][full_name]["contexts"].append({
"text": f"...{context}...",
"chapter": current_chapter,
"paragraph": para_idx
})
# Also look for standalone capitalized names after verbs
name_after_verb = re.finditer(
r'\b(?:said|told|asked|replied|answered|explained|noted|added|mentioned)\s+'
r'([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)\b',
text
)
for match in name_after_verb:
name = match.group(1).strip()
if len(name) >= 3 and name not in us_states:
entities["people"][name]["count"] += 1
if include_context and len(entities["people"][name]["contexts"]) < 3:
start = max(0, match.start() - 20)
end = min(len(text), match.end() + 40)
context = text[start:end].strip()
entities["people"][name]["contexts"].append({
"text": f"...{context}...",
"chapter": current_chapter,
"paragraph": para_idx
})
# Extract organizations
if "organizations" in extract_types:
for match in org_suffixes.finditer(text):
org_name = match.group(1).strip()
if len(org_name) >= 5:
entities["organizations"][org_name]["count"] += 1
if include_context and len(entities["organizations"][org_name]["contexts"]) < 3:
start = max(0, match.start() - 20)
end = min(len(text), match.end() + 40)
context = text[start:end].strip()
entities["organizations"][org_name]["contexts"].append({
"text": f"...{context}...",
"chapter": current_chapter,
"paragraph": para_idx
})
# Extract places
if "places" in extract_types:
for match in place_patterns.finditer(text):
# Try different capture groups
place = None
if match.group(1) and match.group(2): # City, State pattern
city = match.group(1).strip()
state = match.group(2).strip()
if state in us_states or len(state) == 2:
place = f"{city}, {state}"
elif match.group(3): # Directional places
place = match.group(3).strip()
elif match.group(4): # Geographic features
place = match.group(4).strip()
if place and len(place) >= 3:
entities["places"][place]["count"] += 1
if include_context and len(entities["places"][place]["contexts"]) < 3:
start = max(0, match.start() - 20)
end = min(len(text), match.end() + 40)
context = text[start:end].strip()
entities["places"][place]["contexts"].append({
"text": f"...{context}...",
"chapter": current_chapter,
"paragraph": para_idx
})
# Filter by minimum occurrences and prepare output
def filter_and_sort(entity_dict, min_count):
filtered = []
for name, data in entity_dict.items():
if data["count"] >= min_count:
entry = {
"name": name,
"occurrences": data["count"]
}
if include_context and data["contexts"]:
entry["sample_contexts"] = data["contexts"]
filtered.append(entry)
return sorted(filtered, key=lambda x: x["occurrences"], reverse=True)
result = {
"entities": {},
"summary": {
"total_entities": 0,
"by_type": {}
},
"extraction_time": round(time.time() - start_time, 3)
}
for entity_type in extract_types:
if entity_type in entities:
filtered = filter_and_sort(entities[entity_type], min_occurrences)
result["entities"][entity_type] = filtered
result["summary"]["by_type"][entity_type] = len(filtered)
result["summary"]["total_entities"] += len(filtered)
return result
@mcp_tool(
name="get_chapter_summaries",
description="Get brief summaries/previews of each chapter in a Word document. Extracts the opening sentences of each chapter to give a quick overview of content."
)
@handle_office_errors("Chapter summaries")
async def get_chapter_summaries(
self,
file_path: str = Field(description="Path to Word document or URL"),
sentences_per_chapter: int = Field(default=3, description="Number of opening sentences to include per chapter"),
include_word_counts: bool = Field(default=True, description="Include word count for each chapter")
) -> dict[str, Any]:
"""Extract chapter summaries/previews from document."""
from docx import Document
import re
start_time = time.time()
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
doc = Document(local_path)
chapters = []
current_chapter = None
chapter_text = []
chapter_word_count = 0
chapter_pattern = re.compile(r'^chapter\s*(\d+)', re.IGNORECASE)
def extract_preview(text_paragraphs, num_sentences):
"""Extract first N sentences from collected paragraphs."""
full_text = " ".join(text_paragraphs)
# Simple sentence splitting
sentences = re.split(r'(?<=[.!?])\s+', full_text)
preview_sentences = sentences[:num_sentences]
return " ".join(preview_sentences).strip()
def save_current_chapter():
"""Save the current chapter's data."""
nonlocal current_chapter, chapter_text, chapter_word_count
if current_chapter:
preview = extract_preview(chapter_text, sentences_per_chapter)
chapter_data = {
"chapter_number": current_chapter["number"],
"title": current_chapter["title"],
"paragraph_index": current_chapter["paragraph_index"],
"preview": preview if preview else "(No text content found)",
}
if include_word_counts:
chapter_data["word_count"] = chapter_word_count
chapters.append(chapter_data)
for para_idx, para in enumerate(doc.paragraphs):
text = para.text.strip()
style_name = para.style.name if para.style else ""
# Check if this is a chapter heading
chapter_match = chapter_pattern.match(text)
if chapter_match:
# Save previous chapter first
save_current_chapter()
# Start new chapter
current_chapter = {
"number": int(chapter_match.group(1)),
"title": text[:100],
"paragraph_index": para_idx
}
chapter_text = []
chapter_word_count = 0
elif current_chapter:
# Accumulate text for current chapter
if text:
word_count = len(text.split())
chapter_word_count += word_count
# Only collect first portion of text for preview
if len(" ".join(chapter_text)) < 1000:
chapter_text.append(text)
# Don't forget the last chapter
save_current_chapter()
# Calculate statistics
total_words = sum(c.get("word_count", 0) for c in chapters)
avg_words = total_words // len(chapters) if chapters else 0
return {
"chapters": chapters,
"summary": {
"total_chapters": len(chapters),
"total_words": total_words,
"average_words_per_chapter": avg_words,
"shortest_chapter": min((c for c in chapters), key=lambda x: x.get("word_count", 0), default=None),
"longest_chapter": max((c for c in chapters), key=lambda x: x.get("word_count", 0), default=None)
},
"extraction_time": round(time.time() - start_time, 3)
}
@mcp_tool(
name="save_reading_progress",
description="Save your reading progress in a Word document. Creates a bookmark file to track which chapter/paragraph you're on, so you can resume reading later."
)
@handle_office_errors("Save reading progress")
async def save_reading_progress(
self,
file_path: str = Field(description="Path to Word document"),
chapter_number: int = Field(default=1, description="Current chapter number"),
paragraph_index: int = Field(default=0, description="Current paragraph index"),
notes: str = Field(default="", description="Optional notes about where you left off")
) -> dict[str, Any]:
"""Save reading progress to a bookmark file."""
import json
from datetime import datetime
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
# Create bookmark file path (same location as document)
doc_dir = os.path.dirname(local_path)
doc_name = os.path.splitext(os.path.basename(local_path))[0]
bookmark_path = os.path.join(doc_dir, f".{doc_name}.reading_progress.json")
# Load existing bookmarks or create new
bookmarks = {"history": []}
if os.path.exists(bookmark_path):
try:
with open(bookmark_path, 'r') as f:
bookmarks = json.load(f)
except (json.JSONDecodeError, IOError):
bookmarks = {"history": []}
# Create new bookmark entry
bookmark = {
"timestamp": datetime.now().isoformat(),
"chapter": chapter_number,
"paragraph_index": paragraph_index,
"notes": notes
}
# Update current position and add to history
bookmarks["current"] = bookmark
bookmarks["document"] = os.path.basename(local_path)
bookmarks["history"].append(bookmark)
# Keep only last 50 history entries
if len(bookmarks["history"]) > 50:
bookmarks["history"] = bookmarks["history"][-50:]
# Save bookmark file
with open(bookmark_path, 'w') as f:
json.dump(bookmarks, f, indent=2)
return {
"saved": True,
"bookmark_file": bookmark_path,
"position": {
"chapter": chapter_number,
"paragraph_index": paragraph_index
},
"notes": notes,
"timestamp": bookmark["timestamp"],
"history_entries": len(bookmarks["history"])
}
@mcp_tool(
name="get_reading_progress",
description="Retrieve your saved reading progress for a Word document. Shows where you left off and your reading history."
)
@handle_office_errors("Get reading progress")
async def get_reading_progress(
self,
file_path: str = Field(description="Path to Word document")
) -> dict[str, Any]:
"""Retrieve saved reading progress from bookmark file."""
import json
local_path = await resolve_office_file_path(file_path)
validation = await validate_office_file(local_path)
if not validation["is_valid"]:
raise OfficeFileError(f"Invalid file: {', '.join(validation['errors'])}")
# Find bookmark file
doc_dir = os.path.dirname(local_path)
doc_name = os.path.splitext(os.path.basename(local_path))[0]
bookmark_path = os.path.join(doc_dir, f".{doc_name}.reading_progress.json")
if not os.path.exists(bookmark_path):
return {
"has_progress": False,
"message": "No reading progress saved for this document. Use save_reading_progress to save your position."
}
# Load bookmarks
try:
with open(bookmark_path, 'r') as f:
bookmarks = json.load(f)
except (json.JSONDecodeError, IOError) as e:
return {
"has_progress": False,
"error": f"Could not read bookmark file: {str(e)}"
}
current = bookmarks.get("current", {})
history = bookmarks.get("history", [])
return {
"has_progress": True,
"document": bookmarks.get("document", os.path.basename(local_path)),
"current_position": {
"chapter": current.get("chapter"),
"paragraph_index": current.get("paragraph_index"),
"notes": current.get("notes", ""),
"last_read": current.get("timestamp")
},
"reading_sessions": len(history),
"recent_history": history[-5:] if history else [],
"bookmark_file": bookmark_path
}

View File

@@ -14,6 +14,7 @@ import os
import tempfile
from fastmcp import FastMCP
from fastmcp.prompts import Prompt
from .mixins import UniversalMixin, WordMixin, ExcelMixin, PowerPointMixin
@@ -39,14 +40,252 @@ powerpoint_mixin.register_all(app, prefix="")
# Note: All helper functions are still available from server_legacy.py for import by mixins
# This allows gradual migration while maintaining backward compatibility
# ==================== MCP Prompts ====================
# Prompts help users understand how to use tools effectively
# Organized from basic to advanced multi-step workflows
@app.prompt(
name="explore-document",
description="Basic: Start exploring a new document - get structure, identify key content"
)
def prompt_explore_document(file_path: str = "") -> list:
"""Guide for exploring a new Word document."""
path_hint = f"the document at `{file_path}`" if file_path else "your document"
return [
{
"role": "user",
"content": f"""I want to explore {path_hint}. Please help me understand it by:
1. First, use `get_document_outline` to show me the document structure (chapters, sections, headings)
2. Then use `check_style_consistency` to identify any formatting issues or problems
3. Finally, give me a summary of what the document contains based on the outline
This will help me understand what I'm working with before diving into the content."""
}
]
@app.prompt(
name="find-character",
description="Basic: Find all mentions of a person/character in a document"
)
def prompt_find_character(file_path: str = "", character_name: str = "") -> list:
"""Guide for finding character mentions."""
path_hint = f"in `{file_path}`" if file_path else "in my document"
name_hint = f'"{character_name}"' if character_name else "a character"
return [
{
"role": "user",
"content": f"""Help me find all mentions of {name_hint} {path_hint}.
Use `search_document` to find occurrences with context. I want to see:
- Each mention with surrounding text
- Which chapter each mention appears in
- A count of total appearances
This will help me track the character's journey through the narrative."""
}
]
@app.prompt(
name="chapter-preview",
description="Basic: Get a quick preview of each chapter without reading the full content"
)
def prompt_chapter_preview(file_path: str = "") -> list:
"""Guide for getting chapter previews."""
path_hint = f"from `{file_path}`" if file_path else ""
return [
{
"role": "user",
"content": f"""I want a quick preview of each chapter {path_hint}.
Use `get_chapter_summaries` with 3-4 sentences per chapter to give me a preview of what each chapter covers. Include word counts so I know which chapters are longest.
This gives me a roadmap before I start reading in depth."""
}
]
@app.prompt(
name="resume-reading",
description="Intermediate: Check where you left off and continue reading"
)
def prompt_resume_reading(file_path: str = "") -> list:
"""Guide for resuming reading."""
path_hint = f"in `{file_path}`" if file_path else ""
return [
{
"role": "user",
"content": f"""I want to continue reading where I left off {path_hint}.
1. First, use `get_reading_progress` to see where I was
2. Then use `convert_to_markdown` with `chapter_name` set to that chapter to show me the content
3. When I tell you where to stop, use `save_reading_progress` to bookmark my position
This is my reading workflow for long documents."""
}
]
@app.prompt(
name="document-analysis",
description="Intermediate: Comprehensive analysis - structure, entities, and key information"
)
def prompt_document_analysis(file_path: str = "") -> list:
"""Guide for comprehensive document analysis."""
path_hint = f"the document `{file_path}`" if file_path else "my document"
return [
{
"role": "user",
"content": f"""Perform a comprehensive analysis of {path_hint}:
1. **Structure Analysis** (`get_document_outline`): Map out all chapters, sections, and headings
2. **Quality Check** (`check_style_consistency`): Identify any formatting issues
3. **Entity Extraction** (`extract_entities`): Find all people, places, and organizations mentioned
4. **Chapter Overview** (`get_chapter_summaries`): Generate previews of each chapter
Summarize the findings in a report format. This gives me a complete picture of the document."""
}
]
@app.prompt(
name="character-journey",
description="Advanced: Track a character's complete journey through a document"
)
def prompt_character_journey(file_path: str = "", character_name: str = "") -> list:
"""Guide for tracking a character's journey."""
path_hint = f"in `{file_path}`" if file_path else ""
name_hint = f'"{character_name}"' if character_name else "the main character"
return [
{
"role": "user",
"content": f"""Help me track {name_hint}'s complete journey {path_hint}:
**Step 1 - Get Context**
Use `get_document_outline` to understand the chapter structure
**Step 2 - Find All Mentions**
Use `search_document` to find every mention of the character with context
**Step 3 - Analyze by Chapter**
For each chapter where the character appears, use `convert_to_markdown` with `chapter_name` to extract the relevant sections
**Step 4 - Summarize the Journey**
Create a timeline or narrative summary of the character's arc through the story
This multi-step workflow helps me understand a character's complete narrative arc."""
}
]
@app.prompt(
name="document-comparison",
description="Advanced: Compare entities and themes between chapters or sections"
)
def prompt_document_comparison(file_path: str = "") -> list:
"""Guide for comparing document sections."""
path_hint = f"from `{file_path}`" if file_path else ""
return [
{
"role": "user",
"content": f"""Help me compare different sections of the document {path_hint}:
**Step 1 - Get Structure**
Use `get_document_outline` to identify all chapters/sections
**Step 2 - Extract Entities by Section**
Use `extract_entities` with different chapters to see which characters/places appear where
**Step 3 - Get Chapter Summaries**
Use `get_chapter_summaries` to understand the focus of each section
**Step 4 - Compare and Contrast**
Based on the data, identify:
- Which characters appear in which chapters
- How locations shift through the narrative
- Patterns in entity distribution
Create a comparison matrix or analysis."""
}
]
@app.prompt(
name="full-reading-session",
description="Advanced: Complete guided reading session with bookmarking"
)
def prompt_full_reading_session(file_path: str = "", start_chapter: int = 1) -> list:
"""Guide for a complete reading session."""
path_hint = f"of `{file_path}`" if file_path else ""
return [
{
"role": "user",
"content": f"""Let's do a guided reading session {path_hint}:
**Setup Phase**
1. Use `get_reading_progress` to check if I have a saved position
2. Use `get_document_outline` to show the chapter list
3. Use `check_style_consistency` to flag any document issues
**Reading Phase**
4. Use `convert_to_markdown` with `chapter_name="Chapter {start_chapter}"` to show that chapter
5. When I'm done, I'll say "stop at paragraph X" and you use `save_reading_progress`
**Analysis Phase (Optional)**
6. Use `extract_entities` with `entity_types="people"` to show who appears in what I've read
7. Use `search_document` if I want to find specific references
This creates an interactive, bookmark-enabled reading experience."""
}
]
@app.prompt(
name="manuscript-review",
description="Advanced: Complete manuscript review workflow for editors"
)
def prompt_manuscript_review(file_path: str = "") -> list:
"""Guide for comprehensive manuscript review."""
path_hint = f"manuscript at `{file_path}`" if file_path else "the manuscript"
return [
{
"role": "user",
"content": f"""Help me conduct a complete editorial review of {path_hint}:
**Phase 1: Structure Assessment**
1. `get_document_outline` - Map the complete structure
2. `check_style_consistency` - Identify formatting issues, missing chapters, style problems
3. Report any structural issues found
**Phase 2: Content Analysis**
4. `get_chapter_summaries` - Get overview of each chapter's content
5. `extract_entities` - Extract all characters, locations, organizations
6. Flag any inconsistencies (characters who appear then disappear, etc.)
**Phase 3: Deep Dive**
7. For each chapter with issues, use `convert_to_markdown` to review
8. Use `search_document` to verify specific details if needed
9. Document findings with chapter numbers and paragraph indices
**Phase 4: Final Report**
Compile all findings into an editorial report with:
- Structure issues and recommendations
- Character/entity tracking
- Suggested fixes with specific locations
This is a complete editorial workflow for manuscript review."""
}
]
def main():
"""Entry point for the MCP Office Tools server."""
# CRITICAL: show_banner=False is required for stdio transport!
# FastMCP's banner prints ASCII art to stdout which breaks JSON-RPC protocol
app.run(show_banner=False)
if __name__ == "__main__":
main()

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -27,6 +27,48 @@ from .decorators import (
handle_office_errors
)
from .processing import (
TEMP_DIR,
DEBUG,
_extract_basic_metadata,
_calculate_health_score,
_get_health_recommendations,
_smart_truncate_content,
_parse_page_range,
_get_processing_recommendation,
)
from .word_processing import (
_extract_word_text,
_extract_word_images,
_extract_word_metadata,
_convert_docx_to_markdown,
_convert_docx_with_python_docx,
_convert_doc_to_markdown,
_get_ultra_fast_summary,
_find_bookmark_content_range,
_find_chapter_content_range,
_get_available_headings,
_has_page_break,
_analyze_document_size,
_paragraph_to_markdown,
_table_to_markdown,
_html_to_markdown,
_extract_markdown_structure,
)
from .excel_processing import (
_extract_excel_text,
_extract_excel_images,
_extract_excel_metadata,
)
from .powerpoint_processing import (
_extract_powerpoint_text,
_extract_powerpoint_images,
_extract_powerpoint_metadata,
)
__all__ = [
# Validation
"OfficeFileError",


@@ -0,0 +1,203 @@
"""Excel document processing utilities.
This module provides helper functions for extracting text, images, and metadata
from Excel documents (.xlsx, .xls, .xlsm, .csv) with intelligent method selection
and fallback support.
"""
from typing import Any
from . import OfficeFileError
async def _extract_excel_text(file_path: str, extension: str, preserve_formatting: bool, method: str) -> dict[str, Any]:
"""Extract text from Excel documents."""
methods_tried = []
if extension == ".csv":
# CSV handling
import pandas as pd
try:
df = pd.read_csv(file_path)
text = df.to_string()
return {
"text": text,
"method_used": "pandas",
"methods_tried": ["pandas"],
"formatted_sections": [{"type": "table", "data": df.to_dict()}] if preserve_formatting else []
}
except Exception as e:
raise OfficeFileError(f"CSV processing failed: {str(e)}")
# Excel file handling
text = ""
formatted_sections = []
method_used = None
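# Preferred extraction order: openpyxl first for OOXML workbooks (.xlsx/.xlsm),
# xlrd first for legacy .xls, with pandas as a general-purpose fallback in both cases.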
method_order = ["openpyxl", "pandas", "xlrd"] if extension == ".xlsx" else ["xlrd", "pandas", "openpyxl"]
for method_name in method_order:
try:
methods_tried.append(method_name)
if method_name == "openpyxl" and extension in [".xlsx", ".xlsm"]:
import openpyxl
wb = openpyxl.load_workbook(file_path, data_only=True)
text_parts = []
for sheet_name in wb.sheetnames:
ws = wb[sheet_name]
text_parts.append(f"Sheet: {sheet_name}")
for row in ws.iter_rows(values_only=True):
row_text = "\t".join(str(cell) if cell is not None else "" for cell in row)
if row_text.strip():
text_parts.append(row_text)
if preserve_formatting:
formatted_sections.append({
"type": "worksheet",
"name": sheet_name,
"data": [[str(cell.value) if cell.value is not None else "" for cell in row] for row in ws.iter_rows()]
})
text = "\n".join(text_parts)
method_used = "openpyxl"
break
elif method_name == "pandas":
import pandas as pd
if extension in [".xlsx", ".xlsm"]:
dfs = pd.read_excel(file_path, sheet_name=None)
else: # .xls
dfs = pd.read_excel(file_path, sheet_name=None, engine='xlrd')
text_parts = []
for sheet_name, df in dfs.items():
text_parts.append(f"Sheet: {sheet_name}")
text_parts.append(df.to_string())
if preserve_formatting:
formatted_sections.append({
"type": "dataframe",
"name": sheet_name,
"data": df.to_dict()
})
text = "\n\n".join(text_parts)
method_used = "pandas"
break
elif method_name == "xlrd" and extension == ".xls":
import xlrd
wb = xlrd.open_workbook(file_path)
text_parts = []
for sheet in wb.sheets():
text_parts.append(f"Sheet: {sheet.name}")
for row_idx in range(sheet.nrows):
row = sheet.row_values(row_idx)
row_text = "\t".join(str(cell) for cell in row)
text_parts.append(row_text)
text = "\n".join(text_parts)
method_used = "xlrd"
break
except ImportError:
continue
except Exception:
continue
if not method_used:
raise OfficeFileError(f"Failed to extract text using methods: {', '.join(methods_tried)}")
return {
"text": text,
"method_used": method_used,
"methods_tried": methods_tried,
"formatted_sections": formatted_sections
}
async def _extract_excel_images(file_path: str, extension: str, output_format: str, min_width: int, min_height: int) -> list[dict[str, Any]]:
"""Extract images from Excel documents."""
import io
import os
import tempfile
import zipfile
from PIL import Image
images = []
TEMP_DIR = os.environ.get("OFFICE_TEMP_DIR", tempfile.gettempdir())
if extension in [".xlsx", ".xlsm"]:
try:
with zipfile.ZipFile(file_path, 'r') as zip_file:
# Look for images in media folder
image_files = [f for f in zip_file.namelist() if f.startswith('xl/media/')]
for i, img_path in enumerate(image_files):
try:
img_data = zip_file.read(img_path)
img = Image.open(io.BytesIO(img_data))
# Size filtering
if img.width >= min_width and img.height >= min_height:
# Save to temp file
temp_path = os.path.join(TEMP_DIR, f"excel_image_{i}.{output_format}")
img.save(temp_path, format=output_format.upper())
images.append({
"index": i,
"filename": os.path.basename(img_path),
"path": temp_path,
"width": img.width,
"height": img.height,
"format": img.format,
"size_bytes": len(img_data)
})
except Exception:
continue
except Exception as e:
raise OfficeFileError(f"Excel image extraction failed: {str(e)}")
return images
async def _extract_excel_metadata(file_path: str, extension: str) -> dict[str, Any]:
"""Extract Excel-specific metadata."""
metadata = {"type": "excel", "extension": extension}
if extension in [".xlsx", ".xlsm"]:
try:
import openpyxl
wb = openpyxl.load_workbook(file_path)
props = wb.properties
metadata.update({
"title": props.title,
"creator": props.creator,
"subject": props.subject,
"description": props.description,
"keywords": props.keywords,
"created": str(props.created) if props.created else None,
"modified": str(props.modified) if props.modified else None
})
# Workbook structure
metadata.update({
"worksheet_count": len(wb.worksheets),
"worksheet_names": wb.sheetnames,
"has_charts": any(len(ws._charts) > 0 for ws in wb.worksheets),
"has_images": any(len(ws._images) > 0 for ws in wb.worksheets)
})
except Exception:
pass
return metadata


@@ -0,0 +1,177 @@
"""PowerPoint document processing utilities.
This module provides helper functions for extracting text, images, and metadata
from PowerPoint documents (.pptx and .ppt files).
"""
import io
import os
import zipfile
from typing import Any
from PIL import Image
from . import OfficeFileError
async def _extract_powerpoint_text(
file_path: str, extension: str, preserve_formatting: bool, method: str
) -> dict[str, Any]:
"""Extract text from PowerPoint documents."""
methods_tried = []
if extension == ".pptx":
try:
import pptx
prs = pptx.Presentation(file_path)
text_parts = []
formatted_sections = []
for slide_num, slide in enumerate(prs.slides, 1):
slide_text_parts = []
for shape in slide.shapes:
if hasattr(shape, "text") and shape.text:
slide_text_parts.append(shape.text)
slide_text = "\n".join(slide_text_parts)
text_parts.append(f"Slide {slide_num}:\n{slide_text}")
if preserve_formatting:
formatted_sections.append(
{
"type": "slide",
"number": slide_num,
"text": slide_text,
"shapes": len(slide.shapes),
}
)
text = "\n\n".join(text_parts)
return {
"text": text,
"method_used": "python-pptx",
"methods_tried": ["python-pptx"],
"formatted_sections": formatted_sections,
}
except ImportError:
methods_tried.append("python-pptx")
except Exception:
methods_tried.append("python-pptx")
# Legacy .ppt handling would require additional libraries
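# (One possible approach, not implemented here: convert .ppt to .pptx first,
# e.g. with LibreOffice's headless converter, then reuse the python-pptx path above.)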
if extension == ".ppt":
raise OfficeFileError(
"Legacy PowerPoint (.ppt) text extraction requires additional setup"
)
raise OfficeFileError(
f"Failed to extract text using methods: {', '.join(methods_tried)}"
)
async def _extract_powerpoint_images(
file_path: str,
extension: str,
output_format: str,
min_width: int,
min_height: int,
temp_dir: str,
) -> list[dict[str, Any]]:
"""Extract images from PowerPoint documents."""
images = []
if extension == ".pptx":
try:
with zipfile.ZipFile(file_path, "r") as zip_file:
# Look for images in media folder
image_files = [
f for f in zip_file.namelist() if f.startswith("ppt/media/")
]
for i, img_path in enumerate(image_files):
try:
img_data = zip_file.read(img_path)
img = Image.open(io.BytesIO(img_data))
# Size filtering
if img.width >= min_width and img.height >= min_height:
# Save to temp file
temp_path = os.path.join(
temp_dir, f"powerpoint_image_{i}.{output_format}"
)
img.save(temp_path, format=output_format.upper())
images.append(
{
"index": i,
"filename": os.path.basename(img_path),
"path": temp_path,
"width": img.width,
"height": img.height,
"format": img.format,
"size_bytes": len(img_data),
}
)
except Exception:
continue
except Exception as e:
raise OfficeFileError(f"PowerPoint image extraction failed: {str(e)}")
return images
async def _extract_powerpoint_metadata(
file_path: str, extension: str
) -> dict[str, Any]:
"""Extract PowerPoint-specific metadata."""
metadata = {"type": "powerpoint", "extension": extension}
if extension == ".pptx":
try:
import pptx
prs = pptx.Presentation(file_path)
core_props = prs.core_properties
metadata.update(
{
"title": core_props.title,
"author": core_props.author,
"subject": core_props.subject,
"keywords": core_props.keywords,
"comments": core_props.comments,
"created": str(core_props.created) if core_props.created else None,
"modified": str(core_props.modified)
if core_props.modified
else None,
}
)
# Presentation structure
slide_layouts = set()
total_shapes = 0
for slide in prs.slides:
slide_layouts.add(slide.slide_layout.name)
total_shapes += len(slide.shapes)
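# Note: prs.slide_width / prs.slide_height are python-pptx Length values in EMU
# (914400 EMU = 1 inch), so they appear here as plain integers.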
metadata.update(
{
"slide_count": len(prs.slides),
"slide_layouts": list(slide_layouts),
"total_shapes": total_shapes,
"slide_width": prs.slide_width,
"slide_height": prs.slide_height,
}
)
except Exception:
pass
return metadata


@@ -0,0 +1,228 @@
"""Universal processing helper functions for Office documents.
This module contains helper functions used across different document processing
operations including metadata extraction, health scoring, content truncation,
and page range parsing.
"""
import os
import tempfile
from typing import Any
# Configuration
TEMP_DIR = os.environ.get("OFFICE_TEMP_DIR", tempfile.gettempdir())
DEBUG = os.environ.get("DEBUG", "false").lower() == "true"
async def _extract_basic_metadata(file_path: str, extension: str, category: str) -> dict[str, Any]:
"""Extract basic metadata from Office documents."""
metadata = {"category": category, "extension": extension}
try:
if extension in [".docx", ".xlsx", ".pptx"] and category in ["word", "excel", "powerpoint"]:
import zipfile
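# OOXML documents (.docx/.xlsx/.pptx) are ZIP containers; core and app properties
# live in the docProps/core.xml and docProps/app.xml parts when present.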
with zipfile.ZipFile(file_path, 'r') as zip_file:
# Core properties
if 'docProps/core.xml' in zip_file.namelist():
zip_file.read('docProps/core.xml').decode('utf-8')
metadata["has_core_properties"] = True
# App properties
if 'docProps/app.xml' in zip_file.namelist():
zip_file.read('docProps/app.xml').decode('utf-8')
metadata["has_app_properties"] = True
except Exception:
pass
return metadata
def _calculate_health_score(validation: dict[str, Any], format_info: dict[str, Any]) -> int:
"""Calculate document health score (1-10)."""
score = 10
# Deduct for validation errors
if not validation["is_valid"]:
score -= 5
if validation["errors"]:
score -= len(validation["errors"]) * 2
if validation["warnings"]:
score -= len(validation["warnings"])
# Deduct for problematic characteristics
if validation.get("password_protected"):
score -= 1
if format_info.get("is_legacy"):
score -= 1
structure = format_info.get("structure", {})
if structure.get("estimated_complexity") == "complex":
score -= 1
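# Example: a valid legacy .doc with one warning scores 10 - 1 (warning) - 1 (legacy) = 8.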
return max(1, min(10, score))
def _get_health_recommendations(validation: dict[str, Any], format_info: dict[str, Any]) -> list[str]:
"""Get health improvement recommendations."""
recommendations = []
if validation["errors"]:
recommendations.append("Fix validation errors before processing")
if validation.get("password_protected"):
recommendations.append("Remove password protection if possible")
if format_info.get("is_legacy"):
recommendations.append("Consider converting to modern format (.docx, .xlsx, .pptx)")
structure = format_info.get("structure", {})
if structure.get("estimated_complexity") == "complex":
recommendations.append("Complex document may require specialized processing")
if not recommendations:
recommendations.append("Document appears healthy and ready for processing")
return recommendations
def _smart_truncate_content(content: str, max_chars: int) -> str:
"""Intelligently truncate content while preserving structure and readability."""
if len(content) <= max_chars:
return content
lines = content.split('\n')
truncated_lines = []
current_length = 0
# Try to preserve structure by stopping at a natural break point
for line in lines:
line_length = len(line) + 1 # +1 for newline
# If adding this line would exceed limit
if current_length + line_length > max_chars:
# Try to find a good stopping point
if truncated_lines:
# If we stopped mid-paragraph, remove the incomplete trailing paragraph
if not (line.strip() == '' or line.startswith('#') or line.startswith('|')):
# Remove lines until we hit a natural break
while truncated_lines and not (
truncated_lines[-1].strip() == '' or
truncated_lines[-1].startswith('#') or
truncated_lines[-1].startswith('|') or
truncated_lines[-1].startswith('-') or
truncated_lines[-1].startswith('*')
):
truncated_lines.pop()
break
truncated_lines.append(line)
current_length += line_length
# Add truncation notice
result = '\n'.join(truncated_lines)
result += f"\n\n---\n**[CONTENT TRUNCATED]**\nShowing {len(result):,} of {len(content):,} characters.\nUse smaller page ranges (e.g., 3-5 pages) for full content without truncation.\n---"
return result
def _parse_page_range(page_range: str) -> list[int]:
"""Parse page range string into list of page numbers.
Examples:
"1-5" -> [1, 2, 3, 4, 5]
"1,3,5" -> [1, 3, 5]
"1-3,5,7-9" -> [1, 2, 3, 5, 7, 8, 9]
"""
pages = set()
for part in page_range.split(','):
part = part.strip()
if '-' in part:
# Handle range like "1-5"
start, end = part.split('-', 1)
try:
start_num = int(start.strip())
end_num = int(end.strip())
pages.update(range(start_num, end_num + 1))
except ValueError:
continue
else:
# Handle single page like "3"
try:
pages.add(int(part))
except ValueError:
continue
return sorted(list(pages))
def _get_processing_recommendation(
doc_analysis: dict[str, Any],
page_range: str,
summary_only: bool
) -> dict[str, Any]:
"""Generate intelligent processing recommendations based on document analysis."""
estimated_pages = doc_analysis["estimated_pages"]
content_size = doc_analysis["estimated_content_size"]
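# content_size is expected to be one of the buckets checked below:
# "small", "medium", "large", or "very_large".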
recommendation = {
"status": "optimal",
"message": "",
"suggested_workflow": [],
"warnings": []
}
# Large document recommendations
if content_size in ["large", "very_large"] and not page_range and not summary_only:
recommendation["status"] = "suboptimal"
recommendation["message"] = (
f"⚠️ Large document detected ({estimated_pages} estimated pages). "
"Consider using recommended workflow for better performance."
)
recommendation["suggested_workflow"] = [
"1. First: Call with summary_only=true to get document overview and TOC",
"2. Then: Use page_range to process specific sections (e.g., '1-5', '6-10', '15-20')",
"3. Recommended: Use 3-8 page chunks to stay under 25k token MCP limit",
"4. The tool auto-truncates if content is too large, but smaller ranges work better"
]
recommendation["warnings"] = [
"Page ranges >8 pages may hit 25k token response limit and get truncated",
"Use smaller page ranges (3-5 pages) for dense content documents",
"Auto-truncation preserves structure but loses content completeness"
]
# Medium document recommendations
elif content_size == "medium" and not page_range and not summary_only:
recommendation["status"] = "caution"
recommendation["message"] = (
f"Medium document detected ({estimated_pages} estimated pages). "
"Consider summary_only=true first if you encounter response size issues."
)
recommendation["suggested_workflow"] = [
"Option 1: Try full processing (current approach)",
"Option 2: Use summary_only=true first, then page_range if needed"
]
# Optimal usage patterns
elif summary_only:
recommendation["message"] = "✅ Excellent! Using summary mode for initial document analysis."
recommendation["suggested_workflow"] = [
"After reviewing summary, use page_range to extract specific sections of interest"
]
elif page_range and content_size in ["large", "very_large"]:
recommendation["message"] = "✅ Perfect! Using page-range processing for efficient extraction."
elif content_size == "small":
recommendation["message"] = "✅ Small document - full processing is optimal."
return recommendation

File diff suppressed because it is too large


@@ -64,7 +64,7 @@ class TestMixinArchitecture:
word = WordMixin()
word.register_all(app)
word_tools = len(app._tool_manager._tools) - initial_tool_count - universal_tools
-assert word_tools == 3 # convert_to_markdown, extract_word_tables, analyze_word_structure
+assert word_tools == 10 # convert_to_markdown, extract_word_tables, analyze_word_structure, get_document_outline, check_style_consistency, search_document, extract_entities, get_chapter_summaries, save_reading_progress, get_reading_progress
excel = ExcelMixin()
excel.register_all(app)


@@ -149,8 +149,8 @@ class TestMixinIntegration:
# Verify no duplicates
assert len(tool_names) == len(set(tool_names)), "Tool names should be unique"
-# Verify expected count: 6 universal + 3 word + 3 excel = 12
-assert len(tool_names) == 12, f"Expected 12 tools, got {len(tool_names)}: {list(tool_names.keys())}"
+# Verify expected count: 6 universal + 10 word + 3 excel = 19
+assert len(tool_names) == 19, f"Expected 19 tools, got {len(tool_names)}: {list(tool_names.keys())}"
if __name__ == "__main__":


@@ -28,14 +28,14 @@ class TestWordMixinRegistration:
mixin.register_all(app)
assert mixin is not None
-assert len(app._tool_manager._tools) == 3 # convert_to_markdown, extract_word_tables, analyze_word_structure
+assert len(app._tool_manager._tools) == 10 # convert_to_markdown, extract_word_tables, analyze_word_structure, get_document_outline, check_style_consistency, search_document, extract_entities, get_chapter_summaries, save_reading_progress, get_reading_progress
def test_tool_names_registered(self):
"""Test that Word-specific tools are registered."""
app = FastMCP("Test Word")
WordMixin().register_all(app)
expected_tools = {"convert_to_markdown", "extract_word_tables", "analyze_word_structure"}
expected_tools = {"convert_to_markdown", "extract_word_tables", "analyze_word_structure", "get_document_outline", "check_style_consistency", "search_document", "extract_entities", "get_chapter_summaries", "save_reading_progress", "get_reading_progress"}
registered_tools = set(app._tool_manager._tools.keys())
assert expected_tools.issubset(registered_tools)
@@ -409,5 +409,85 @@ class TestLegacyWordSupport:
assert "conversion_method" in result["metadata"]
class TestPageRangeFiltering:
"""Test page_range content filtering for convert_to_markdown.
These tests verify that the page_range parameter correctly filters
content based on either explicit page breaks or estimated paragraph counts.
"""
@pytest.fixture
def mixin(self):
"""Create WordMixin for testing."""
app = FastMCP("Test")
mixin = WordMixin()
mixin.register_all(app)
return mixin
@pytest.mark.asyncio
@patch('mcp_office_tools.mixins.word.resolve_office_file_path')
@patch('mcp_office_tools.mixins.word.validate_office_file')
@patch('mcp_office_tools.mixins.word.detect_format')
async def test_page_range_filters_different_content(self, mock_detect, mock_validate, mock_resolve, mixin):
"""Test that different page_range values return different content.
This is the key regression test for the page_range bug where
include_current_page was set but never used to filter content.
"""
mock_resolve.return_value = "/test.docx"
mock_validate.return_value = {"is_valid": True, "errors": []}
mock_detect.return_value = {"category": "word", "extension": ".docx", "format_name": "Word Document"}
with patch.object(mixin, '_analyze_document_size') as mock_analyze:
with patch.object(mixin, '_get_processing_recommendation') as mock_recommend:
mock_analyze.return_value = {"estimated_pages": 10}
mock_recommend.return_value = {"status": "optimal", "message": "", "suggested_workflow": [], "warnings": []}
# Create mock conversions that return different content per page
call_count = [0]
def mock_convert_side_effect(*args, **kwargs):
call_count[0] += 1
page_numbers = args[5] if len(args) > 5 else kwargs.get('page_numbers')
if page_numbers == [1, 2]:
return {
"content": "# Page 1-2 Content\n\nThis is from pages 1 and 2.",
"method_used": "python-docx-custom",
"images": [],
"structure": {"headings": [], "tables": 0, "lists": 0, "paragraphs": 5}
}
elif page_numbers == [10, 11]:
return {
"content": "# Page 10-11 Content\n\nThis is from pages 10 and 11.",
"method_used": "python-docx-custom",
"images": [],
"structure": {"headings": [], "tables": 0, "lists": 0, "paragraphs": 5}
}
else:
return {
"content": "# Full Content",
"method_used": "python-docx-custom",
"images": [],
"structure": {"headings": [], "tables": 0, "lists": 0, "paragraphs": 20}
}
with patch.object(mixin, '_convert_docx_to_markdown', side_effect=mock_convert_side_effect):
# Test page_range 1-2
result_1_2 = await mixin.convert_to_markdown(
file_path="/test.docx",
page_range="1-2"
)
# Test page_range 10-11
result_10_11 = await mixin.convert_to_markdown(
file_path="/test.docx",
page_range="10-11"
)
# The content should be different for different page ranges
assert "Page 1-2" in result_1_2["markdown"]
assert "Page 10-11" in result_10_11["markdown"]
assert result_1_2["markdown"] != result_10_11["markdown"]
if __name__ == "__main__":
pytest.main([__file__, "-v"])