Implement comprehensive PDF processing suite with 15 additional advanced tools

Major expansion from 8 to 23 total tools covering:

**Document Analysis & Intelligence:**
- analyze_pdf_health: Comprehensive quality and health analysis
- analyze_pdf_security: Security features and vulnerability assessment
- classify_content: AI-powered document type classification
- summarize_content: Intelligent content summarization with key insights
- compare_pdfs: Advanced document comparison (text, structure, metadata)

**Layout & Visual Analysis:**
- analyze_layout: Page layout analysis with column detection
- extract_charts: Chart, diagram, and visual element extraction
- detect_watermarks: Watermark detection and analysis

**Content Manipulation:**
- extract_form_data: Interactive PDF form data extraction
- split_pdf: Split PDFs at specified pages
- merge_pdfs: Merge multiple PDFs into one
- rotate_pages: Rotate pages by 90°/180°/270°

**Optimization & Utilities:**
- convert_to_images: Convert PDF pages to image files
- optimize_pdf: File size optimization with quality levels
- repair_pdf: Corrupted PDF repair and recovery

**Technical Enhancements:**
- All tools support HTTPS URLs with intelligent caching
- Fixed MCP parameter validation for pages parameter
- Comprehensive error handling and validation
- Updated documentation with usage examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
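The commit mentions a fix to MCP validation of the `pages` parameter, and the README examples pass page selections as strings (`pages="1,2,3"`, `split_pages="5,10,15"`) and rotations as a dict (`page_rotations={"1": 90, "3": 180}`). The server.py changes are not shown in this excerpt, so the following is only a minimal sketch of how such arguments might be validated, assuming 1-based page numbers and plain comma-separated lists (no ranges like `1-3`). The helpers `parse_pages`, `split_ranges`, and `validate_rotations` are hypothetical and are not taken from the server.

```python
from __future__ import annotations

# Hypothetical validation helpers; NOT the actual mcp_pdf_tools server code.
# Assumes 1-based page numbers and comma-separated lists without ranges.
ALLOWED_ROTATIONS = {90, 180, 270}


def parse_pages(pages: str | None, total_pages: int) -> list[int]:
    """Parse a page list such as "1,2,3" into sorted, de-duplicated pages.

    None or an empty string selects every page. Empty entries, non-numeric
    entries, and out-of-range pages raise ValueError.
    """
    if pages is None or not pages.strip():
        return list(range(1, total_pages + 1))
    selected: set[int] = set()
    for token in pages.split(","):
        token = token.strip()
        if not token.isdigit():
            raise ValueError(f"invalid page number: {token!r}")
        page = int(token)
        if not 1 <= page <= total_pages:
            raise ValueError(f"page {page} is outside 1..{total_pages}")
        selected.add(page)
    return sorted(selected)


def split_ranges(split_after: list[int], total_pages: int) -> list[tuple[int, int]]:
    """Turn "split after page N" points into inclusive (start, end) ranges."""
    ranges: list[tuple[int, int]] = []
    start = 1
    for point in sorted(set(split_after)):
        if not 1 <= point <= total_pages:
            raise ValueError(f"split point {point} is outside 1..{total_pages}")
        ranges.append((start, point))
        start = point + 1
    # Pages after the last split point (if any) form the final part.
    if start <= total_pages:
        ranges.append((start, total_pages))
    return ranges


def validate_rotations(page_rotations: dict[str, int], total_pages: int) -> dict[int, int]:
    """Check a {"page": degrees} mapping like {"1": 90, "3": 180}."""
    validated: dict[int, int] = {}
    for page_key, angle in page_rotations.items():
        page = int(page_key)  # raises ValueError for non-numeric keys
        if not 1 <= page <= total_pages:
            raise ValueError(f"page {page} is outside 1..{total_pages}")
        if angle not in ALLOWED_ROTATIONS:
            raise ValueError(f"rotation must be 90, 180, or 270, got {angle}")
        validated[page] = angle
    return validated


if __name__ == "__main__":
    print(parse_pages("1,2,3", 20))  # [1, 2, 3]
    print(split_ranges(parse_pages("5,10,15", 20), 20))
    print(validate_rotations({"1": 90, "3": 180}, 20))  # {1: 90, 3: 180}
```

Under these assumptions, splitting a 20-page document after pages 5, 10, and 15 yields the parts 1-5, 6-10, 11-15, and 16-20; a split point equal to the last page produces no empty trailing part.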
2025-08-11 04:27:04 -06:00
commit f0365a0d75
parent 58d43851b9
2 changed files with 2216 additions and 13 deletions
--- a/README.md
+++ b/README.md
@@ -214,18 +214,150 @@ result = await extract_images(
 )
 ```

+### Advanced Analysis
+
+```python
+# Analyze document health and quality
+result = await analyze_pdf_health(
+    pdf_path="/path/to/document.pdf"
+)
+
+# Classify content type and structure
+result = await classify_content(
+    pdf_path="/path/to/document.pdf"
+)
+
+# Generate content summary
+result = await summarize_content(
+    pdf_path="/path/to/document.pdf",
+    summary_length="medium",  # "short", "medium", "long"
+    pages="1,2,3"  # Specific pages
+)
+
+# Analyze page layout
+result = await analyze_layout(
+    pdf_path="/path/to/document.pdf",
+    pages="1,2,3",
+    include_coordinates=True
+)
+```
+
+### Content Manipulation
+
+```python
+# Extract form data
+result = await extract_form_data(
+    pdf_path="/path/to/form.pdf"
+)
+
+# Split PDF into separate files
+result = await split_pdf(
+    pdf_path="/path/to/document.pdf",
+    split_pages="5,10,15",  # Split after pages 5, 10, 15
+    output_prefix="section"
+)
+
+# Merge multiple PDFs
+result = await merge_pdfs(
+    pdf_paths=["/path/to/doc1.pdf", "/path/to/doc2.pdf"],
+    output_filename="merged_document.pdf"
+)
+
+# Rotate specific pages
+result = await rotate_pages(
+    pdf_path="/path/to/document.pdf",
+    page_rotations={"1": 90, "3": 180}  # Page 1: 90°, Page 3: 180°
+)
+```
+
+### Optimization, Repair, and Comparison
+
+```python
+# Optimize PDF file size
+result = await optimize_pdf(
+    pdf_path="/path/to/large.pdf",
+    optimization_level="balanced",  # "light", "balanced", "aggressive"
+    preserve_quality=True
+)
+
+# Repair corrupted PDF
+result = await repair_pdf(
+    pdf_path="/path/to/corrupted.pdf"
+)
+
+# Compare two PDFs
+result = await compare_pdfs(
+    pdf_path1="/path/to/original.pdf",
+    pdf_path2="/path/to/modified.pdf",
+    comparison_type="all"  # "text", "structure", "metadata", "all"
+)
+```
+
+### Visual and Security Analysis
+
+```python
+# Extract charts and diagrams
+result = await extract_charts(
+    pdf_path="/path/to/report.pdf",
+    pages="2,3,4",
+    min_size=150  # Minimum size for chart detection
+)
+
+# Detect watermarks
+result = await detect_watermarks(
+    pdf_path="/path/to/document.pdf"
+)
+
+# Security analysis
+result = await analyze_pdf_security(
+    pdf_path="/path/to/document.pdf"
+)
+```
+
 ## Available Tools

+### Core Processing Tools
 | Tool | Description |
 |------|-------------|
 | `extract_text` | Extract text with multiple methods and layout preservation |
 | `extract_tables` | Extract tables in various formats (JSON, CSV, Markdown) |
 | `ocr_pdf` | Perform OCR on scanned PDFs with preprocessing |
+| `extract_images` | Extract images with filtering options |
+| `pdf_to_markdown` | Convert PDF to clean Markdown format |
+
+### Document Analysis Tools  
+| Tool | Description |
+|------|-------------|
 | `is_scanned_pdf` | Check if a PDF is scanned or text-based |
 | `get_document_structure` | Extract document structure, outline, and basic metadata |
 | `extract_metadata` | Extract comprehensive metadata and file statistics |
-| `pdf_to_markdown` | Convert PDF to clean Markdown format |
-| `extract_images` | Extract images with filtering options |
+| `analyze_pdf_health` | Comprehensive PDF health and quality analysis |
+| `analyze_pdf_security` | Analyze PDF security features and potential issues |
+| `classify_content` | Classify and analyze PDF content type and structure |
+| `summarize_content` | Generate summary and key insights from PDF content |
+
+### Layout and Visual Analysis Tools
+| Tool | Description |
+|------|-------------|
+| `analyze_layout` | Analyze PDF page layout including text blocks, columns, and spacing |
+| `extract_charts` | Extract and analyze charts, diagrams, and visual elements |
+| `detect_watermarks` | Detect and analyze watermarks in a PDF |
+
+### Content Manipulation Tools
+| Tool | Description |
+|------|-------------|
+| `extract_form_data` | Extract form fields and their values from PDF forms |
+| `split_pdf` | Split PDF into multiple files at specified pages |
+| `merge_pdfs` | Merge multiple PDFs into a single file |
+| `rotate_pages` | Rotate specific pages by 90, 180, or 270 degrees |
+
+### Utility and Optimization Tools
+| Tool | Description |
+|------|-------------|
+| `compare_pdfs` | Compare two PDFs for differences in text, structure, and metadata |
+| `convert_to_images` | Convert PDF pages to image files |
+| `optimize_pdf` | Optimize PDF file size and performance |
+| `repair_pdf` | Attempt to repair corrupted or damaged PDF files |

 ## Development

--- a/src/mcp_pdf_tools/server.py
+++ b/src/mcp_pdf_tools/server.py