Implement comprehensive PDF processing suite with 15 additional advanced tools

Major expansion from 8 to 23 total tools covering:

**Document Analysis & Intelligence:**
- analyze_pdf_health: Comprehensive quality and health analysis
- analyze_pdf_security: Security features and vulnerability assessment
- classify_content: AI-powered document type classification
- summarize_content: Intelligent content summarization with key insights
- compare_pdfs: Advanced document comparison (text, structure, metadata)

**Layout & Visual Analysis:**
- analyze_layout: Page layout analysis with column detection
- extract_charts: Chart, diagram, and visual element extraction
- detect_watermarks: Watermark detection and analysis

**Content Manipulation:**
- extract_form_data: Interactive PDF form data extraction
- split_pdf: Split PDFs at specified pages
- merge_pdfs: Merge multiple PDFs into one
- rotate_pages: Rotate pages by 90°/180°/270°

**Optimization & Utilities:**
- convert_to_images: Convert PDF pages to image files
- optimize_pdf: File size optimization with quality levels
- repair_pdf: Corrupted PDF repair and recovery

**Technical Enhancements:**
- All tools support HTTPS URLs with intelligent caching
- Fixed MCP parameter validation for pages parameter
- Comprehensive error handling and validation
- Updated documentation with usage examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 04:27:04 -06:00
commit f0365a0d75
parent 58d43851b9
2 changed files with 2216 additions and 13 deletions
--- a/README.md
+++ b/README.md
@@ -214,18 +214,150 @@ result = await extract_images(
 )
 ```
 ### Advanced Analysis
 ```python
 # Analyze document health and quality
 result = await analyze_pdf_health(
    pdf_path="/path/to/document.pdf"
 )
 # Classify content type and structure
 result = await classify_content(
    pdf_path="/path/to/document.pdf"
 )
 # Generate content summary
 result = await summarize_content(
    pdf_path="/path/to/document.pdf",
    summary_length="medium",  # "short", "medium", "long"
    pages="1,2,3"  # Specific pages
 )
 # Analyze page layout
 result = await analyze_layout(
    pdf_path="/path/to/document.pdf",
    pages="1,2,3",
    include_coordinates=True
 )
 ```
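The `pages` argument in the examples above is a comma-separated string. As a minimal sketch of how such a string could be validated before use, the helper below turns it into a list of integers. The name `parse_pages` is hypothetical, and the 1-based numbering and the "empty means all pages" rule are assumptions for illustration, not the server's confirmed behavior.

```python
from typing import List, Optional


def parse_pages(pages: Optional[str]) -> Optional[List[int]]:
    """Turn a string such as "1,2,3" into [1, 2, 3].

    Returns None when no pages are given, which this sketch
    treats as "all pages" (an assumption).
    """
    if pages is None or not pages.strip():
        return None
    numbers = []
    for token in pages.split(","):
        token = token.strip()
        # Reject empty tokens, non-numeric text, and page 0 (1-based assumed).
        if not token.isdecimal() or int(token) < 1:
            raise ValueError(f"invalid page number: {token!r}")
        numbers.append(int(token))
    return numbers


print(parse_pages("1,2,3"))  # [1, 2, 3]
```

Rejecting malformed input up front keeps the error close to the caller instead of surfacing later as a failed page lookup.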
 ### Content Manipulation
 ```python
 # Extract form data
 result = await extract_form_data(
    pdf_path="/path/to/form.pdf"
 )
 # Split PDF into separate files
 result = await split_pdf(
    pdf_path="/path/to/document.pdf",
    split_pages="5,10,15",  # Split after pages 5, 10, 15
    output_prefix="section"
 )
 # Merge multiple PDFs
 result = await merge_pdfs(
    pdf_paths=["/path/to/doc1.pdf", "/path/to/doc2.pdf"],
    output_filename="merged_document.pdf"
 )
 # Rotate specific pages
 result = await rotate_pages(
    pdf_path="/path/to/document.pdf",
    page_rotations={"1": 90, "3": 180}  # Page 1: 90°, Page 3: 180°
 )
 ```
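The `split_pages="5,10,15"` example above is commented as splitting after pages 5, 10, and 15. One way to read that is as a list of inclusive page ranges, one per output file. The sketch below illustrates that reading only: `split_ranges` and its `total_pages` parameter are hypothetical names, and 1-based page numbers are assumed.

```python
from typing import List, Tuple


def split_ranges(split_pages: str, total_pages: int) -> List[Tuple[int, int]]:
    """Return inclusive 1-based (start, end) ranges, splitting after each listed page."""
    # Deduplicate and sort so the input order does not matter.
    points = sorted({int(p) for p in split_pages.split(",") if p.strip()})
    ranges = []
    start = 1
    for point in points:
        # A split after the last page, or outside the document, adds nothing.
        if not 1 <= point < total_pages:
            continue
        ranges.append((start, point))
        start = point + 1
    # Whatever remains after the final split point forms the last section.
    ranges.append((start, total_pages))
    return ranges


print(split_ranges("5,10,15", 20))
# [(1, 5), (6, 10), (11, 15), (16, 20)]
```

Under this reading, three split points in a 20-page document yield four output files.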
 ### Optimization and Repair
 ```python
 # Optimize PDF file size
 result = await optimize_pdf(
    pdf_path="/path/to/large.pdf",
    optimization_level="balanced",  # "light", "balanced", "aggressive"
    preserve_quality=True
 )
 # Repair corrupted PDF
 result = await repair_pdf(
    pdf_path="/path/to/corrupted.pdf"
 )
 # Compare two PDFs
 result = await compare_pdfs(
    pdf_path1="/path/to/original.pdf",
    pdf_path2="/path/to/modified.pdf",
    comparison_type="all"  # "text", "structure", "metadata", "all"
 )
 ```
 ### Visual Analysis
 ```python
 # Extract charts and diagrams
 result = await extract_charts(
    pdf_path="/path/to/report.pdf",
    pages="2,3,4",
    min_size=150  # Minimum size for chart detection
 )
 # Detect watermarks
 result = await detect_watermarks(
    pdf_path="/path/to/document.pdf"
 )
 # Security analysis
 result = await analyze_pdf_security(
    pdf_path="/path/to/document.pdf"
 )
 ```
 ## Available Tools
 ### Core Processing Tools
 | Tool | Description |
 |------|-------------|
 | `extract_text` | Extract text with multiple methods and layout preservation |
 | `extract_tables` | Extract tables in various formats (JSON, CSV, Markdown) |
 | `ocr_pdf` | Perform OCR on scanned PDFs with preprocessing |
 | `extract_images` | Extract images with filtering options |
 | `pdf_to_markdown` | Convert PDF to clean Markdown format |
 ### Document Analysis Tools
 | Tool | Description |
 |------|-------------|
 | `is_scanned_pdf` | Check if a PDF is scanned or text-based |
 | `get_document_structure` | Extract document structure, outline, and basic metadata |
 | `extract_metadata` | Extract comprehensive metadata and file statistics |
-| `pdf_to_markdown` | Convert PDF to clean Markdown format |
+| `analyze_pdf_health` | Comprehensive PDF health and quality analysis |
-| `extract_images` | Extract images with filtering options |
+| `analyze_pdf_security` | Analyze PDF security features and potential issues |
 | `classify_content` | Classify and analyze PDF content type and structure |
 | `summarize_content` | Generate summary and key insights from PDF content |
 ### Layout and Visual Analysis Tools
 | Tool | Description |
 |------|-------------|
 | `analyze_layout` | Analyze PDF page layout including text blocks, columns, and spacing |
 | `extract_charts` | Extract and analyze charts, diagrams, and visual elements |
 | `detect_watermarks` | Detect and analyze watermarks in PDF |
 ### Content Manipulation Tools
 | Tool | Description |
 |------|-------------|
 | `extract_form_data` | Extract form fields and their values from PDF forms |
 | `split_pdf` | Split PDF into multiple files at specified pages |
 | `merge_pdfs` | Merge multiple PDFs into a single file |
 | `rotate_pages` | Rotate specific pages by 90, 180, or 270 degrees |
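Since `rotate_pages` accepts only 90, 180, or 270 degrees, and the usage example passes page numbers as string keys, a caller may want to check a mapping before sending it. The following is a rough sketch under those assumptions; `normalize_rotations` and `total_pages` are hypothetical names, and pages are assumed to be 1-based.

```python
from typing import Dict

ALLOWED_ANGLES = {90, 180, 270}


def normalize_rotations(page_rotations: Dict[str, int], total_pages: int) -> Dict[int, int]:
    """Convert {"1": 90} style input to {1: 90}, rejecting bad pages or angles."""
    result = {}
    for page_key, angle in page_rotations.items():
        page = int(page_key)  # raises ValueError for non-numeric keys
        if not 1 <= page <= total_pages:
            raise ValueError(f"page {page} is outside 1..{total_pages}")
        if angle not in ALLOWED_ANGLES:
            raise ValueError(f"unsupported rotation: {angle}")
        result[page] = angle
    return result


print(normalize_rotations({"1": 90, "3": 180}, total_pages=5))
# {1: 90, 3: 180}
```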
 ### Utility and Optimization Tools
 | Tool | Description |
 |------|-------------|
 | `compare_pdfs` | Compare two PDFs for differences in text, structure, and metadata |
 | `convert_to_images` | Convert PDF pages to image files |
 | `optimize_pdf` | Optimize PDF file size and performance |
 | `repair_pdf` | Attempt to repair corrupted or damaged PDF files |
 ## Development
--- a/src/mcp_pdf_tools/server.py
+++ b/src/mcp_pdf_tools/server.py