mcp-pdf-tools

Author	SHA1	Message	Date
Ryan Malloy	f601d44d99	Fix page numbering: Switch to user-friendly 1-based indexing Problem: Zero-based page numbers were confusing for users who naturally think of pages starting from 1. Solution: - Updated `parse_pages_parameter()` to convert 1-based user input to 0-based internal representation - All user-facing documentation now uses 1-based page numbering (page 1 = first page) - Internal processing continues to use 0-based indexing for PyMuPDF compatibility - Output page numbers are consistently displayed as 1-based for users Changes: - Enhanced documentation strings to clarify "1-based" page numbering - Updated README examples with 1-based page numbers and clarifying comments - Fixed split_pdf function to handle 1-based input correctly - Updated test cases to verify 1-based -> 0-based conversion - Added feature highlight: "User-Friendly: All page numbers use 1-based indexing" Impact: Much more intuitive for users - no more confusion about which page is "page 0"! 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-11 04:32:20 -06:00
Ryan Malloy	f0365a0d75	Implement comprehensive PDF processing suite with 15 additional advanced tools Major expansion from 8 to 23 total tools covering: Document Analysis & Intelligence: - analyze_pdf_health: Comprehensive quality and health analysis - analyze_pdf_security: Security features and vulnerability assessment - classify_content: AI-powered document type classification - summarize_content: Intelligent content summarization with key insights - compare_pdfs: Advanced document comparison (text, structure, metadata) Layout & Visual Analysis: - analyze_layout: Page layout analysis with column detection - extract_charts: Chart, diagram, and visual element extraction - detect_watermarks: Watermark detection and analysis Content Manipulation: - extract_form_data: Interactive PDF form data extraction - split_pdf: Split PDFs at specified pages - merge_pdfs: Merge multiple PDFs into one - rotate_pages: Rotate pages by 90°/180°/270° Optimization & Utilities: - convert_to_images: Convert PDF pages to image files - optimize_pdf: File size optimization with quality levels - repair_pdf: Corrupted PDF repair and recovery Technical Enhancements: - All tools support HTTPS URLs with intelligent caching - Fixed MCP parameter validation for pages parameter - Comprehensive error handling and validation - Updated documentation with usage examples 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-11 04:27:04 -06:00
Ryan Malloy	58d43851b9	Add HTTPS URL support and fix MCP parameter validation Features: - HTTPS URL support: Process PDFs directly from URLs with intelligent caching - Smart caching: 1-hour cache to avoid repeated downloads - Content validation: Verify downloads are actually PDF files - Security: Proper User-Agent headers, HTTPS preferred over HTTP - MCP parameter fixes: Handle pages parameter as string "[2,3]" format - Backward compatibility: Still supports local file paths and list parameters Technical changes: - Added download_pdf_from_url() with caching and validation - Updated validate_pdf_path() to handle URLs and local paths - Added parse_pages_parameter() for flexible parameter parsing - Updated all 8 tools to accept string pages parameters - Enhanced error handling for network and validation issues All tools now support: - Local paths: "/path/to/file.pdf" - HTTPS URLs: "https://example.com/document.pdf" - Flexible pages: "[2,3]", "1,2,3", or [1,2,3] 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-11 02:25:53 -06:00
Ryan Malloy	c902e81e4d	Initial commit: Complete MCP PDF Tools server implementation Features: - 8 comprehensive PDF processing tools with intelligent fallbacks - Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection) - Table extraction (Camelot → pdfplumber → Tabula fallback chain) - OCR processing with Tesseract and preprocessing options - Document analysis (structure, metadata, scanned detection) - Image extraction with filtering capabilities - PDF to markdown conversion with metadata - Built on FastMCP framework with full MCP protocol support - Comprehensive error handling and user-friendly messages - Docker support and cross-platform compatibility - Complete test suite and examples 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-10 16:36:21 -06:00

4 Commits
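
The commit messages above describe `parse_pages_parameter()` as accepting pages in the forms "[2,3]", "1,2,3", or [1,2,3] and converting 1-based user input to 0-based indices for PyMuPDF. The repository's actual implementation is not shown in this log, so the following is only a minimal sketch of that behavior. The signature, the handling of `None` and empty strings as "all pages", and the error raised for page numbers below 1 are assumptions made for illustration.

```python
import json
from typing import List, Optional, Union


def parse_pages_parameter(
    pages: Union[str, List[int], None],
) -> Optional[List[int]]:
    """Convert 1-based user page numbers into 0-based indices.

    Illustrative sketch only; the real function in the repository may differ.
    """
    if pages is None:
        return None  # assumption: None means "process all pages"

    if isinstance(pages, str):
        text = pages.strip()
        if not text:
            return None  # assumption: an empty string also means "all pages"
        if text.startswith("["):
            # JSON-style string such as "[2,3]"
            values = json.loads(text)
        else:
            # Comma-separated string such as "1,2,3"
            values = [part.strip() for part in text.split(",") if part.strip()]
    else:
        # Already a list such as [1, 2, 3]
        values = list(pages)

    zero_based = []
    for value in values:
        page = int(value)
        if page < 1:
            # assumption: reject page 0 and negatives instead of clamping
            raise ValueError(f"Page numbers are 1-based; got {page}")
        zero_based.append(page - 1)
    return zero_based
```

Under these assumptions, `parse_pages_parameter("[2,3]")` returns `[1, 2]`, and both `parse_pages_parameter("1,2,3")` and `parse_pages_parameter([1, 2, 3])` return `[0, 1, 2]`. Adding 1 back when reporting results would keep output page numbers 1-based, as the first commit describes.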