mcp-pdf-tools

MCP/mcp-pdf-tools

Fork 0

Commit Graph

Author	SHA1	Message	Date
Ryan Malloy	f0365a0d75	Implement comprehensive PDF processing suite with 15 additional advanced tools Major expansion from 8 to 23 total tools covering: Document Analysis & Intelligence: - analyze_pdf_health: Comprehensive quality and health analysis - analyze_pdf_security: Security features and vulnerability assessment - classify_content: AI-powered document type classification - summarize_content: Intelligent content summarization with key insights - compare_pdfs: Advanced document comparison (text, structure, metadata) Layout & Visual Analysis: - analyze_layout: Page layout analysis with column detection - extract_charts: Chart, diagram, and visual element extraction - detect_watermarks: Watermark detection and analysis Content Manipulation: - extract_form_data: Interactive PDF form data extraction - split_pdf: Split PDFs at specified pages - merge_pdfs: Merge multiple PDFs into one - rotate_pages: Rotate pages by 90°/180°/270° Optimization & Utilities: - convert_to_images: Convert PDF pages to image files - optimize_pdf: File size optimization with quality levels - repair_pdf: Corrupted PDF repair and recovery Technical Enhancements: - All tools support HTTPS URLs with intelligent caching - Fixed MCP parameter validation for pages parameter - Comprehensive error handling and validation - Updated documentation with usage examples 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-11 04:27:04 -06:00
Ryan Malloy	58d43851b9	Add HTTPS URL support and fix MCP parameter validation Features: - HTTPS URL support: Process PDFs directly from URLs with intelligent caching - Smart caching: 1-hour cache to avoid repeated downloads - Content validation: Verify downloads are actually PDF files - Security: Proper User-Agent headers, HTTPS preferred over HTTP - MCP parameter fixes: Handle pages parameter as string "[2,3]" format - Backward compatibility: Still supports local file paths and list parameters Technical changes: - Added download_pdf_from_url() with caching and validation - Updated validate_pdf_path() to handle URLs and local paths - Added parse_pages_parameter() for flexible parameter parsing - Updated all 8 tools to accept string pages parameters - Enhanced error handling for network and validation issues All tools now support: - Local paths: "/path/to/file.pdf" - HTTPS URLs: "https://example.com/document.pdf" - Flexible pages: "[2,3]", "1,2,3", or [1,2,3] 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-11 02:25:53 -06:00
Ryan Malloy	c902e81e4d	Initial commit: Complete MCP PDF Tools server implementation Features: - 8 comprehensive PDF processing tools with intelligent fallbacks - Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection) - Table extraction (Camelot → pdfplumber → Tabula fallback chain) - OCR processing with Tesseract and preprocessing options - Document analysis (structure, metadata, scanned detection) - Image extraction with filtering capabilities - PDF to markdown conversion with metadata - Built on FastMCP framework with full MCP protocol support - Comprehensive error handling and user-friendly messages - Docker support and cross-platform compatibility - Complete test suite and examples 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-10 16:36:21 -06:00

Author

SHA1

Message

Date

Ryan Malloy

f0365a0d75

Implement comprehensive PDF processing suite with 15 additional advanced tools

Major expansion from 8 to 23 total tools covering:

**Document Analysis & Intelligence:**
- analyze_pdf_health: Comprehensive quality and health analysis
- analyze_pdf_security: Security features and vulnerability assessment
- classify_content: AI-powered document type classification
- summarize_content: Intelligent content summarization with key insights
- compare_pdfs: Advanced document comparison (text, structure, metadata)

**Layout & Visual Analysis:**
- analyze_layout: Page layout analysis with column detection
- extract_charts: Chart, diagram, and visual element extraction
- detect_watermarks: Watermark detection and analysis

**Content Manipulation:**
- extract_form_data: Interactive PDF form data extraction
- split_pdf: Split PDFs at specified pages
- merge_pdfs: Merge multiple PDFs into one
- rotate_pages: Rotate pages by 90°/180°/270°

**Optimization & Utilities:**
- convert_to_images: Convert PDF pages to image files
- optimize_pdf: File size optimization with quality levels
- repair_pdf: Corrupted PDF repair and recovery

**Technical Enhancements:**
- All tools support HTTPS URLs with intelligent caching
- Fixed MCP parameter validation for pages parameter
- Comprehensive error handling and validation
- Updated documentation with usage examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-11 04:27:04 -06:00

Ryan Malloy

58d43851b9

Add HTTPS URL support and fix MCP parameter validation

Features:
- HTTPS URL support: Process PDFs directly from URLs with intelligent caching
- Smart caching: 1-hour cache to avoid repeated downloads
- Content validation: Verify downloads are actually PDF files
- Security: Proper User-Agent headers, HTTPS preferred over HTTP
- MCP parameter fixes: Handle pages parameter as string "[2,3]" format
- Backward compatibility: Still supports local file paths and list parameters

Technical changes:
- Added download_pdf_from_url() with caching and validation
- Updated validate_pdf_path() to handle URLs and local paths
- Added parse_pages_parameter() for flexible parameter parsing
- Updated all 8 tools to accept string pages parameters
- Enhanced error handling for network and validation issues

All tools now support:
- Local paths: "/path/to/file.pdf"
- HTTPS URLs: "https://example.com/document.pdf"
- Flexible pages: "[2,3]", "1,2,3", or [1,2,3]

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-11 02:25:53 -06:00

Ryan Malloy

c902e81e4d

Initial commit: Complete MCP PDF Tools server implementation

Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-10 16:36:21 -06:00

3 Commits