10 Commits

Author SHA1 Message Date
19bdeddcdf 📝 Update README: 40 tools, v2.0.7 table features, token management
2025-11-08 20:12:40 -07:00
3327137536 🚀 v2.0.5: Fix page range parsing across all PDF tools
Major architectural improvements and bug fixes in the v2.0.x series:

## v2.0.5 - Page Range Parsing (Current Release)
- Fix page range parsing bug affecting 6 mixins (e.g., "93-95" or "11-30")
- Create shared parse_pages_parameter() utility function
- Support mixed formats: "1,3-5,7,10-15"
- Update: pdf_utilities, content_analysis, image_processing, misc_tools, table_extraction, text_extraction
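
The mixed-format parsing described above can be sketched roughly as follows. This is an illustration only, not the project's actual `parse_pages_parameter()`: the name `parse_page_ranges`, its return type, and its validation rules are assumptions made for this example.

```python
def parse_page_ranges(spec: str) -> list[int]:
    """Parse a 1-based page spec such as "1,3-5,7" into sorted 0-based indices.

    Hypothetical helper: the real shared utility may differ in name and behavior.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # range() excludes its stop value, so only the start is shifted:
        # pages 3-5 (1-based) become indices 2, 3, 4.
        pages.update(range(start - 1, end))
    return sorted(pages)


indices = parse_page_ranges("1,3-5,7,10-15")
```

Returning a sorted, de-duplicated list keeps overlapping inputs such as "1-3,2-4" from producing repeated pages; whether the real utility does the same is not stated in the commit message.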

## v2.0.4 - Chunk Hint Fix
- Fix next_chunk_hint to show correct page ranges
- Dynamic calculation based on actual pages being extracted
- Example: "30-50" now correctly shows "40-49" for next chunk

## v2.0.3 - Initial Range Support
- Add page range support to text extraction ("11-30")
- Fix _parse_pages_parameter to handle ranges with Python's range()
- Convert 1-based user input to 0-based internal indexing

## v2.0.2 - Lazy Import Fix
- Fix ModuleNotFoundError for reportlab on startup
- Implement lazy imports for optional dependencies
- Graceful degradation with helpful error messages
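
A lazy import for an optional dependency might look like the sketch below. The helper name `require_optional` and the wording of the error message are assumptions for illustration; only the `reportlab` dependency and the `[forms]` extra come from the commit messages.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import a module on first use, with an actionable error if it is missing.

    Hypothetical helper, not taken from the mcp-pdf source.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"'{module_name}' is not installed. "
            f"Install the optional extra, e.g. mcp-pdf[{extra}]."
        ) from exc


def build_form(path: str) -> None:
    # reportlab is imported here, not at module import time, so the
    # server can start even when the optional extra is absent.
    canvas = require_optional("reportlab.pdfgen.canvas", "forms")
    page = canvas.Canvas(path)
    page.drawString(72, 720, "Name:")
    page.save()
```

Because the import happens inside the function, a missing package only affects the tools that need it, which is one way to get the graceful degradation the commit describes.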

## v2.0.1 - Dependency Restructuring
- Move reportlab to optional [forms] extra
- Document installation: uvx --with mcp-pdf[forms] mcp-pdf

## v2.0.0 - Official FastMCP Pattern Migration
- Migrate to official fastmcp.contrib.mcp_mixin pattern
- Create 12 specialized mixins with 42 tools total
- Architecture: mixins_official/ using MCPMixin base class
- Backwards compatibility: server_legacy.py preserved

Technical Improvements:
- Centralized utility functions (DRY principle)
- Consistent behavior across all PDF tools
- Better error messages with actionable instructions
- Library-specific adapters for table extraction

Files Changed:
- New: src/mcp_pdf/mixins_official/utils.py (shared utilities)
- Updated: 6 mixins with improved page parsing
- Version: pyproject.toml, server.py → 2.0.5

PyPI: https://pypi.org/project/mcp-pdf/2.0.5/
2025-11-03 17:12:37 -07:00
856dd41996 Add comprehensive link extraction tool (24th PDF tool)
New Features:
- extract_links: Extract all PDF hyperlinks with advanced filtering
- Page-specific filtering (e.g., "1,3,5" or "1-5,8,10-12")
- Link type categorization: external URLs, internal pages, emails, documents
- Coordinate tracking for precise link positioning
- FastMCP integration with proper tool registration
- Version banner display following CLAUDE.md guidelines
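
The link type categorization could be approximated as below. The category labels, the suffix list, and the function name `categorize_link` are all assumptions for this sketch; the commit message only names the four kinds of link.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed set of suffixes that mark a link as pointing to another document.
DOCUMENT_SUFFIXES = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


def categorize_link(uri: str = "", target_page: Optional[int] = None) -> str:
    """Classify a link as internal_page, email, external_url, document, or other."""
    if target_page is not None and not uri:
        return "internal_page"  # a jump within the same PDF
    scheme = urlparse(uri).scheme.lower()
    if scheme == "mailto":
        return "email"
    if scheme in ("http", "https"):
        return "external_url"
    if scheme == "file" or uri.lower().endswith(DOCUMENT_SUFFIXES):
        return "document"
    return "other"
```

For example, `categorize_link("mailto:team@example.com")` yields `"email"`, while a link with no URI and `target_page=4` yields `"internal_page"`.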

Technical Improvements:
- Enhanced startup banner with package version display
- Updated documentation to reflect 24 specialized tools
- Proper FastMCP @mcp.tool() decorator usage
- Comprehensive error handling and security validation

Documentation Updates:
- README.md: Updated tool count and installation guides
- CLAUDE.md: Added link extraction to implemented features
- LOCAL_DEVELOPMENT.md: Enhanced with scoped installation commands

Version: 1.1.0 (minor version bump for new feature)
2025-09-23 20:41:16 -06:00
8d01c44d4f 🚀 Rename to mcp-pdf and prepare for PyPI publication
**Package Rebranding:**
- Renamed package from mcp-pdf-tools to mcp-pdf (cleaner name)
- Updated version to 1.0.0 (production ready with security hardening)
- Updated all import paths and references throughout codebase

**PyPI Preparation:**
- Enhanced package description and metadata
- Added proper project URLs and homepage
- Updated CLI command from mcp-pdf-tools to mcp-pdf
- Built distribution packages (wheel + source)

**Testing & Validation:**
- All 20 security tests pass with new package structure
- Local installation and import tests successful
- CLI command working correctly
- Package ready for PyPI publication

The secure, production-ready PDF processing platform is now ready
for public distribution and installation via pip.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-06 15:42:59 -06:00
10ef5028eb 📖 Add Claude Code integration command to documentation
Feature prominent Claude Code integration instructions:
- Add recommended one-line command for Claude Code users
- Update installation section with uvx commands
- Include git.supported.systems repository URLs
- Highlight seamless AI-powered document processing integration

Command for Claude Code users:
claude mcp add -s local -- legacy-files uvx --from git+https://git.supported.systems/MCP/mcp-legacy-files.git mcp-legacy-files

This enables direct access to all 9 vintage format processors within Claude Code
for seamless AI-enhanced document processing workflows.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 23:11:28 -06:00
78a8c40e71 Transform README into comprehensive project showcase
**Major Enhancement**: Combined blog post storytelling with technical documentation
to create an engaging, comprehensive project showcase.

**What's New:**
📖 **Compelling Narrative**: Tells the complete story from 8 tools → 23 tools
🎯 **Real-World Examples**: Business intelligence, academic research, security workflows
🧠 **Technical Deep-Dives**: Architecture decisions, intelligent fallbacks, UX design
**Performance Insights**: Async architecture, caching strategies, resource management
🔧 **Complete Documentation**: Installation, usage, troubleshooting, contributing

**Key Sections Added:**
- "What We Built" - Project overview and use cases
- "Key Innovations" - Document intelligence, layout processing, web integration
- "Real-World Usage Examples" - 4 comprehensive workflow examples
- "Performance & Architecture" - Technical implementation details
- "Architecture Deep-Dive" - Code examples and design decisions
- "Why MCP PDF Tools?" - Value proposition and differentiators

**Impact**:
- Much more engaging for new users and contributors
- Showcases the full scope of capabilities (23 tools!)
- Provides clear guidance for different use cases
- Demonstrates technical sophistication and quality
- Perfect for sharing, contributing, and adoption

Now developers can understand not just HOW to use the tools, but WHY this
project exists and what makes it special in the PDF processing landscape.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-12 08:40:59 -06:00
f601d44d99 Fix page numbering: Switch to user-friendly 1-based indexing
**Problem**: Zero-based page numbers were confusing for users who naturally
think of pages starting from 1.

**Solution**:
- Updated `parse_pages_parameter()` to convert 1-based user input to 0-based internal representation
- All user-facing documentation now uses 1-based page numbering (page 1 = first page)
- Internal processing continues to use 0-based indexing for PyMuPDF compatibility
- Output page numbers are consistently displayed as 1-based for users

**Changes**:
- Enhanced documentation strings to clarify "1-based" page numbering
- Updated README examples with 1-based page numbers and clarifying comments
- Fixed split_pdf function to handle 1-based input correctly
- Updated test cases to verify 1-based -> 0-based conversion
- Added feature highlight: "User-Friendly: All page numbers use 1-based indexing"

**Impact**: Much more intuitive for users - no more confusion about which page is "page 0"!

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 04:32:20 -06:00
f0365a0d75 Implement comprehensive PDF processing suite with 15 additional advanced tools
Major expansion from 8 to 23 total tools covering:

**Document Analysis & Intelligence:**
- analyze_pdf_health: Comprehensive quality and health analysis
- analyze_pdf_security: Security features and vulnerability assessment
- classify_content: AI-powered document type classification
- summarize_content: Intelligent content summarization with key insights
- compare_pdfs: Advanced document comparison (text, structure, metadata)

**Layout & Visual Analysis:**
- analyze_layout: Page layout analysis with column detection
- extract_charts: Chart, diagram, and visual element extraction
- detect_watermarks: Watermark detection and analysis

**Content Manipulation:**
- extract_form_data: Interactive PDF form data extraction
- split_pdf: Split PDFs at specified pages
- merge_pdfs: Merge multiple PDFs into one
- rotate_pages: Rotate pages by 90°/180°/270°

**Optimization & Utilities:**
- convert_to_images: Convert PDF pages to image files
- optimize_pdf: File size optimization with quality levels
- repair_pdf: Corrupted PDF repair and recovery

**Technical Enhancements:**
- All tools support HTTPS URLs with intelligent caching
- Fixed MCP parameter validation for pages parameter
- Comprehensive error handling and validation
- Updated documentation with usage examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 04:27:04 -06:00
58d43851b9 Add HTTPS URL support and fix MCP parameter validation
Features:
- HTTPS URL support: Process PDFs directly from URLs with intelligent caching
- Smart caching: 1-hour cache to avoid repeated downloads
- Content validation: Verify downloads are actually PDF files
- Security: Proper User-Agent headers, HTTPS preferred over HTTP
- MCP parameter fixes: Handle pages parameter as string "[2,3]" format
- Backward compatibility: Still supports local file paths and list parameters
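
The one-hour cache and the content validation can be illustrated without any network code. The helper names and the exact freshness rule below are assumptions; the actual `download_pdf_from_url()` may track cache entries differently.

```python
import time
from typing import Optional

CACHE_TTL_SECONDS = 60 * 60  # one hour, as stated in the commit message


def is_cache_fresh(downloaded_at: float, now: Optional[float] = None) -> bool:
    """Return True while a cached download is younger than the TTL."""
    current = time.time() if now is None else now
    return current - downloaded_at < CACHE_TTL_SECONDS


def looks_like_pdf(data: bytes) -> bool:
    """Check for the "%PDF-" header that begins a PDF file."""
    return data.startswith(b"%PDF-")
```

Checking the header catches the common case where a URL returns an HTML error page instead of a PDF; it does not prove the rest of the file is well-formed.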

Technical changes:
- Added download_pdf_from_url() with caching and validation
- Updated validate_pdf_path() to handle URLs and local paths
- Added parse_pages_parameter() for flexible parameter parsing
- Updated all 8 tools to accept string pages parameters
- Enhanced error handling for network and validation issues

All tools now support:
- Local paths: "/path/to/file.pdf"
- HTTPS URLs: "https://example.com/document.pdf"
- Flexible pages: "[2,3]", "1,2,3", or [1,2,3]
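
Accepting all three `pages` forms could be handled along these lines. The function name `normalize_pages` is hypothetical, and the sketch passes page numbers through unchanged without any index-base conversion.

```python
import json


def normalize_pages(pages):
    """Accept "[2,3]", "1,2,3", or [1, 2, 3] and return a list of ints (or None)."""
    if pages is None:
        return None  # caller treats None as "all pages"
    if isinstance(pages, str):
        text = pages.strip()
        if text.startswith("["):
            pages = json.loads(text)  # JSON-style string such as "[2,3]"
        else:
            pages = [item for item in text.split(",") if item.strip()]
    return [int(page) for page in pages]
```

Treating `None` as "all pages" is a design assumption of this sketch, chosen so callers can omit the parameter entirely.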

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 02:25:53 -06:00
c902e81e4d Initial commit: Complete MCP PDF Tools server implementation
Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples
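
The table-extraction fallback chain listed above (Camelot → pdfplumber → Tabula) follows a general pattern that can be sketched with placeholder extractors. Nothing here calls the real libraries: the extractor callables and the function name `extract_with_fallback` are stand-ins assumed for illustration.

```python
from typing import Callable, Sequence

Extractor = Callable[[str], list]


def extract_with_fallback(path: str, extractors: Sequence[tuple[str, Extractor]]):
    """Try each (name, extractor) pair in order; return the first non-empty result."""
    errors = []
    for name, extractor in extractors:
        try:
            tables = extractor(path)
        except Exception as exc:  # a failing backend should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if tables:
            return name, tables
        errors.append(f"{name}: no tables found")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

Collecting the per-backend errors makes the final message actionable when every backend fails, instead of reporting only the last failure.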

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 16:36:21 -06:00