BREAKING ISSUE FIXED:
- Users reported "Output path not allowed: images" error
- extract_images tool was rejecting relative paths due to overly restrictive security
NEW SECURITY MODEL:
- MCP_PDF_ALLOWED_PATHS environment variable controls allowed output directories
- If unset: Allows any directory with "security theater" warnings
- If set: Restricts outputs to specified colon-separated paths
- Cross-platform compatible (: on Unix, ; on Windows)
SECURITY PHILOSOPHY ENHANCED:
- "TRUST NO ONE" - honest about application-level security limitations
- Clear warnings that this is "security theater"
- Emphasis on OS-level permissions and process isolation
- Educational guidance on real security practices
TECHNICAL CHANGES:
- validate_output_path() rewritten with environment variable control
- Path validation uses relative_to() for proper containment checking
- Enhanced warning messages with security education
- Updated documentation with honest security assessment
DOCUMENTATION UPDATES:
- Added MCP_PDF_ALLOWED_PATHS to configuration section
- New "REAL Security" section with OS-level recommendations
- Clear explanation of security theater vs actual protection
Version: 1.1.1 (patch version for critical bugfix)
New Features:
- extract_links: Extract all PDF hyperlinks with advanced filtering
- Page-specific filtering (e.g., "1,3,5" or "1-5,8,10-12")
- Link type categorization: external URLs, internal pages, emails, documents
- Coordinate tracking for precise link positioning
- FastMCP integration with proper tool registration
- Version banner display following CLAUDE.md guidelines
Technical Improvements:
- Enhanced startup banner with package version display
- Updated documentation to reflect 24 specialized tools
- Proper FastMCP @mcp.tool() decorator usage
- Comprehensive error handling and security validation
Documentation Updates:
- README.md: Updated tool count and installation guides
- CLAUDE.md: Added link extraction to implemented features
- LOCAL_DEVELOPMENT.md: Enhanced with scoped installation commands
Version: 1.1.0 (minor version bump for new feature)
**Package Rebranding:**
- Renamed package from mcp-pdf-tools to mcp-pdf (cleaner name)
- Updated version to 1.0.0 (production ready with security hardening)
- Updated all import paths and references throughout codebase
**PyPI Preparation:**
- Enhanced package description and metadata
- Added proper project URLs and homepage
- Updated CLI command from mcp-pdf-tools to mcp-pdf
- Built distribution packages (wheel + source)
**Testing & Validation:**
- All 20 security tests pass with new package structure
- Local installation and import tests successful
- CLI command working correctly
- Package ready for PyPI publication
The secure, production-ready PDF processing platform is now ready
for public distribution and installation via pip.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement complete collaboration toolkit with:
- add_sticky_notes: Comment annotations with color support
- add_highlights: Text highlighting with 8 color options
- add_stamps: Approval stamps (APPROVED, DRAFT, CONFIDENTIAL, etc.)
- extract_all_annotations: Export to JSON/CSV formats
Also includes document assembly features:
- merge_pdfs_advanced: Combine PDFs with bookmark preservation
- split_pdf_by_pages: Extract specific page ranges
- split_pdf_by_bookmarks: Auto-split by chapters/sections
- reorder_pdf_pages: Rearrange page sequences
All tools tested and working with proper error handling.
- Add complete PDF form lifecycle management
- Create new forms with text, checkbox, dropdown, signature fields
- Fill existing forms with JSON data and optional flattening
- Add fields to existing PDFs with flexible positioning
- Advanced field types: radio groups, textareas, date fields
- Comprehensive validation engine with regex patterns
- Email, phone, number, date format validation
- Required field checking and length constraints
- Visual validation cues with asterisks and format hints
- Multi-field error reporting with detailed feedback
- International character support and edge case handling
- Enterprise-ready for complex business forms
Implement proper MCP resource protocol for image access, eliminating the need
for clients to handle local file paths and enabling seamless image integration.
Key Features:
• MCP Resource Endpoint: pdf-image://{image_id} for direct image access
• extract_images(): Returns resource_uri field with MCP resource links
• pdf_to_markdown(): Embeds resource URIs in markdown image references
• Automatic MIME type detection (image/png, image/jpeg)
• Seamless client integration without file path handling
Benefits:
✅ Direct image access via MCP resource protocol
✅ No local file path dependencies for MCP clients
✅ Proper MIME type handling for image display
✅ Clean markdown with working image links
✅ Standards-compliant MCP resource implementation
Response Format Enhancement:
+ "resource_uri": "pdf-image://page_1_image_0"
+ Works in markdown: \
+ MIME Type: image/png or image/jpeg
+ Direct client access without file system dependencies
This resolves the limitation where extracted images were only available
as local file paths, making them truly accessible to MCP clients
through the standardized resource protocol.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>