New Features: - extract_links: Extract all PDF hyperlinks with advanced filtering - Page-specific filtering (e.g., "1,3,5" or "1-5,8,10-12") - Link type categorization: external URLs, internal pages, emails, documents - Coordinate tracking for precise link positioning - FastMCP integration with proper tool registration - Version banner display following CLAUDE.md guidelines Technical Improvements: - Enhanced startup banner with package version display - Updated documentation to reflect 24 specialized tools - Proper FastMCP @mcp.tool() decorator usage - Comprehensive error handling and security validation Documentation Updates: - README.md: Updated tool count and installation guides - CLAUDE.md: Added link extraction to implemented features - LOCAL_DEVELOPMENT.md: Enhanced with scoped installation commands Version: 1.1.0 (minor version bump for new feature)
4.8 KiB
4.8 KiB
🔧 Local Development Guide for MCP PDF
This guide shows how to test MCP PDF locally during development before publishing to PyPI.
📋 Prerequisites
- Python 3.10+
- uv package manager
- Claude Desktop app
- Git repository cloned locally
🚀 Quick Start for Local Testing
1. Clone and Setup
# Clone the repository
git clone https://github.com/rsp2k/mcp-pdf.git
cd mcp-pdf
# Install dependencies
uv sync --dev
# Verify installation
uv run python -c "from mcp_pdf.server import create_server; print('✅ MCP PDF loads successfully')"
2. Add MCP Server to Claude Desktop
For Production Use (PyPI Installation)
Install the published version from PyPI:
# For personal use across all projects
claude mcp add -s local pdf-tools uvx mcp-pdf
# For project-specific use (isolated to current directory)
claude mcp add -s project pdf-tools uvx mcp-pdf
For Local Development (Source Installation)
When developing MCP PDF itself, use the local source:
# For development from local source
claude mcp add -s project pdf-tools-dev uv -- --directory /path/to/mcp-pdf-tools run mcp-pdf
Or if you're in the mcp-pdf directory:
# Development server from current directory
claude mcp add -s project pdf-tools-dev uv -- --directory . run mcp-pdf
3. Alternative: Manual Server Testing
You can also run the server manually for debugging:
# Run the MCP server directly
uv run mcp-pdf
# Or run with specific FastMCP options
uv run python -m mcp_pdf.server
4. Test Core Functionality
Once connected to Claude Code, test these key features:
Basic PDF Processing
"Extract text from this PDF file: /path/to/test.pdf"
"Get metadata from this PDF: /path/to/document.pdf"
"Check if this PDF is scanned: /path/to/scan.pdf"
Security Features
"Try to extract text from a very large PDF"
"Process a PDF with 2000 pages" (should be limited to 1000)
Advanced Features
"Extract tables from this PDF: /path/to/tables.pdf"
"Convert this PDF to markdown: /path/to/document.pdf"
"Add annotations to this PDF: /path/to/target.pdf"
🔒 Security Testing
Verify the security hardening works:
File Size Limits
- Try processing a PDF larger than 100MB
- Should see: "PDF file too large: X bytes > 104857600"
Page Count Limits
- Try processing a PDF with >1000 pages
- Should see: "PDF too large for processing: X pages > 1000"
Path Traversal Protection
- Test with malicious paths like
../../../etc/passwd - Should be blocked with security error
JSON Input Validation
- Large JSON inputs (>10KB) should be rejected
- Malformed JSON should return clean error messages
🐛 Debugging
Enable Debug Logging
export DEBUG=true
uv run mcp-pdf
Check Security Functions
# Test security validation functions
uv run python test_security_features.py
# Run integration tests
uv run python test_integration.py
Verify Package Structure
# Check package builds correctly
uv build
# Verify package metadata
uv run twine check dist/*
📊 Testing Checklist
Before publishing, verify:
- All 23 PDF tools work correctly
- Security limits are enforced (file size, page count)
- Error messages are clean and helpful
- No sensitive information leaked in errors
- Path traversal protection works
- JSON input validation works
- Memory limits prevent crashes
- CLI command
mcp-pdfworks - Package imports correctly:
from mcp_pdf.server import create_server
🚀 Publishing Pipeline
Once local testing passes:
- Version Bump: Update version in
pyproject.toml - Build:
uv build - Test Upload:
uv run twine upload --repository testpypi dist/* - Test Install:
pip install -i https://test.pypi.org/simple/ mcp-pdf - Production Upload:
uv run twine upload dist/*
🔧 Development Commands
# Format code
uv run black src/ tests/
# Lint code
uv run ruff check src/ tests/
# Run tests
uv run pytest
# Security scan
uv run pip-audit
# Build package
uv build
# Install editable for development
pip install -e . # (in a venv)
🆘 Troubleshooting
"Module not found" errors
- Ensure you're in the right directory
- Run
uv syncto install dependencies - Check Python path with
uv run python -c "import sys; print(sys.path)"
MCP server won't start
- Check that all system dependencies are installed (tesseract, java, ghostscript)
- Verify with:
uv run python examples/verify_installation.py
Security tests fail
- Run
uv run python test_security_features.py -vfor detailed output - Check that security constants are properly set
This setup allows for rapid development and testing without polluting your system Python or needing to publish to PyPI for every change.