Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2.5 KiB
Quick Start Guide
1. Installation
Option A: Using UV (Recommended for Development)
# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools
# Install with uv
uv sync
# Verify installation
uv run python examples/verify_installation.py
Option B: Using Docker
# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools
# Build and run with Docker
docker-compose build
docker-compose run --rm mcp-pdf-tools python examples/verify_installation.py
Option C: From PyPI
pip install mcp-pdf-tools
2. System Dependencies
Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-eng \
    poppler-utils \
    ghostscript \
    python3-tk \
    default-jre-headless
macOS
brew install tesseract poppler ghostscript
Windows
- Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Install Poppler: http://blog.alivate.com.au/poppler-windows/
- Install Ghostscript: https://www.ghostscript.com/download/gsdnld.html
- Install Java: https://www.java.com/download/
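Once the packages above are installed, a quick check that their executables are reachable on PATH can save debugging time later. The following is a minimal sketch, not part of the project: the executable names (pdftoppm for Poppler, gs or gswin64c for Ghostscript) and the helper name find_missing_tools are assumptions made for illustration.

```python
import shutil

# Executable names are assumptions: tesseract (OCR), pdftoppm (ships with
# Poppler), gs (Ghostscript; Windows builds usually install gswin64c), java.
REQUIRED_TOOLS = {
    "Tesseract": ["tesseract"],
    "Poppler": ["pdftoppm"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
    "Java": ["java"],
}


def find_missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of dependencies with no executable found on PATH."""
    missing = []
    for name, candidates in required.items():
        # A dependency counts as present if any candidate executable resolves.
        if not any(which(exe) for exe in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All system dependencies were found on PATH.")
```

The `which` parameter is injectable only so the function can be exercised without touching the real PATH; in normal use the default is enough.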
3. Claude Desktop Configuration
Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location), replacing the cwd value with the path to your own clone:
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["run", "mcp-pdf-tools"],
      "cwd": "/home/rpm/claude/mcp-pdf-tools"
    }
  }
}
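If the config file already lists other servers, editing it by hand risks overwriting them. The sketch below merges the pdf-tools entry into an existing configuration instead. The helper names add_pdf_tools_server and update_config_file are hypothetical, and the code assumes that any existing file contains valid JSON.

```python
import json
from pathlib import Path


def add_pdf_tools_server(config, project_dir):
    """Return a copy of config with the pdf-tools entry added under mcpServers."""
    updated = dict(config)
    # Copy the existing server map so other entries are preserved.
    servers = dict(updated.get("mcpServers", {}))
    servers["pdf-tools"] = {
        "command": "uv",
        "args": ["run", "mcp-pdf-tools"],
        "cwd": str(project_dir),
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path, project_dir):
    """Read the JSON config (if present), add the entry, and write it back."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_pdf_tools_server(config, project_dir), indent=2))
```

A call might look like `update_config_file("~/Library/Application Support/Claude/claude_desktop_config.json", "/path/to/mcp-pdf-tools")`, with the second argument pointing at your clone.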
4. Test the Tools
# Test with a sample PDF
uv run python examples/test_pdf_tools.py /path/to/your/document.pdf
5. Common Issues
OCR not working
- Check that Tesseract is installed: tesseract --version
- Install additional language packs, replacing [lang] with a language code such as deu: sudo apt-get install tesseract-ocr-[lang]
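To see which language packs Tesseract can actually find, the `tesseract --list-langs` command can be wrapped in a short script. This is a rough sketch: the helper names are hypothetical, and the parser assumes the output is a header line beginning with "List of" followed by one language code per line, which may differ between Tesseract versions.

```python
import subprocess


def parse_tesseract_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the only non-language line is a header starting with "List of".
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages():
    """Run Tesseract and return the language codes it reports."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some versions print the list to stderr, so both streams are combined.
    return parse_tesseract_langs(result.stdout + result.stderr)
```

If the language you need is absent from the returned list, installing the matching tesseract-ocr package is the likely fix.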
Table extraction failing
- Check that Java is installed: java -version
- For Camelot issues, ensure Ghostscript is installed
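The feature list describes table extraction as a Camelot → pdfplumber → Tabula fallback chain, which is why a missing Ghostscript or Java install may silently reduce the number of backends that can succeed. The sketch below illustrates the general pattern only; it is not the project's implementation, and the backend functions are hypothetical stand-ins rather than real library calls.

```python
def extract_with_fallbacks(pdf_path, extractors):
    """Try each (name, callable) pair in order; return the first non-empty result.

    Returns a tuple (backend_name, tables, errors), where errors maps the name
    of each backend that raised to its error message.
    """
    errors = {}
    for name, extract in extractors:
        try:
            tables = extract(pdf_path)
        except Exception as exc:  # e.g. a missing Java or Ghostscript install
            errors[name] = str(exc)
            continue
        if tables:
            return name, tables, errors
    return None, [], errors


# Hypothetical stand-ins for real backends, used only to show the control flow.
def failing_backend(pdf_path):
    raise RuntimeError("Ghostscript not found")


def empty_backend(pdf_path):
    return []


def working_backend(pdf_path):
    return [[["col_a", "col_b"], ["1", "2"]]]
```

Collecting the per-backend errors, rather than discarding them, makes it easier to tell a missing dependency apart from a PDF that simply contains no tables.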
Large PDF issues
- Process specific pages instead of the whole file by passing a page list, for example: pages=[0, 1, 2]
- Increase the Java heap size: export JAVA_OPTS="-Xmx2g"
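For very large documents, one way to apply the page-list advice is to walk through the file in fixed-size batches. This is a minimal sketch, assuming the tools accept a pages list of zero-based indices as the pages=[0, 1, 2] example suggests; the helper name page_batches is hypothetical.

```python
def page_batches(total_pages, batch_size):
    """Yield lists of zero-based page indices, at most batch_size per list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_pages, batch_size):
        # The final batch may be shorter than batch_size.
        yield list(range(start, min(start + batch_size, total_pages)))
```

Each yielded list can then be passed as the pages argument of a single tool call, so only one batch is held in memory at a time.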
6. Example Usage in Claude
Once configured, you can ask Claude:
- "Extract text from the PDF at /path/to/document.pdf"
- "Check if /path/to/scan.pdf is a scanned document"
- "Extract all tables from /path/to/report.pdf and format as markdown"
- "Convert /path/to/document.pdf to markdown format"
- "Extract images from the first 5 pages of /path/to/presentation.pdf"
Need Help?
- Check the full README.md for detailed documentation
- Run the tests: uv run pytest
- Enable debug mode: set DEBUG=true in your .env file