Ryan Malloy c902e81e4d Initial commit: Complete MCP PDF Tools server implementation

Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-10 16:36:21 -06:00

Quick Start Guide

1. Installation

Option A: Using UV (Recommended for Development)

# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools

# Install with uv
uv sync

# Verify installation
uv run python examples/verify_installation.py

Option B: Using Docker

# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools

# Build and run with Docker
docker-compose build
docker-compose run --rm mcp-pdf-tools python examples/verify_installation.py

Option C: From PyPI

pip install mcp-pdf-tools

2. System Dependencies

Ubuntu/Debian

sudo apt-get update
sudo apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-eng \
    poppler-utils \
    ghostscript \
    python3-tk \
    default-jre-headless

macOS

brew install tesseract poppler ghostscript
# Note: this does not install Java, which Tabula-based table extraction also needs

Windows

Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
Install Poppler: http://blog.alivate.com.au/poppler-windows/
Install Ghostscript: https://www.ghostscript.com/download/gsdnld.html
Install Java: https://www.java.com/download/
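
To confirm that the system dependencies above are visible to Python, a small check along these lines may help. This is a minimal sketch, not part of the project: the executable names (tesseract, pdftoppm, gs, java) are assumptions based on the packages listed above, and on Windows the Ghostscript binary usually has a different name (for example gswin64c).

```python
import shutil

# Assumed executable names for the packages listed in this section.
# Adjust for your platform (e.g. Ghostscript is not called "gs" on Windows).
REQUIRED_TOOLS = {
    "tesseract": "OCR",
    "pdftoppm": "Poppler (PDF rendering)",
    "gs": "Ghostscript (used by Camelot)",
    "java": "Java runtime (used by Tabula)",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        for name in missing:
            print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
    else:
        print("all system dependencies found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.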

3. Claude Desktop Configuration

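Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location; on Windows the file is at %APPDATA%\Claude\claude_desktop_config.json). Replace the "cwd" value, which is the author's local path, with the absolute path of your own clone: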

{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["run", "mcp-pdf-tools"],
      "cwd": "/home/rpm/claude/mcp-pdf-tools"
    }
  }
}
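
If the config file already lists other MCP servers, editing it by hand risks overwriting them. The sketch below merges the "pdf-tools" entry shown above into an existing file and leaves other entries untouched. The function name `add_pdf_tools_server` and its parameters are hypothetical helpers for illustration, not part of the project.

```python
import json
from pathlib import Path


def add_pdf_tools_server(config_path, project_dir):
    """Add or update the "pdf-tools" entry, keeping other servers intact."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty configuration.
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers["pdf-tools"] = {
        "command": "uv",
        "args": ["run", "mcp-pdf-tools"],
        "cwd": str(project_dir),  # absolute path of your clone
    }

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart Claude Desktop after changing the file so the new server entry is picked up.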

4. Test the Tools

# Test with a sample PDF
uv run python examples/test_pdf_tools.py /path/to/your/document.pdf

5. Common Issues

OCR not working

Check Tesseract is installed: tesseract --version
Install language packs: sudo apt-get install tesseract-ocr-[lang] (for example, tesseract-ocr-deu for German)
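
To see which OCR languages Tesseract can actually use, its `--list-langs` option prints the installed language codes. The sketch below wraps that call; the helper names are illustrative, and the parser assumes the usual output shape of a "List of available languages" header followed by one code per line.

```python
import subprocess


def parse_tesseract_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip blank lines and the "List of available languages ..." header.
    return [line for line in lines if line and not line.startswith("List of")]


def installed_langs():
    """Return installed language codes, or [] if tesseract is not on PATH."""
    try:
        result = subprocess.run(
            ["tesseract", "--list-langs"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return []
    # Some older Tesseract releases print the list to stderr instead of stdout.
    return parse_tesseract_langs(result.stdout or result.stderr)
```

If the language you need is not in the returned list, install the matching language pack and run the check again.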

Table extraction failing

Check Java is installed: java -version
For Camelot issues, ensure Ghostscript is installed
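
When table extraction fails, it helps to know which backend failed and why. The following is a generic sketch of a fallback chain in the spirit of the Camelot, pdfplumber, Tabula order this project describes. It is not the project's actual implementation: `extract_with_fallback` is a hypothetical helper, and the extractors are passed in as plain functions so that no library API is assumed.

```python
def extract_with_fallback(pdf_path, extractors):
    """Try each (name, function) pair in order.

    Returns (backend_name, tables, errors), where errors maps each backend
    that was tried unsuccessfully to a short reason.
    """
    errors = {}
    for name, extract in extractors:
        try:
            tables = extract(pdf_path)
        except Exception as exc:  # e.g. missing Java or Ghostscript
            errors[name] = str(exc)
            continue
        if tables:
            return name, tables, errors
        errors[name] = "no tables found"
    return None, [], errors
```

A caller would supply its own wrappers, for example `[("camelot", read_with_camelot), ("tabula", read_with_tabula)]`, where both wrapper names are placeholders. The `errors` mapping then shows whether a backend was skipped because a dependency is missing or because it simply found nothing.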

Large PDF issues

Process specific pages: pages=[0, 1, 2]
Increase memory: export JAVA_OPTS="-Xmx2g"
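
For very large documents, one approach is to work through the file in fixed-size page batches and pass each batch as the pages value. A minimal sketch follows, assuming zero-based page indices as the pages=[0, 1, 2] example suggests; `page_batches` is an illustrative helper, not a function provided by the project.

```python
def page_batches(total_pages, batch_size):
    """Yield lists of zero-based page indices, batch_size pages at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_pages, batch_size):
        yield list(range(start, min(start + batch_size, total_pages)))


if __name__ == "__main__":
    # A 7-page document in batches of 3 pages.
    for batch in page_batches(7, 3):
        print(batch)
```

Each yielded list can be used in place of the literal pages=[0, 1, 2] value, so only a few pages are held in memory at once.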

6. Example Usage in Claude

Once configured, you can ask Claude:

"Extract text from the PDF at /path/to/document.pdf"
"Check if /path/to/scan.pdf is a scanned document"
"Extract all tables from /path/to/report.pdf and format as markdown"
"Convert /path/to/document.pdf to markdown format"
"Extract images from the first 5 pages of /path/to/presentation.pdf"

Need Help?

Check the full README.md for detailed documentation
Run tests: uv run pytest
Enable debug mode: Set DEBUG=true in your .env file
