mcp-pdf-tools/QUICKSTART.md
Ryan Malloy c902e81e4d Initial commit: Complete MCP PDF Tools server implementation
Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 16:36:21 -06:00

Quick Start Guide

1. Installation

Option A: Using uv

# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools

# Install with uv
uv sync

# Verify installation
uv run python examples/verify_installation.py

Option B: Using Docker

# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools

# Build and run with Docker
docker-compose build
docker-compose run --rm mcp-pdf-tools python examples/verify_installation.py

Option C: From PyPI

pip install mcp-pdf-tools

2. System Dependencies

Ubuntu/Debian

sudo apt-get update
sudo apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-eng \
    poppler-utils \
    ghostscript \
    python3-tk \
    default-jre-headless

macOS

brew install tesseract poppler ghostscript

Windows

3. Claude Desktop Configuration

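Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location), replacing the cwd value with the absolute path to your own clone of the repository: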

{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["run", "mcp-pdf-tools"],
      "cwd": "/path/to/mcp-pdf-tools"
    }
  }
}
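If the config file already lists other MCP servers, pasting the block above over it would drop them. The following is a minimal sketch of merging the entry instead. The helper name add_pdf_tools_server and its parameters are hypothetical and not part of this project; the entry it writes mirrors the JSON shown above.

```python
import json
from pathlib import Path


def add_pdf_tools_server(config_path: Path, project_dir: str) -> dict:
    """Add the pdf-tools entry to a Claude Desktop config, keeping other keys.

    Hypothetical helper for illustration; it is not shipped with mcp-pdf-tools.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)

    # Create "mcpServers" if it is missing, then add or overwrite one entry.
    servers = config.setdefault("mcpServers", {})
    servers["pdf-tools"] = {
        "command": "uv",
        "args": ["run", "mcp-pdf-tools"],
        "cwd": project_dir,  # absolute path to your clone
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS this could be called with Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json" and the path to your clone. Restart Claude Desktop afterwards so it rereads the file.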

4. Test the Tools

# Test with a sample PDF
uv run python examples/test_pdf_tools.py /path/to/your/document.pdf

5. Common Issues

OCR not working

  • Check that Tesseract is installed: tesseract --version
  • Install additional language packs (Debian/Ubuntu): sudo apt-get install tesseract-ocr-[lang], for example tesseract-ocr-deu for German

Table extraction failing

  • Check that Java is installed (Tabula requires it): java -version
  • For Camelot issues, ensure Ghostscript is installed: gs --version
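The OCR and table-extraction checks above can be combined into one script that reports which external programs are missing from PATH. This is a rough sketch, not the project's own verify_installation.py. The executable names (tesseract, java, gs, pdftoppm) are assumptions based on the packages listed under System Dependencies, and on Windows Ghostscript usually installs under a different executable name.

```python
import shutil

# Assumed executable names, derived from the packages named in this guide.
REQUIRED_TOOLS = {
    "tesseract": "OCR",
    "java": "table extraction via Tabula",
    "gs": "table extraction via Camelot (Ghostscript)",
    "pdftoppm": "Poppler utilities",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {name: purpose} for every executable not found on PATH."""
    return {name: purpose for name, purpose in tools.items() if which(name) is None}


def report(missing):
    """Format the result of find_missing_tools as human-readable text."""
    if not missing:
        return "All external tools found."
    lines = [
        f"missing: {name} (needed for {purpose})"
        for name, purpose in sorted(missing.items())
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(find_missing_tools()))
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.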

Large PDF issues

  • Process only the pages you need by passing a pages argument to the tool, for example pages=[0, 1, 2]
  • Increase the Java heap size: export JAVA_OPTS="-Xmx2g"
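For very large documents, one way to apply the pages option is to work through the file in fixed-size batches. The sketch below only generates the page lists. It assumes zero-based page indices, as the pages=[0, 1, 2] example suggests, and the function name page_batches is hypothetical.

```python
from typing import Iterator, List


def page_batches(total_pages: int, batch_size: int = 10) -> Iterator[List[int]]:
    """Yield lists of zero-based page indices that together cover the document."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_pages, batch_size):
        # The last batch may be shorter than batch_size.
        yield list(range(start, min(start + batch_size, total_pages)))
```

Each yielded list can then be supplied as the pages value for one tool call, so that only a slice of the document is held in memory at a time.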

6. Example Usage in Claude

Once configured, you can ask Claude:

  • "Extract text from the PDF at /path/to/document.pdf"
  • "Check if /path/to/scan.pdf is a scanned document"
  • "Extract all tables from /path/to/report.pdf and format as markdown"
  • "Convert /path/to/document.pdf to markdown format"
  • "Extract images from the first 5 pages of /path/to/presentation.pdf"

Need Help?

  • Check the full README.md for detailed documentation
  • Run tests: uv run pytest
  • Enable debug mode: Set DEBUG=true in your .env file