
# Quick Start Guide

## 1. Installation

### Option A: Using UV (Recommended for Development)

```bash
# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools

# Install with uv
uv sync

# Verify installation
uv run python examples/verify_installation.py
```

### Option B: Using Docker

```bash
# Clone the repository
git clone https://github.com/rpm/mcp-pdf-tools
cd mcp-pdf-tools

# Build and run with Docker
docker-compose build
docker-compose run --rm mcp-pdf-tools python examples/verify_installation.py
```

### Option C: From PyPI

```bash
pip install mcp-pdf-tools
```

## 2. System Dependencies

### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-eng \
    poppler-utils \
    ghostscript \
    python3-tk \
    default-jre-headless
```

### macOS
```bash
brew install tesseract poppler ghostscript
# Table extraction also needs Java; if `java -version` fails, install a Java runtime
```

### Windows
- Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Install Poppler: http://blog.alivate.com.au/poppler-windows/
- Install Ghostscript: https://www.ghostscript.com/download/gsdnld.html
- Install Java: https://www.java.com/download/
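
To confirm that the system dependencies above are reachable before running the tools, a short check like the following may help. This is a minimal sketch, not part of the project: the executable names (`tesseract`, `pdftoppm` from Poppler, `gs` for Ghostscript, `java`) are assumptions about what the tools look up on `PATH`, and on Windows Ghostscript is typically installed as `gswin64c` rather than `gs`.

```python
import shutil

# Assumed executable names; adjust for your platform
# (for example, Ghostscript is usually `gswin64c` on Windows).
REQUIRED_TOOLS = {
    "tesseract": "OCR",
    "pdftoppm": "Poppler (PDF rendering)",
    "gs": "Ghostscript",
    "java": "Java runtime (table extraction)",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        for name in missing:
            print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
    else:
        print("all system dependencies found on PATH")
```

An empty result only shows that the executables exist on `PATH`; it does not check their versions or installed language packs.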

## 3. Claude Desktop Configuration

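Add the following to your Claude Desktop configuration file. On macOS this is `~/Library/Application Support/Claude/claude_desktop_config.json`; on Windows it is `%APPDATA%\Claude\claude_desktop_config.json`. Set `cwd` to the absolute path of your local clone:

```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["run", "mcp-pdf-tools"],
      "cwd": "/absolute/path/to/mcp-pdf-tools"
    }
  }
}
```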
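
If the config file already lists other servers, editing it by hand risks overwriting them. The sketch below merges the `pdf-tools` entry into an existing file instead. It is illustrative only: the helper name `add_pdf_tools_server` is hypothetical and not part of the project, and both the config path and the project directory are values you supply yourself.

```python
import json
from pathlib import Path


def add_pdf_tools_server(config_path, project_dir):
    """Merge a `pdf-tools` entry into a Claude Desktop config file.

    Hypothetical helper: other entries under "mcpServers" are preserved.
    """
    path = Path(config_path).expanduser()
    text = path.read_text() if path.exists() else ""
    # Treat a missing or empty file as an empty config.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["pdf-tools"] = {
        "command": "uv",
        "args": ["run", "mcp-pdf-tools"],
        "cwd": str(project_dir),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (macOS path; replace the second argument with your clone):
# add_pdf_tools_server(
#     "~/Library/Application Support/Claude/claude_desktop_config.json",
#     "/absolute/path/to/mcp-pdf-tools",
# )
```

Back up the config file before running anything that rewrites it, since the file is re-serialized and any custom formatting is lost.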

## 4. Test the Tools

```bash
# Test with a sample PDF
uv run python examples/test_pdf_tools.py /path/to/your/document.pdf
```

## 5. Common Issues

### OCR not working
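- Check that Tesseract is installed and on your `PATH`: `tesseract --version`
- Install the language pack for each language you need (Ubuntu/Debian): `sudo apt-get install tesseract-ocr-[lang]`, for example `tesseract-ocr-deu` for German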

### Table extraction failing
- Check that Java is installed: `java -version`
- If Camelot fails, check that Ghostscript is installed: `gs --version`

### Large PDF issues
- Process only the pages you need by passing a page list: `pages=[0, 1, 2]`
- Increase the Java heap size: `export JAVA_OPTS="-Xmx2g"`
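
For very large PDFs, one way to apply the first tip is to work through the document in small page batches rather than all at once. The helper below is a hypothetical sketch, not part of the project's API. It assumes the `pages` argument takes zero-based page indices, as the `pages=[0, 1, 2]` example suggests.

```python
def page_batches(total_pages, batch_size=10):
    """Yield lists of zero-based page indices, batch_size pages at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_pages, batch_size):
        yield list(range(start, min(start + batch_size, total_pages)))


# Example: split a 25-page document into batches of 10 pages.
for pages in page_batches(25, batch_size=10):
    print(f"pages {pages[0]}..{pages[-1]} ({len(pages)} pages)")
```

Each yielded list can then be supplied as the `pages` value for one request, which keeps the amount of data handled at a time bounded.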

## 6. Example Usage in Claude

Once the server is configured and Claude Desktop has been restarted, you can ask Claude things like:

- "Extract text from the PDF at /path/to/document.pdf"
- "Check if /path/to/scan.pdf is a scanned document"
- "Extract all tables from /path/to/report.pdf and format as markdown"
- "Convert /path/to/document.pdf to markdown format"
- "Extract images from the first 5 pages of /path/to/presentation.pdf"

## Need Help?

- See `README.md` for the full documentation
- Run the test suite: `uv run pytest`
- Enable debug mode: set `DEBUG=true` in your `.env` file