Ryan Malloy f601d44d99 Fix page numbering: Switch to user-friendly 1-based indexing

**Problem**: Zero-based page numbers were confusing for users who naturally
think of pages starting from 1.

**Solution**:
- Updated `parse_pages_parameter()` to convert 1-based user input to 0-based internal representation
- All user-facing documentation now uses 1-based page numbering (page 1 = first page)
- Internal processing continues to use 0-based indexing for PyMuPDF compatibility
- Output page numbers are consistently displayed as 1-based for users

**Changes**:
- Enhanced documentation strings to clarify "1-based" page numbering
- Updated README examples with 1-based page numbers and clarifying comments
- Fixed split_pdf function to handle 1-based input correctly
- Updated test cases to verify 1-based -> 0-based conversion
- Added feature highlight: "User-Friendly: All page numbers use 1-based indexing"

**Impact**: Much more intuitive for users - no more confusion about which page is "page 0"!

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-11 04:32:20 -06:00


Claude Desktop MCP Configuration

This document explains how the MCP PDF Tools server has been configured for Claude Desktop.

Configuration Location

The MCP configuration has been added to:

/home/rpm/.config/Claude/claude_desktop_config.json

PDF Tools Server Configuration

The following entry has been added to your Claude Desktop configuration file:

{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/rpm/claude/mcp-pdf-tools",
        "run",
        "mcp-pdf-tools"
      ],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
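As a quick sanity check on the entry above, here is a minimal Python sketch that inspects an already-parsed config and reports anything missing. The helper name `check_server_entry` is hypothetical and is not part of MCP PDF Tools or Claude Desktop; it assumes only the `mcpServers` layout shown above.

```python
import json


def check_server_entry(config: dict, name: str = "pdf-tools") -> list:
    """Return a list of problems found in one mcpServers entry (empty if none)."""
    entry = config.get("mcpServers", {}).get(name)
    if not isinstance(entry, dict):
        return [f"no '{name}' entry under mcpServers"]

    problems = []
    if not isinstance(entry.get("command"), str):
        problems.append("'command' must be a string")
    if not isinstance(entry.get("args", []), list):
        problems.append("'args' must be a list")
    if not isinstance(entry.get("env", {}), dict):
        problems.append("'env' must be an object")
    return problems


# The same entry as in the configuration above.
example = json.loads("""
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["--directory", "/home/rpm/claude/mcp-pdf-tools", "run", "mcp-pdf-tools"],
      "env": {"PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"}
    }
  }
}
""")

print(check_server_entry(example))  # prints []
```

To check the real file, pass the result of `json.load()` on `claude_desktop_config.json` instead of `example`.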

What This Enables

With this configuration, all your Claude sessions will have access to:

extract_text: Extract text from PDFs with multiple method support
extract_tables: Extract tables from PDFs with intelligent fallbacks
extract_images: Extract and filter images from PDFs
extract_metadata: Get comprehensive PDF metadata and file information
get_document_structure: Analyze PDF structure, outline, and fonts
is_scanned_pdf: Detect if PDFs are scanned/image-based
ocr_pdf: Perform OCR on scanned PDFs with preprocessing
pdf_to_markdown: Convert PDFs to clean markdown format

Environment Variables

PDF_TEMP_DIR: Directory used for temporary files during processing; set to /tmp/mcp-pdf-processing

Backup

A backup of your original configuration has been saved to:

/home/rpm/.config/Claude/claude_desktop_config.json.backup
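If you need to repeat this kind of change yourself, the backup-then-edit pattern can be sketched in Python as follows. This is an illustration, not necessarily how the backup above was produced; `add_server_with_backup` is a hypothetical helper name.

```python
import json
import shutil
from pathlib import Path


def add_server_with_backup(config_path: Path, name: str, entry: dict) -> Path:
    """Copy config_path to '<file name>.backup', then add one mcpServers entry."""
    backup_path = config_path.with_name(config_path.name + ".backup")
    shutil.copy2(config_path, backup_path)  # preserve the original before editing

    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep any servers that are already configured; add or overwrite only `name`.
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return backup_path
```

Note that running it twice overwrites the first backup, so copy the `.backup` file elsewhere if you want to keep the very first version.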

Testing

The server has been tested and is working correctly. You can verify it's available in new Claude sessions by checking for the mcp__pdf-tools__* functions.

Troubleshooting

If you encounter issues:

Server not starting: Check that all dependencies are installed:
```
cd /home/rpm/claude/mcp-pdf-tools
uv sync --dev
```

System dependencies missing: Install required packages:

sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript python3-tk default-jre-headless
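To see which of these packages might be missing before you reinstall everything, a small Python sketch can look for the executables they normally provide. The package-to-executable mapping below is an assumption of this sketch (it is not taken from the project), and it does not cover `python3-tk` or `tesseract-ocr-eng`, which ship a Python module and language data rather than an executable.

```python
import shutil

# Executables the apt packages above are expected to install (assumed mapping).
EXPECTED_BINARIES = {
    "tesseract-ocr": "tesseract",
    "poppler-utils": "pdftoppm",
    "ghostscript": "gs",
    "default-jre-headless": "java",
}


def missing_packages(which=shutil.which) -> list:
    """Return apt package names whose executable is not found on PATH."""
    return [pkg for pkg, binary in EXPECTED_BINARIES.items() if which(binary) is None]


print("Possibly missing:", missing_packages())
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use, call `missing_packages()` with no arguments.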

Permission issues: Ensure the temp directory exists and has the right permissions:

mkdir -p /tmp/mcp-pdf-processing
chmod 755 /tmp/mcp-pdf-processing
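The same two steps can be sketched in Python, reading the location from `PDF_TEMP_DIR` as set in the configuration. How the server itself reads this variable is not shown in this document, so treat `ensure_temp_dir` as a hypothetical helper rather than the server's actual code.

```python
import os
from pathlib import Path


def ensure_temp_dir(default: str = "/tmp/mcp-pdf-processing") -> Path:
    """Create the directory named by PDF_TEMP_DIR (or the default) with mode 755."""
    path = Path(os.environ.get("PDF_TEMP_DIR", default))
    path.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    path.chmod(0o755)  # set explicitly, since mkdir's mode is filtered by the umask
    return path
```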

Test server manually:

cd /home/rpm/claude/mcp-pdf-tools
uv run mcp-pdf-tools --help
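If you want to script this check, for example to capture the output for a bug report, a minimal Python wrapper around the same command might look like this. `run_help` is a hypothetical helper; it only runs whatever command you give it and does not know anything about the server.

```python
import subprocess
from typing import List, Optional, Tuple


def run_help(command: List[str], cwd: Optional[str] = None,
             timeout: float = 30.0) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


# Equivalent to the manual test above (requires uv and the project checkout):
# code, output = run_help(["uv", "run", "mcp-pdf-tools", "--help"],
#                         cwd="/home/rpm/claude/mcp-pdf-tools")
```

A `FileNotFoundError` from this call means the executable itself (here `uv`) is not on your PATH, which is a different problem from the server failing to start.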
