mcp-pdf-tools/LOCAL_DEVELOPMENT.md
Ryan Malloy 856dd41996 Add comprehensive link extraction tool (24th PDF tool)
2025-09-23 20:41:16 -06:00

🔧 Local Development Guide for MCP PDF

This guide shows how to test MCP PDF locally during development before publishing to PyPI.

📋 Prerequisites

  • Python 3.10+
  • uv package manager
  • Claude Code CLI (provides the claude mcp command used below)
  • Git repository cloned locally

🚀 Quick Start for Local Testing

1. Clone and Setup

# Clone the repository
git clone https://github.com/rsp2k/mcp-pdf.git
cd mcp-pdf

# Install dependencies
uv sync --dev

# Verify installation
uv run python -c "from mcp_pdf.server import create_server; print('✅ MCP PDF loads successfully')"

2. Add MCP Server to Claude Code

For Production Use (PyPI Installation)

Install the published version from PyPI:

# For personal use across all projects (user scope)
claude mcp add -s user pdf-tools uvx mcp-pdf

# For project-specific use (project scope, stored with the current project)
claude mcp add -s project pdf-tools uvx mcp-pdf

For Local Development (Source Installation)

When developing MCP PDF itself, use the local source:

# For development from local source
claude mcp add -s project pdf-tools-dev uv -- --directory /path/to/mcp-pdf run mcp-pdf

Or if you're in the mcp-pdf directory:

# Development server from current directory
claude mcp add -s project pdf-tools-dev uv -- --directory . run mcp-pdf

3. Alternative: Manual Server Testing

You can also run the server manually for debugging:

# Run the MCP server directly
uv run mcp-pdf

# Or run the server module directly
uv run python -m mcp_pdf.server

4. Test Core Functionality

Once connected to Claude Code, test these key features:

Basic PDF Processing

"Extract text from this PDF file: /path/to/test.pdf"
"Get metadata from this PDF: /path/to/document.pdf"
"Check if this PDF is scanned: /path/to/scan.pdf"

Security Features

"Try to extract text from a very large PDF"
"Process a PDF with 2000 pages" (should be limited to 1000)

Advanced Features

"Extract tables from this PDF: /path/to/tables.pdf"
"Convert this PDF to markdown: /path/to/document.pdf"
"Add annotations to this PDF: /path/to/target.pdf"

🔒 Security Testing

Verify the security hardening works:

File Size Limits

  • Try processing a PDF larger than 100MB
  • Should see: "PDF file too large: X bytes > 104857600"

Page Count Limits

  • Try processing a PDF with >1000 pages
  • Should see: "PDF too large for processing: X pages > 1000"
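
The two limits above can be pictured as a pair of guard functions. The following is a minimal sketch, not the project's actual code: the function and constant names are hypothetical, and only the numeric limits and message formats are taken from the expected errors listed above.

```python
import os

# Limits taken from the expected error messages; names are hypothetical.
MAX_PDF_BYTES = 100 * 1024 * 1024  # 104857600 bytes
MAX_PDF_PAGES = 1000


def check_file_size(path: str, max_bytes: int = MAX_PDF_BYTES) -> None:
    """Raise ValueError if the file at `path` exceeds `max_bytes`."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"PDF file too large: {size} bytes > {max_bytes}")


def check_page_count(page_count: int, max_pages: int = MAX_PDF_PAGES) -> None:
    """Raise ValueError if `page_count` exceeds `max_pages`."""
    if page_count > max_pages:
        raise ValueError(
            f"PDF too large for processing: {page_count} pages > {max_pages}"
        )
```

In a sketch like this, the size check can run before the file is opened at all, while the page count would have to come from whichever PDF library opens the document.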

Path Traversal Protection

  • Test with malicious paths like ../../../etc/passwd
  • The request should be blocked with a security error
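
One common way to block paths like the one above is to resolve the requested path and confirm it still lies inside an allowed base directory. The sketch below illustrates that general technique only. The function name `resolve_inside` and the idea of a single allowed base directory are assumptions, and the server's real validation may work differently (for example, by rejecting any `..` component outright).

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path, or raise if it escapes `base_dir`.

    Hypothetical helper: resolves symlinks and '..' segments first, then
    checks containment, so '../../../etc/passwd' cannot slip through.
    """
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to: Python 3.9+
        raise PermissionError(f"Path escapes allowed directory: {user_path}")
    return candidate
```

Absolute inputs such as `/etc/passwd` are also rejected here, because joining an absolute path onto `base` discards `base` and the containment check then fails.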

JSON Input Validation

  • Large JSON inputs (>10KB) should be rejected
  • Malformed JSON should return clean error messages
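
A size-then-parse check along these lines would produce both behaviours. This is a sketch under stated assumptions: the helper name `parse_json_param`, the exact error wording, and the reading of "10KB" as 10240 bytes are all hypothetical.

```python
import json

# Assumption: "10KB" is read as 10 * 1024 bytes of UTF-8 encoded input.
MAX_JSON_BYTES = 10 * 1024


def parse_json_param(raw: str, max_bytes: int = MAX_JSON_BYTES):
    """Parse a JSON string parameter after enforcing a size cap."""
    size = len(raw.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"JSON input too large: {size} bytes > {max_bytes}")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Report only the parser's short message, without a traceback chain.
        raise ValueError(f"Invalid JSON input: {exc.msg}") from None
```

Checking the size before calling `json.loads` means an oversized payload is rejected without ever being parsed.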

🐛 Debugging

Enable Debug Logging

export DEBUG=true
uv run mcp-pdf
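
How the server interprets DEBUG is not documented here. As an illustration only, an environment flag like this is often mapped to a logging level at startup. The function name and the set of accepted truthy values below are assumptions, not the project's behaviour.

```python
import logging
import os


def log_level_from_env(environ=os.environ) -> int:
    """Map a DEBUG-style environment flag to a logging level (hypothetical)."""
    flag = environ.get("DEBUG", "").strip().lower()
    return logging.DEBUG if flag in {"1", "true", "yes"} else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=log_level_from_env())
    logging.getLogger("mcp_pdf").debug("debug logging is enabled")
```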

Check Security Functions

# Test security validation functions
uv run python test_security_features.py

# Run integration tests
uv run python test_integration.py

Verify Package Structure

# Check package builds correctly
uv build

# Verify package metadata
uv run twine check dist/*

📊 Testing Checklist

Before publishing, verify:

  • All 24 PDF tools work correctly
  • Security limits are enforced (file size, page count)
  • Error messages are clean and helpful
  • No sensitive information is leaked in errors
  • Path traversal protection works
  • JSON input validation works
  • Memory limits prevent crashes
  • CLI command mcp-pdf works
  • Package imports correctly: from mcp_pdf.server import create_server

🚀 Publishing Pipeline

Once local testing passes:

  1. Version Bump: Update version in pyproject.toml
  2. Build: uv build
  3. Test Upload: uv run twine upload --repository testpypi dist/*
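  4. Test Install: pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mcp-pdf (the extra index lets pip resolve dependencies that are not hosted on TestPyPI)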
  5. Production Upload: uv run twine upload dist/*

🔧 Development Commands

# Format code
uv run black src/ tests/

# Lint code
uv run ruff check src/ tests/

# Run tests
uv run pytest

# Security scan
uv run pip-audit

# Build package
uv build

# Install editable for development
pip install -e .  # (in a venv)

🆘 Troubleshooting

"Module not found" errors

  • Ensure you're in the repository root (the directory containing pyproject.toml)
  • Run uv sync to install dependencies
  • Check Python path with uv run python -c "import sys; print(sys.path)"

MCP server won't start

  • Check that all system dependencies are installed (tesseract, java, ghostscript)
  • Verify with: uv run python examples/verify_installation.py
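
If the verification script is not at hand, a quick check for the system dependencies can be sketched with the standard library. The executable names below are assumptions: `gs` is the usual Ghostscript binary on Linux and macOS, and the names may differ on Windows or in custom installs.

```python
import shutil

# Assumed executable names for tesseract, java and ghostscript.
REQUIRED_TOOLS = ("tesseract", "java", "gs")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system dependencies: " + ", ".join(missing))
    else:
        print("All system dependencies found on PATH")
```

This only confirms that the executables are on PATH. It does not check versions or whether the tools actually run.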

Security tests fail

  • Run uv run python test_security_features.py -v for detailed output
  • Check that security constants are properly set

This setup allows for rapid development and testing without polluting your system Python or needing to publish to PyPI for every change.