Features:
- 8 comprehensive PDF processing tools with intelligent fallbacks
- Text extraction (PyMuPDF, pdfplumber, pypdf with auto-selection)
- Table extraction (Camelot → pdfplumber → Tabula fallback chain)
- OCR processing with Tesseract and preprocessing options
- Document analysis (structure, metadata, scanned detection)
- Image extraction with filtering capabilities
- PDF to markdown conversion with metadata
- Built on FastMCP framework with full MCP protocol support
- Comprehensive error handling and user-friendly messages
- Docker support and cross-platform compatibility
- Complete test suite and examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
21 lines
533 B
YAML
version: '3.8'

services:
  mcp-pdf-tools:
    build: .
    image: mcp-pdf-tools:latest
    container_name: mcp-pdf-tools
    volumes:
      # Mount a directory for PDF files
      - ./test_pdfs:/pdfs:ro
      # Mount temp directory for processing
      - ./tmp:/tmp/pdf_processing
    environment:
      - DEBUG=true
      - TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata
      - PDF_TEMP_DIR=/tmp/pdf_processing
    stdin_open: true
    tty: true
    # For testing, you can override the entrypoint
    # entrypoint: /bin/bash