# Compare commits (`478ab41b1f` … `f601d44d99`)

3 commits: `f601d44d99`, `f0365a0d75`, `58d43851b9`
## `.mcp.json` (new file, 11 lines)

```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["run", "mcp-pdf-tools"],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
```
## `CLAUDE_DESKTOP_SETUP.md` (new file, 88 lines)

````markdown
# Claude Desktop MCP Configuration

This document explains how the MCP PDF Tools server has been configured for Claude Desktop.

## Configuration Location

The MCP configuration has been added to:

```
/home/rpm/.config/Claude/claude_desktop_config.json
```

## PDF Tools Server Configuration

The following configuration has been added to your Claude Desktop:

```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/rpm/claude/mcp-pdf-tools",
        "run",
        "mcp-pdf-tools"
      ],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
```

## What This Enables

With this configuration, all your Claude sessions will have access to:

- **extract_text**: Extract text from PDFs with multiple method support
- **extract_tables**: Extract tables from PDFs with intelligent fallbacks
- **extract_images**: Extract and filter images from PDFs
- **extract_metadata**: Get comprehensive PDF metadata and file information
- **get_document_structure**: Analyze PDF structure, outline, and fonts
- **is_scanned_pdf**: Detect if PDFs are scanned/image-based
- **ocr_pdf**: Perform OCR on scanned PDFs with preprocessing
- **pdf_to_markdown**: Convert PDFs to clean markdown format

## Environment Variables

- `PDF_TEMP_DIR`: Set to `/tmp/mcp-pdf-processing` for temporary file processing

## Backup

A backup of your original configuration has been saved to:

```
/home/rpm/.config/Claude/claude_desktop_config.json.backup
```

## Testing

The server has been tested and is working correctly. You can verify it is available in new Claude sessions by checking for the `mcp__pdf-tools__*` functions.

## Troubleshooting

If you encounter issues:

1. **Server not starting**: Check that all dependencies are installed:
   ```bash
   cd /home/rpm/claude/mcp-pdf-tools
   uv sync --dev
   ```

2. **System dependencies missing**: Install the required packages:
   ```bash
   sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript python3-tk default-jre-headless
   ```

3. **Permission issues**: Ensure the temp directory exists:
   ```bash
   mkdir -p /tmp/mcp-pdf-processing
   chmod 755 /tmp/mcp-pdf-processing
   ```

4. **Test the server manually**:
   ```bash
   cd /home/rpm/claude/mcp-pdf-tools
   uv run mcp-pdf-tools --help
   ```
````
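The setup document above suggests verifying the server from a new Claude session; the config file itself can also be checked mechanically. A minimal sketch, assuming the structure shown above — the `check_pdf_tools_entry` helper is illustrative, not part of the server:

```python
import json

# Mirrors the configuration shown in CLAUDE_DESKTOP_SETUP.md above.
CONFIG = """
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["--directory", "/home/rpm/claude/mcp-pdf-tools", "run", "mcp-pdf-tools"],
      "env": {"PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"}
    }
  }
}
"""

def check_pdf_tools_entry(config: dict) -> bool:
    """Return True if a pdf-tools MCP server entry is present and well-formed."""
    server = config.get("mcpServers", {}).get("pdf-tools")
    return (
        server is not None
        and server.get("command") == "uv"
        and "mcp-pdf-tools" in server.get("args", [])
    )

print(check_pdf_tools_entry(json.loads(CONFIG)))  # True
```

In practice you would read `~/.config/Claude/claude_desktop_config.json` instead of the inline literal and run this before restarting Claude Desktop.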
## `README.md` (changed)

Hunk `@@ -10,7 +10,31 @@` adds two feature bullets (`URL Support`, `User-Friendly`) and a new "URL Support" section before "Installation":

````markdown
- **Document Analysis**: Extract structure, metadata, and check if PDFs are scanned
- **Image Extraction**: Extract images with size filtering
- **Format Conversion**: Convert PDFs to clean Markdown format
- **URL Support**: Process PDFs directly from HTTPS URLs with intelligent caching
- **Smart Detection**: Automatically detect the best method for each operation
- **User-Friendly**: All page numbers use 1-based indexing (page 1 = first page)

## URL Support

All tools support processing PDFs directly from HTTPS URLs:

```bash
# Extract text from URL
mcp_pdf_tools extract_text "https://example.com/document.pdf"

# Extract tables from URL
mcp_pdf_tools extract_tables "https://example.com/report.pdf"

# Convert URL PDF to markdown
mcp_pdf_tools pdf_to_markdown "https://example.com/paper.pdf"
```

**Features:**
- **Intelligent caching**: Downloaded PDFs are cached for 1 hour to avoid repeated downloads
- **Content validation**: Verifies content is actually a PDF file (checks magic bytes and content-type)
- **Security**: HTTPS URLs recommended (HTTP URLs show security warnings)
- **Proper headers**: Sends an appropriate User-Agent for better server compatibility
- **Error handling**: Clear error messages for network issues or invalid content

## Installation
````
Hunk `@@ -110,7 +134,7 @@` switches the `extract_text` example to 1-based page numbers (`pages=[0, 1, 2]` becomes `pages=[1, 2, 3]`):

```python
# Extract specific pages with layout preservation
result = await extract_text(
    pdf_path="/path/to/document.pdf",
    pages=[1, 2, 3],  # First 3 pages (1-based numbering)
    preserve_layout=True,
    method="pdfplumber"  # Or "auto", "pymupdf", "pypdf"
)
```
Hunk `@@ -127,7 +151,7 @@` adds a clarifying comment to the `extract_tables` example:

```python
# Extract tables from specific pages in markdown format
result = await extract_tables(
    pdf_path="/path/to/document.pdf",
    pages=[2, 3],  # Pages 2 and 3 (1-based numbering)
    output_format="markdown"  # Or "json", "csv"
)
```
Hunk `@@ -191,18 +215,150 @@` (after the `extract_images` example) adds four new example sections and reorganizes the "Available Tools" tables, moving `extract_images` and `pdf_to_markdown` into Core Processing Tools and adding rows for the new analysis, layout, manipulation, and utility tools:

````markdown
### Advanced Analysis

```python
# Analyze document health and quality
result = await analyze_pdf_health(
    pdf_path="/path/to/document.pdf"
)

# Classify content type and structure
result = await classify_content(
    pdf_path="/path/to/document.pdf"
)

# Generate content summary
result = await summarize_content(
    pdf_path="/path/to/document.pdf",
    summary_length="medium",  # "short", "medium", "long"
    pages="1,2,3"  # Specific pages (1-based numbering)
)

# Analyze page layout
result = await analyze_layout(
    pdf_path="/path/to/document.pdf",
    pages="1,2,3",  # Specific pages (1-based numbering)
    include_coordinates=True
)
```

### Content Manipulation

```python
# Extract form data
result = await extract_form_data(
    pdf_path="/path/to/form.pdf"
)

# Split PDF into separate files
result = await split_pdf(
    pdf_path="/path/to/document.pdf",
    split_pages="5,10,15",  # Split after pages 5, 10, 15 (1-based)
    output_prefix="section"
)

# Merge multiple PDFs
result = await merge_pdfs(
    pdf_paths=["/path/to/doc1.pdf", "/path/to/doc2.pdf"],
    output_filename="merged_document.pdf"
)

# Rotate specific pages
result = await rotate_pages(
    pdf_path="/path/to/document.pdf",
    page_rotations={"1": 90, "3": 180}  # Page 1: 90°, Page 3: 180° (1-based)
)
```

### Optimization and Repair

```python
# Optimize PDF file size
result = await optimize_pdf(
    pdf_path="/path/to/large.pdf",
    optimization_level="balanced",  # "light", "balanced", "aggressive"
    preserve_quality=True
)

# Repair corrupted PDF
result = await repair_pdf(
    pdf_path="/path/to/corrupted.pdf"
)

# Compare two PDFs
result = await compare_pdfs(
    pdf_path1="/path/to/original.pdf",
    pdf_path2="/path/to/modified.pdf",
    comparison_type="all"  # "text", "structure", "metadata", "all"
)
```

### Visual Analysis

```python
# Extract charts and diagrams
result = await extract_charts(
    pdf_path="/path/to/report.pdf",
    pages="2,3,4",  # Pages 2, 3, 4 (1-based numbering)
    min_size=150  # Minimum size for chart detection
)

# Detect watermarks
result = await detect_watermarks(
    pdf_path="/path/to/document.pdf"
)

# Security analysis
result = await analyze_pdf_security(
    pdf_path="/path/to/document.pdf"
)
```

## Available Tools

### Core Processing Tools

| Tool | Description |
|------|-------------|
| `extract_text` | Extract text with multiple methods and layout preservation |
| `extract_tables` | Extract tables in various formats (JSON, CSV, Markdown) |
| `ocr_pdf` | Perform OCR on scanned PDFs with preprocessing |
| `extract_images` | Extract images with filtering options |
| `pdf_to_markdown` | Convert PDF to clean Markdown format |

### Document Analysis Tools

| Tool | Description |
|------|-------------|
| `is_scanned_pdf` | Check if a PDF is scanned or text-based |
| `get_document_structure` | Extract document structure, outline, and basic metadata |
| `extract_metadata` | Extract comprehensive metadata and file statistics |
| `analyze_pdf_health` | Comprehensive PDF health and quality analysis |
| `analyze_pdf_security` | Analyze PDF security features and potential issues |
| `classify_content` | Classify and analyze PDF content type and structure |
| `summarize_content` | Generate summary and key insights from PDF content |

### Layout and Visual Analysis Tools

| Tool | Description |
|------|-------------|
| `analyze_layout` | Analyze PDF page layout including text blocks, columns, and spacing |
| `extract_charts` | Extract and analyze charts, diagrams, and visual elements |
| `detect_watermarks` | Detect and analyze watermarks in PDF |

### Content Manipulation Tools

| Tool | Description |
|------|-------------|
| `extract_form_data` | Extract form fields and their values from PDF forms |
| `split_pdf` | Split PDF into multiple files at specified pages |
| `merge_pdfs` | Merge multiple PDFs into a single file |
| `rotate_pages` | Rotate specific pages by 90, 180, or 270 degrees |

### Utility and Optimization Tools

| Tool | Description |
|------|-------------|
| `compare_pdfs` | Compare two PDFs for differences in text, structure, and metadata |
| `convert_to_images` | Convert PDF pages to image files |
| `optimize_pdf` | Optimize PDF file size and performance |
| `repair_pdf` | Attempt to repair corrupted or damaged PDF files |

## Development
````
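The `rotate_pages` example in the README takes 1-based page keys (`{"1": 90, "3": 180}`), while PDF libraries such as pypdf index pages from 0. A minimal sketch of the conversion such a tool has to perform — the helper name is hypothetical, not part of the server:

```python
# Hypothetical helper: map the README's 1-based {"page": degrees} form
# to the 0-based page indices PDF libraries typically expect, with
# rotation angles normalized into [0, 360).
def to_zero_based_rotations(page_rotations: dict) -> dict:
    return {int(page) - 1: angle % 360 for page, angle in page_rotations.items()}

print(to_zero_based_rotations({"1": 90, "3": 180}))  # {0: 90, 2: 180}
```

Doing this conversion in exactly one place, at the API boundary, is what makes the "all page numbers use 1-based indexing" promise cheap to keep.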
## `claude_desktop_config.json` (new file, 16 lines)

```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/rpm/claude/mcp-pdf-tools",
        "run",
        "mcp-pdf-tools"
      ],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
```
## `examples/url_examples.py` (new file, 104 lines)

```python
#!/usr/bin/env python3
"""
Examples of using MCP PDF Tools with URLs
"""

import asyncio
import sys
import os

# Add src to path for development
sys.path.insert(0, '../src')

from mcp_pdf_tools.server import (
    extract_text, extract_metadata, pdf_to_markdown,
    extract_tables, is_scanned_pdf
)


async def example_text_extraction():
    """Example: Extract text from a PDF URL"""
    print("🔗 Extracting text from URL...")

    # Using a sample PDF from the web
    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await extract_text(url)
        print("✅ Text extraction successful!")
        print(f"   Method used: {result['method_used']}")
        print(f"   Pages: {result['metadata']['pages']}")
        print(f"   Extracted text length: {len(result['text'])} characters")
        print(f"   First 100 characters: {result['text'][:100]}...")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def example_metadata_extraction():
    """Example: Extract metadata from a PDF URL"""
    print("\n📋 Extracting metadata from URL...")

    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await extract_metadata(url)
        print("✅ Metadata extraction successful!")
        print(f"   File size: {result['file_info']['size_mb']:.2f} MB")
        print(f"   Pages: {result['statistics']['page_count']}")
        print(f"   Title: {result['metadata'].get('title', 'No title')}")
        print(f"   Creation date: {result['metadata'].get('creation_date', 'Unknown')}")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def example_scanned_detection():
    """Example: Check if PDF is scanned"""
    print("\n🔍 Checking if PDF is scanned...")

    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await is_scanned_pdf(url)
        print("✅ Scanned detection successful!")
        print(f"   Is scanned: {result['is_scanned']}")
        print(f"   Recommendation: {result['recommendation']}")
        print(f"   Pages checked: {result['sample_pages_checked']}")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def example_markdown_conversion():
    """Example: Convert PDF URL to markdown"""
    print("\n📝 Converting PDF to markdown...")

    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await pdf_to_markdown(url)
        print("✅ Markdown conversion successful!")
        print(f"   Pages converted: {result['pages_converted']}")
        print(f"   Markdown length: {len(result['markdown'])} characters")
        print("   First 200 characters:")
        print(f"   {result['markdown'][:200]}...")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def main():
    """Run all URL examples"""
    print("🌐 MCP PDF Tools - URL Examples")
    print("=" * 50)

    await example_text_extraction()
    await example_metadata_extraction()
    await example_scanned_detection()
    await example_markdown_conversion()

    print("\n✨ URL examples completed!")
    print("\n💡 Tips:")
    print("   • URLs are cached for 1 hour to avoid repeated downloads")
    print("   • Use HTTPS URLs for security")
    print("   • The server validates content is actually a PDF file")
    print("   • All tools support the same URL format")


if __name__ == "__main__":
    asyncio.run(main())
```
## `mcp-pdf-tools-launcher.sh` (new executable file, 3 lines)

```bash
#!/bin/bash
cd /home/rpm/claude/mcp-pdf-tools
exec uv run mcp-pdf-tools "$@"
```
(One additional file's diff was suppressed because it is too large.)
## `test_pages_parameter.py` (new file, 52 lines)

```python
#!/usr/bin/env python3
"""
Test the updated pages parameter parsing
"""

import asyncio
import sys
import os

# Add src to path
sys.path.insert(0, 'src')

from mcp_pdf_tools.server import parse_pages_parameter


def test_page_parsing():
    """Test page parameter parsing (1-based user input -> 0-based internal)"""
    print("Testing page parameter parsing (1-based user input -> 0-based internal)...")

    # Test different input formats - all converted from 1-based user input to 0-based internal
    test_cases = [
        (None, None),
        ("1,2,3", [0, 1, 2]),    # 1-based input -> 0-based internal
        ("[2, 3]", [1, 2]),      # This is the problematic case from the user
        ("5", [4]),              # Page 5 becomes index 4
        ([1, 2, 3], [0, 1, 2]),  # List input also converted
        ("2,3,4", [1, 2, 3]),    # Pages 2,3,4 -> indexes 1,2,3
        ("[1,2,3]", [0, 1, 2])   # Another format
    ]

    all_passed = True

    for input_val, expected in test_cases:
        try:
            result = parse_pages_parameter(input_val)
            if result == expected:
                print(f"✅ '{input_val}' -> {result}")
            else:
                print(f"❌ '{input_val}' -> {result}, expected {expected}")
                all_passed = False
        except Exception as e:
            print(f"❌ '{input_val}' -> Error: {e}")
            all_passed = False

    return all_passed


if __name__ == "__main__":
    success = test_page_parsing()
    if success:
        print("\n🎉 All page parameter parsing tests passed!")
    else:
        print("\n🚨 Some tests failed!")
    sys.exit(0 if success else 1)
```
## `test_url_support.py` (new file, 71 lines)

```python
#!/usr/bin/env python3
"""
Test URL support for MCP PDF Tools
"""

import asyncio
import sys
import os

# Add src to path
sys.path.insert(0, 'src')

from mcp_pdf_tools.server import validate_pdf_path, download_pdf_from_url


async def test_url_validation():
    """Test URL validation and download"""
    print("Testing URL validation and download...")

    # Test with a known PDF URL (using a publicly available sample)
    test_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        print(f"Testing URL: {test_url}")
        path = await validate_pdf_path(test_url)
        print(f"✅ Successfully downloaded and validated PDF: {path}")
        print(f"   File size: {path.stat().st_size} bytes")
        return True
    except Exception as e:
        print(f"❌ URL test failed: {e}")
        return False


async def test_local_path():
    """Test that local paths still work"""
    print("\nTesting local path validation...")

    # Test with our existing test PDF
    test_path = "/tmp/test_text.pdf"

    if not os.path.exists(test_path):
        print(f"⚠️ Test file {test_path} not found, skipping local test")
        return True

    try:
        path = await validate_pdf_path(test_path)
        print(f"✅ Local path validation works: {path}")
        return True
    except Exception as e:
        print(f"❌ Local path test failed: {e}")
        return False


async def main():
    print("🧪 Testing MCP PDF Tools URL Support\n")

    url_success = await test_url_validation()
    local_success = await test_local_path()

    print("\n📊 Test Results:")
    print(f"   URL support: {'✅ PASS' if url_success else '❌ FAIL'}")
    print(f"   Local paths: {'✅ PASS' if local_success else '❌ FAIL'}")

    if url_success and local_success:
        print("\n🎉 All tests passed! URL support is working.")
        return 0
    else:
        print("\n🚨 Some tests failed.")
        return 1


if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
```