Compare commits
No commits in common. "f601d44d999b01c50fa10d57725f97602369d2a9" and "478ab41b1f945fd08aa86aa2b12846df01c27581" have entirely different histories.
f601d44d99...478ab41b1f
.mcp.json (11 lines deleted)
@@ -1,11 +0,0 @@
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": ["run", "mcp-pdf-tools"],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
@@ -1,88 +0,0 @@
# Claude Desktop MCP Configuration

This document explains how the MCP PDF Tools server has been configured for Claude Desktop.

## Configuration Location

The MCP configuration has been added to:

```
/home/rpm/.config/Claude/claude_desktop_config.json
```

## PDF Tools Server Configuration

The following configuration has been added to your Claude Desktop:

```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/rpm/claude/mcp-pdf-tools",
        "run",
        "mcp-pdf-tools"
      ],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
```

## What This Enables

With this configuration, all your Claude sessions will have access to:

- **extract_text**: Extract text from PDFs with multiple method support
- **extract_tables**: Extract tables from PDFs with intelligent fallbacks
- **extract_images**: Extract and filter images from PDFs
- **extract_metadata**: Get comprehensive PDF metadata and file information
- **get_document_structure**: Analyze PDF structure, outline, and fonts
- **is_scanned_pdf**: Detect if PDFs are scanned/image-based
- **ocr_pdf**: Perform OCR on scanned PDFs with preprocessing
- **pdf_to_markdown**: Convert PDFs to clean markdown format

## Environment Variables

- `PDF_TEMP_DIR`: Set to `/tmp/mcp-pdf-processing` for temporary file processing
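A minimal sketch of how a server can honor this variable (an assumption for illustration; mcp-pdf-tools may read it differently):

```python
import os
import tempfile
from pathlib import Path

# Assumed behavior: fall back to the system temp dir when PDF_TEMP_DIR is unset.
PDF_TEMP_DIR = Path(os.environ.get("PDF_TEMP_DIR", tempfile.gettempdir()))
PDF_TEMP_DIR.mkdir(parents=True, exist_ok=True)
```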
## Backup

A backup of your original configuration has been saved to:

```
/home/rpm/.config/Claude/claude_desktop_config.json.backup
```

## Testing

The server has been tested and is working correctly. You can verify it's available in new Claude sessions by checking for the `mcp__pdf-tools__*` functions.

## Troubleshooting

If you encounter issues:

1. **Server not starting**: Check that all dependencies are installed:
   ```bash
   cd /home/rpm/claude/mcp-pdf-tools
   uv sync --dev
   ```

2. **System dependencies missing**: Install the required packages:
   ```bash
   sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript python3-tk default-jre-headless
   ```

3. **Permission issues**: Ensure the temp directory exists:
   ```bash
   mkdir -p /tmp/mcp-pdf-processing
   chmod 755 /tmp/mcp-pdf-processing
   ```

4. **Test the server manually**:
   ```bash
   cd /home/rpm/claude/mcp-pdf-tools
   uv run mcp-pdf-tools --help
   ```
README.md (164 lines changed)
@@ -10,31 +10,7 @@ A comprehensive FastMCP server for PDF processing operations. This server provid
- **Document Analysis**: Extract structure, metadata, and check if PDFs are scanned
- **Image Extraction**: Extract images with size filtering
- **Format Conversion**: Convert PDFs to clean Markdown format
- **URL Support**: Process PDFs directly from HTTPS URLs with intelligent caching
- **Smart Detection**: Automatically detect the best method for each operation
- **User-Friendly**: All page numbers use 1-based indexing (page 1 = first page)

## URL Support

All tools support processing PDFs directly from HTTPS URLs:

```bash
# Extract text from URL
mcp_pdf_tools extract_text "https://example.com/document.pdf"

# Extract tables from URL
mcp_pdf_tools extract_tables "https://example.com/report.pdf"

# Convert URL PDF to markdown
mcp_pdf_tools pdf_to_markdown "https://example.com/paper.pdf"
```

**Features:**
- **Intelligent caching**: Downloaded PDFs are cached for 1 hour to avoid repeated downloads
- **Content validation**: Verifies content is actually a PDF file (checks magic bytes and content-type); see the sketch after this list
- **Security**: HTTPS URLs recommended (HTTP URLs show security warnings)
- **Proper headers**: Sends appropriate User-Agent for better server compatibility
- **Error handling**: Clear error messages for network issues or invalid content
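A rough sketch of the checks these bullets describe. The helper names `_is_pdf_content` and `_cache_is_fresh` are hypothetical, not the server's actual API, and the real implementation may differ:

```python
import time
from pathlib import Path

CACHE_TTL_SECONDS = 3600  # downloaded PDFs are reused for one hour

def _is_pdf_content(data: bytes, content_type: str) -> bool:
    """Hypothetical check: accept the payload if the magic bytes or the
    declared content-type identify it as a PDF."""
    return data.startswith(b"%PDF") or "application/pdf" in content_type.lower()

def _cache_is_fresh(cached_file: Path) -> bool:
    """Hypothetical check: reuse a cached download only while it is
    younger than the TTL."""
    return cached_file.exists() and (
        time.time() - cached_file.stat().st_mtime
    ) < CACHE_TTL_SECONDS
```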

## Installation

@@ -134,7 +110,7 @@ result = await extract_text(
 # Extract specific pages with layout preservation
 result = await extract_text(
     pdf_path="/path/to/document.pdf",
-    pages=[1, 2, 3],  # First 3 pages (1-based numbering)
+    pages=[0, 1, 2],  # First 3 pages
     preserve_layout=True,
     method="pdfplumber"  # Or "auto", "pymupdf", "pypdf"
 )
@@ -151,7 +127,7 @@ result = await extract_tables(
 # Extract tables from specific pages in markdown format
 result = await extract_tables(
     pdf_path="/path/to/document.pdf",
-    pages=[2, 3],  # Pages 2 and 3 (1-based numbering)
+    pages=[2, 3],
     output_format="markdown"  # Or "json", "csv"
 )
 ```
@@ -215,150 +191,18 @@ result = await extract_images(
)
```

### Advanced Analysis

```python
# Analyze document health and quality
result = await analyze_pdf_health(
    pdf_path="/path/to/document.pdf"
)

# Classify content type and structure
result = await classify_content(
    pdf_path="/path/to/document.pdf"
)

# Generate content summary
result = await summarize_content(
    pdf_path="/path/to/document.pdf",
    summary_length="medium",  # "short", "medium", "long"
    pages="1,2,3"  # Specific pages (1-based numbering)
)

# Analyze page layout
result = await analyze_layout(
    pdf_path="/path/to/document.pdf",
    pages="1,2,3",  # Specific pages (1-based numbering)
    include_coordinates=True
)
```

### Content Manipulation

```python
# Extract form data
result = await extract_form_data(
    pdf_path="/path/to/form.pdf"
)

# Split PDF into separate files
result = await split_pdf(
    pdf_path="/path/to/document.pdf",
    split_pages="5,10,15",  # Split after pages 5, 10, 15 (1-based)
    output_prefix="section"
)

# Merge multiple PDFs
result = await merge_pdfs(
    pdf_paths=["/path/to/doc1.pdf", "/path/to/doc2.pdf"],
    output_filename="merged_document.pdf"
)

# Rotate specific pages
result = await rotate_pages(
    pdf_path="/path/to/document.pdf",
    page_rotations={"1": 90, "3": 180}  # Page 1: 90°, page 3: 180° (1-based)
)
```

### Optimization and Repair

```python
# Optimize PDF file size
result = await optimize_pdf(
    pdf_path="/path/to/large.pdf",
    optimization_level="balanced",  # "light", "balanced", "aggressive"
    preserve_quality=True
)

# Repair corrupted PDF
result = await repair_pdf(
    pdf_path="/path/to/corrupted.pdf"
)

# Compare two PDFs
result = await compare_pdfs(
    pdf_path1="/path/to/original.pdf",
    pdf_path2="/path/to/modified.pdf",
    comparison_type="all"  # "text", "structure", "metadata", "all"
)
```

### Visual Analysis

```python
# Extract charts and diagrams
result = await extract_charts(
    pdf_path="/path/to/report.pdf",
    pages="2,3,4",  # Pages 2, 3, 4 (1-based numbering)
    min_size=150  # Minimum size for chart detection
)

# Detect watermarks
result = await detect_watermarks(
    pdf_path="/path/to/document.pdf"
)

# Security analysis
result = await analyze_pdf_security(
    pdf_path="/path/to/document.pdf"
)
```

## Available Tools

### Core Processing Tools

| Tool | Description |
|------|-------------|
| `extract_text` | Extract text with multiple methods and layout preservation |
| `extract_tables` | Extract tables in various formats (JSON, CSV, Markdown) |
| `ocr_pdf` | Perform OCR on scanned PDFs with preprocessing |
| `extract_images` | Extract images with filtering options |
| `pdf_to_markdown` | Convert PDF to clean Markdown format |

### Document Analysis Tools

| Tool | Description |
|------|-------------|
| `is_scanned_pdf` | Check if a PDF is scanned or text-based |
| `get_document_structure` | Extract document structure, outline, and basic metadata |
| `extract_metadata` | Extract comprehensive metadata and file statistics |
| `analyze_pdf_health` | Comprehensive PDF health and quality analysis |
| `analyze_pdf_security` | Analyze PDF security features and potential issues |
| `classify_content` | Classify and analyze PDF content type and structure |
| `summarize_content` | Generate summary and key insights from PDF content |

### Layout and Visual Analysis Tools

| Tool | Description |
|------|-------------|
| `analyze_layout` | Analyze PDF page layout including text blocks, columns, and spacing |
| `extract_charts` | Extract and analyze charts, diagrams, and visual elements |
| `detect_watermarks` | Detect and analyze watermarks in PDF |

### Content Manipulation Tools

| Tool | Description |
|------|-------------|
| `extract_form_data` | Extract form fields and their values from PDF forms |
| `split_pdf` | Split PDF into multiple files at specified pages |
| `merge_pdfs` | Merge multiple PDFs into a single file |
| `rotate_pages` | Rotate specific pages by 90, 180, or 270 degrees |

### Utility and Optimization Tools

| Tool | Description |
|------|-------------|
| `compare_pdfs` | Compare two PDFs for differences in text, structure, and metadata |
| `convert_to_images` | Convert PDF pages to image files |
| `optimize_pdf` | Optimize PDF file size and performance |
| `repair_pdf` | Attempt to repair corrupted or damaged PDF files |
| `pdf_to_markdown` | Convert PDF to clean Markdown format |
| `extract_images` | Extract images with filtering options |

## Development
@@ -1,16 +0,0 @@
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/rpm/claude/mcp-pdf-tools",
        "run",
        "mcp-pdf-tools"
      ],
      "env": {
        "PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
      }
    }
  }
}
@@ -1,104 +0,0 @@
#!/usr/bin/env python3
"""
Examples of using MCP PDF Tools with URLs
"""

import asyncio
import sys
import os

# Add src to path for development
sys.path.insert(0, '../src')

from mcp_pdf_tools.server import (
    extract_text, extract_metadata, pdf_to_markdown,
    extract_tables, is_scanned_pdf
)


async def example_text_extraction():
    """Example: Extract text from a PDF URL"""
    print("🔗 Extracting text from URL...")

    # Using a sample PDF from the web
    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await extract_text(url)
        print("✅ Text extraction successful!")
        print(f"   Method used: {result['method_used']}")
        print(f"   Pages: {result['metadata']['pages']}")
        print(f"   Extracted text length: {len(result['text'])} characters")
        print(f"   First 100 characters: {result['text'][:100]}...")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def example_metadata_extraction():
    """Example: Extract metadata from a PDF URL"""
    print("\n📋 Extracting metadata from URL...")

    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await extract_metadata(url)
        print("✅ Metadata extraction successful!")
        print(f"   File size: {result['file_info']['size_mb']:.2f} MB")
        print(f"   Pages: {result['statistics']['page_count']}")
        print(f"   Title: {result['metadata'].get('title', 'No title')}")
        print(f"   Creation date: {result['metadata'].get('creation_date', 'Unknown')}")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def example_scanned_detection():
    """Example: Check if PDF is scanned"""
    print("\n🔍 Checking if PDF is scanned...")

    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await is_scanned_pdf(url)
        print("✅ Scanned detection successful!")
        print(f"   Is scanned: {result['is_scanned']}")
        print(f"   Recommendation: {result['recommendation']}")
        print(f"   Pages checked: {result['sample_pages_checked']}")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def example_markdown_conversion():
    """Example: Convert PDF URL to markdown"""
    print("\n📝 Converting PDF to markdown...")

    url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        result = await pdf_to_markdown(url)
        print("✅ Markdown conversion successful!")
        print(f"   Pages converted: {result['pages_converted']}")
        print(f"   Markdown length: {len(result['markdown'])} characters")
        print("   First 200 characters:")
        print(f"   {result['markdown'][:200]}...")
    except Exception as e:
        print(f"❌ Failed: {e}")


async def main():
    """Run all URL examples"""
    print("🌐 MCP PDF Tools - URL Examples")
    print("=" * 50)

    await example_text_extraction()
    await example_metadata_extraction()
    await example_scanned_detection()
    await example_markdown_conversion()

    print("\n✨ URL examples completed!")
    print("\n💡 Tips:")
    print("   • URLs are cached for 1 hour to avoid repeated downloads")
    print("   • Use HTTPS URLs for security")
    print("   • The server validates content is actually a PDF file")
    print("   • All tools support the same URL format")


if __name__ == "__main__":
    asyncio.run(main())
@@ -1,3 +0,0 @@
#!/bin/bash
cd /home/rpm/claude/mcp-pdf-tools
exec uv run mcp-pdf-tools "$@"
File diff suppressed because it is too large
@@ -1,52 +0,0 @@
#!/usr/bin/env python3
"""
Test the updated pages parameter parsing
"""

import asyncio
import sys
import os

# Add src to path
sys.path.insert(0, 'src')

from mcp_pdf_tools.server import parse_pages_parameter


def test_page_parsing():
    """Test page parameter parsing (1-based user input -> 0-based internal)"""
    print("Testing page parameter parsing (1-based user input -> 0-based internal)...")

    # Test different input formats - all converted from 1-based user input to 0-based internal
    test_cases = [
        (None, None),
        ("1,2,3", [0, 1, 2]),    # 1-based input -> 0-based internal
        ("[2, 3]", [1, 2]),      # This is the problematic case from the user
        ("5", [4]),              # Page 5 becomes index 4
        ([1, 2, 3], [0, 1, 2]),  # List input is also converted
        ("2,3,4", [1, 2, 3]),    # Pages 2,3,4 -> indexes 1,2,3
        ("[1,2,3]", [0, 1, 2])   # Another format
    ]

    all_passed = True

    for input_val, expected in test_cases:
        try:
            result = parse_pages_parameter(input_val)
            if result == expected:
                print(f"✅ '{input_val}' -> {result}")
            else:
                print(f"❌ '{input_val}' -> {result}, expected {expected}")
                all_passed = False
        except Exception as e:
            print(f"❌ '{input_val}' -> Error: {e}")
            all_passed = False

    return all_passed


if __name__ == "__main__":
    success = test_page_parsing()
    if success:
        print("\n🎉 All page parameter parsing tests passed!")
    else:
        print("\n🚨 Some tests failed!")
    sys.exit(0 if success else 1)
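For reference, a minimal parser satisfying the test cases above. `parse_pages_sketch` is an illustrative stand-in, not the actual `parse_pages_parameter` from `mcp_pdf_tools.server`:

```python
import json
from typing import Optional, Union

def parse_pages_sketch(pages: Union[None, str, list]) -> Optional[list]:
    """Hypothetical parser: accept None, "1,2,3", "[1,2,3]", or a list of
    1-based page numbers, and return 0-based indices (None passes through)."""
    if pages is None:
        return None
    if isinstance(pages, str):
        s = pages.strip()
        # "[2, 3]" style parses as JSON; "1,2,3" style splits on commas
        numbers = json.loads(s) if s.startswith("[") else [int(p) for p in s.split(",")]
    else:
        numbers = list(pages)
    return [int(n) - 1 for n in numbers]  # 1-based -> 0-based
```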
@@ -1,71 +0,0 @@
#!/usr/bin/env python3
"""
Test URL support for MCP PDF Tools
"""

import asyncio
import sys
import os

# Add src to path
sys.path.insert(0, 'src')

from mcp_pdf_tools.server import validate_pdf_path, download_pdf_from_url


async def test_url_validation():
    """Test URL validation and download"""
    print("Testing URL validation and download...")

    # Test with a known PDF URL (using a publicly available sample)
    test_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

    try:
        print(f"Testing URL: {test_url}")
        path = await validate_pdf_path(test_url)
        print(f"✅ Successfully downloaded and validated PDF: {path}")
        print(f"   File size: {path.stat().st_size} bytes")
        return True
    except Exception as e:
        print(f"❌ URL test failed: {e}")
        return False


async def test_local_path():
    """Test that local paths still work"""
    print("\nTesting local path validation...")

    # Test with our existing test PDF
    test_path = "/tmp/test_text.pdf"

    if not os.path.exists(test_path):
        print(f"⚠️ Test file {test_path} not found, skipping local test")
        return True

    try:
        path = await validate_pdf_path(test_path)
        print(f"✅ Local path validation works: {path}")
        return True
    except Exception as e:
        print(f"❌ Local path test failed: {e}")
        return False


async def main():
    print("🧪 Testing MCP PDF Tools URL Support\n")

    url_success = await test_url_validation()
    local_success = await test_local_path()

    print("\n📊 Test Results:")
    print(f"   URL support: {'✅ PASS' if url_success else '❌ FAIL'}")
    print(f"   Local paths: {'✅ PASS' if local_success else '❌ FAIL'}")

    if url_success and local_success:
        print("\n🎉 All tests passed! URL support is working.")
        return 0
    else:
        print("\n🚨 Some tests failed.")
        return 1


if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
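The tests above assume `validate_pdf_path` accepts both local paths and HTTPS URLs. A sketch of such a dispatcher, with a blocking download kept for brevity (illustrative only; the server's real implementation adds caching and richer errors):

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import Request, urlopen

async def validate_pdf_path_sketch(pdf_path: str) -> Path:
    """Hypothetical dispatcher: download URLs to a temp file, pass local
    paths through, and verify the %PDF magic bytes in both cases."""
    if urlparse(pdf_path).scheme in ("http", "https"):
        req = Request(pdf_path, headers={"User-Agent": "mcp-pdf-tools"})
        with urlopen(req) as resp:
            data = resp.read()  # the real server would stream and cache this
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            tmp.write(data)
            path = Path(tmp.name)
    else:
        path = Path(pdf_path)
        if not path.exists():
            raise FileNotFoundError(f"PDF not found: {path}")
    if not path.read_bytes().startswith(b"%PDF"):
        raise ValueError(f"Not a valid PDF file: {path}")
    return path
```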