Ryan Malloy dfbf3d1870 🔧 v2.0.7: Fix table extraction token overflow with smart limiting
PROBLEM:
Table extraction from large PDFs was exceeding MCP's 25,000 token limit,
causing "response too large" errors. A 5-page PDF with large tables
generated 59,005 tokens, more than double the allowed limit.

SOLUTION:
Added flexible table data limiting with two new parameters:
- max_rows_per_table: Limit rows returned per table (prevents overflow)
- summary_only: Return only metadata without table data

IMPLEMENTATION:
1. Added new parameters to extract_tables() method signature
2. Created _process_table_data() helper for consistent limiting logic
3. Updated all 3 extraction methods (Camelot, pdfplumber, Tabula)
4. Enhanced table metadata with truncation tracking:
   - total_rows: Full row count from PDF
   - rows_returned: Actual rows in response (after limiting)
   - rows_truncated: Number of rows omitted (if limited)

USAGE EXAMPLES:
# Summary mode - metadata only (smallest response)
extract_tables(pdf_path, pages="1-5", summary_only=True)

# Limited data - first 100 rows per table
extract_tables(pdf_path, pages="1-5", max_rows_per_table=100)

# Full data (default behavior, may overflow on large tables)
extract_tables(pdf_path, pages="1-5")

BENEFITS:
- Prevents MCP token overflow errors
- Maintains backward compatibility (new params are optional)
- Clear guidance through metadata (shows when truncation occurred)
- Flexible - users choose between summary/limited/full modes

FILES MODIFIED:
- src/mcp_pdf/mixins_official/table_extraction.py (all changes)
- src/mcp_pdf/server.py (version bump to 2.0.7)
- pyproject.toml (version bump to 2.0.7)

VERSION: 2.0.7
PUBLISHED: https://pypi.org/project/mcp-pdf/2.0.7/
2025-11-03 18:26:34 -07:00
..