- Replace unreliable Word page detection with element-based limiting
- Cap extraction at 25 paragraphs per 'page' requested (max 100 total)
- Cap extraction at 8k chars per 'page' requested (max 40k total)
- Add early termination when limits reached
- Add processing_limits metadata to show actual extraction stats
- Prevent 1.28M token responses by stopping at reasonable content limits
- Single page (page_range='1') now limited to ~25 paragraphs/8k chars