2 Commits

Author SHA1 Message Date
d569034fa3 Add MCP resource system for embedded document content
Some checks are pending
Test Dashboard / test-and-dashboard (push) Waiting to run
Implements URI-based access to document content with:

- ResourceStore for caching extracted images, chapters, sheets, slides
- Content-based document IDs (SHA256 hash) for stable URIs across sessions
- 11 resource templates with flexible URI patterns:
  - Binary: image://, chart://, media://, embed://
  - Text: chapter://, section://, sheet://, slide://
  - Ranges: chapters://doc/1-5, slides://doc/1,3,5
  - Hierarchical: paragraph://doc/3/5

- Format suffixes for output control:
  - chapter://doc/3.md (default markdown)
  - chapter://doc/3.txt (plain text)
  - chapter://doc/3.html (basic HTML)

- index_document tool scans and populates resources:
  - Word: chapters as markdown, embedded images
  - Excel: sheets as markdown tables
  - PowerPoint: slides as markdown

Tool responses return URIs instead of blobs - clients fetch only what they need.
2026-01-11 09:04:29 -07:00
34e636e782 Add documentation for DOCX processing fixes
Documents 6 critical bugs discovered while processing a 200+ page
manuscript, including the root cause xpath API mismatch between
python-docx and lxml that caused silent failures in chapter search.
2026-01-11 06:47:39 -07:00