4 Commits

Author SHA1 Message Date
a3c7b69ba8 Release 2026.4.21.1: refresh project URLs after warehack.ing transfer
PyPI metadata is immutable per version, so this post-release exists solely
to refresh the [project.urls] block: Homepage / Repository / Bug Tracker /
Changelog now point at git.supported.systems/warehack.ing/mcarchive-org
(the new canonical home after the org transfer).

No code changes. The wheel contents are the same as 2026.4.21; only the
METADATA URLs differ.
2026-04-21 22:17:50 -06:00
52a2be7cc6 Release prep: CHANGELOG, CI workflow, Gitea project URLs
- CHANGELOG.md documents the 2026.04.21 initial release: full tool
  inventory, every reliability claim, and test count (66/66 green).
- .github/workflows/ci.yml runs ruff check + pytest -m 'not network'
  across Python 3.10/3.11/3.12/3.13 on push and PR. Skips live archive.org
  tests in CI to keep runs fast and avoid hammering archive.org.
- pyproject.toml [project.urls]: point Homepage / Repository / Bug Tracker
  / Changelog at git.supported.systems/rsp2k/mcarchive-org. Keep the
  archive.org developer docs link for context.
2026-04-21 21:20:56 -06:00
4a03af1675 Hardening: address Hamilton review ship-blockers
Critical fixes:
- Validate identifier (^[A-Za-z0-9._-]+$) and filename (rejecting '..',
  absolute paths, NUL bytes, and drive letters) at the client boundary
- Confine download destinations under MCARCHIVE_DOWNLOAD_ROOT via
  Path.resolve() + is_relative_to() check; reject symlinked dirs
- Use O_NOFOLLOW on the destination open() to refuse symlink substitution
- Detect Range-ignored responses: if a resume was requested but the server
  returns 200 (or 206 with the wrong Content-Range start), raise ArchiveError
  BEFORE writing any bytes. This closes the silent file-corruption hole
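
The checks listed above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the function names (validate_identifier, validate_filename, confine, check_resume_response) and the stand-in ArchiveError class are assumptions made for the example. Only the identifier regex, the Path.resolve() + is_relative_to() approach, and the 200 / Content-Range rules come from the commit message.

```python
import re
from pathlib import Path
from typing import Optional

IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9._-]+$")          # regex from the commit message
DRIVE_RE = re.compile(r"^[A-Za-z]:")                      # e.g. "C:" (assumed form of the drive-letter check)
CONTENT_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


class ArchiveError(Exception):
    """Stand-in for the project's error type (hypothetical definition)."""


def validate_identifier(identifier: str) -> str:
    # fullmatch also rejects a trailing newline, which '$' alone would allow.
    if not IDENTIFIER_RE.fullmatch(identifier):
        raise ArchiveError(f"invalid identifier: {identifier!r}")
    return identifier


def validate_filename(filename: str) -> str:
    if not filename or "\x00" in filename:
        raise ArchiveError("filename is empty or contains a NUL byte")
    if filename.startswith(("/", "\\")) or DRIVE_RE.match(filename):
        raise ArchiveError(f"absolute path or drive letter: {filename!r}")
    if ".." in filename.replace("\\", "/").split("/"):
        raise ArchiveError(f"path traversal in filename: {filename!r}")
    return filename


def confine(root: Path, filename: str) -> Path:
    # resolve() follows symlinks, so a link that points outside the root
    # produces a path that fails the is_relative_to() check.
    root = root.resolve()
    target = (root / filename).resolve()
    if not target.is_relative_to(root):
        raise ArchiveError(f"{filename!r} escapes the download root")
    return target


def check_resume_response(status_code: int, content_range: Optional[str], offset: int) -> None:
    # Call this before writing any bytes when resuming from `offset`.
    if offset == 0:
        return  # fresh download, nothing to verify
    if status_code != 206:
        raise ArchiveError(f"resume at byte {offset} requested, server replied {status_code}")
    match = CONTENT_RANGE_RE.fullmatch(content_range or "")
    if match is None or int(match.group(1)) != offset:
        raise ArchiveError(f"Content-Range {content_range!r} does not start at {offset}")
```

Keeping these as pure functions over plain values (strings, a status code, a header) makes each rule testable without a network connection or a real HTTP client.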

Usability:
- Wrap raise_for_status everywhere with an ArchiveError that includes a
  preview of the response body, so 4xx Solr errors now say what went wrong
- URL-encode filenames in download URLs (handles spaces and special chars)
- Map archive.org's {"error": ...} payloads on /metadata/{id}/files to
  ArchiveError with the server's message
- Lazy-resolve download root so env-var changes after import are honored
- Refactor item_resource to a shared async helper (drops .fn type-ignore)
- Rename result key 'bytes' -> 'bytes_written' (avoids shadowing builtin)
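
A minimal sketch of three of the usability items above: URL-encoding filenames, attaching a body preview to errors, and mapping {"error": ...} payloads. The helper names, the 200-character preview limit, and the stand-in ArchiveError class are assumptions for illustration; the project's real helpers may differ.

```python
from urllib.parse import quote

DOWNLOAD_BASE = "https://archive.org/download"


class ArchiveError(Exception):
    """Stand-in for the project's error type (hypothetical definition)."""


def build_download_url(identifier: str, filename: str) -> str:
    # Keep '/' unescaped so files inside subdirectories keep their path
    # structure; spaces, '#', '?' and similar characters are percent-encoded.
    return f"{DOWNLOAD_BASE}/{quote(identifier, safe='')}/{quote(filename, safe='/')}"


def error_from_response(status_code: int, body: str, limit: int = 200) -> ArchiveError:
    # Truncate the body so a large HTML error page does not flood the message.
    preview = body[:limit] + ("..." if len(body) > limit else "")
    return ArchiveError(f"HTTP {status_code}: {preview}")


def raise_on_error_payload(payload):
    # Surface the server's own message when the JSON body is {"error": ...}.
    if isinstance(payload, dict) and "error" in payload:
        raise ArchiveError(str(payload["error"]))
    return payload
```

For example, build_download_url("item", "dir/my file #1.txt") percent-encodes the space and the '#' but leaves the directory separator alone.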

Tests:
- New tests/test_client_mocked.py: 29 regression tests using
  httpx.MockTransport covering every Hamilton finding above (path traversal,
  symlink refusal, Range-ignored, Content-Range mismatch, error body
  surfacing, malformed JSON, dark items, etc.)
- Set asyncio_mode = "auto" in pyproject for cleaner test markers

33/33 tests pass (4 live + 29 mocked), ruff clean.
2026-04-21 15:34:30 -06:00
5265a6440b Initial mcarchive-org MCP server
FastMCP server wrapping archive.org's public read APIs:
- search_items / scrape_items: advanced search + bulk cursor pagination
- get_item_metadata / list_files: progressive disclosure with filtering
- get_file_url / download_file: canonical URLs and streaming downloads
  with HTTP Range resume + optional MD5 verification
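
As a rough illustration of the resume and verification steps in download_file, the sketch below computes a Range header from a partial file and checks an MD5 digest in chunks. The helper names (resume_headers, md5_of_file, verify_md5) and the 1 MiB chunk size are hypothetical; the server's actual implementation is not shown in this log.

```python
import hashlib
from pathlib import Path


def resume_headers(path: Path) -> dict:
    # Ask the server to continue from the bytes already on disk, if any.
    offset = path.stat().st_size if path.exists() else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so large downloads are never loaded into memory at once.
    # usedforsecurity=False marks this as an integrity check, not a security use.
    digest = hashlib.md5(usedforsecurity=False)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    # Compare case-insensitively against the expected hex digest.
    return md5_of_file(path) == expected.strip().lower()
```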

Smoke-tested end-to-end via claude -p headless MCP and pytest against
live archive.org endpoints.
2026-04-21 09:41:20 -06:00