Content archive
When you want every storage artefact from an execution in one file — for handoff to external tooling, for offline analysis, for archival — use the content archive API. It’s async: you request, poll, then download.
5 endpoints under /api/v1/content.
Request an archive
    curl -X POST http://localhost:8000/api/v1/content/archive/prepare \
      -H 'Content-Type: application/json' \
      -d '{"execution_id": "EXEC_ID"}'

Response:

    {
      "job_id": "archive-job-uuid",
      "status": "queued"
    }

The server queues the archive job, walks executions/EXEC_ID/** in storage, bundles the objects into a zip, and writes the zip back to storage under archives/<job_id>.zip.
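For callers that prefer Python to curl, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: `BASE_URL`, `build_prepare_request`, and `request_archive` are names introduced here for illustration, and the host and port are copied from the curl example.

```python
import json
import urllib.request

# Host and port copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1/content"


def build_prepare_request(execution_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for /archive/prepare."""
    body = json.dumps({"execution_id": execution_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/archive/prepare",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_archive(execution_id: str) -> str:
    """Send the request and return the job_id from the JSON response."""
    with urllib.request.urlopen(build_prepare_request(execution_id)) as resp:
        return json.load(resp)["job_id"]
```

Splitting the request construction from the network call keeps the first function easy to inspect or test without a running server.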
Optional scoping
Narrow the archive:

    curl -X POST http://localhost:8000/api/v1/content/archive/prepare \
      -H 'Content-Type: application/json' \
      -d '{
        "execution_id": "EXEC_ID",
        "routes": ["sitemap_scraper"],
        "stages": ["web_scraper"],
        "include_metadata": true
      }'

- routes: only include objects under these route ids
- stages: only include objects with these stage names
- include_metadata: true (default) ships sidecars; false keeps the archive smaller
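When the request body is assembled in code, the optional fields can simply be left out when unused. The helper below is an illustrative sketch; `build_archive_payload` is a name made up for this example, and it assumes that omitting `routes` or `stages` means "no filter", as the unscoped request suggests.

```python
from typing import Optional, Sequence


def build_archive_payload(
    execution_id: str,
    routes: Optional[Sequence[str]] = None,
    stages: Optional[Sequence[str]] = None,
    include_metadata: bool = True,
) -> dict:
    """Return the JSON body for /archive/prepare, omitting unused filters."""
    payload: dict = {"execution_id": execution_id}
    if routes:
        payload["routes"] = list(routes)
    if stages:
        payload["stages"] = list(stages)
    if not include_metadata:
        # true is the documented default, so the flag is only sent when disabling it
        payload["include_metadata"] = False
    return payload
```

For example, `build_archive_payload("EXEC_ID", stages=["web_scraper"], include_metadata=False)` yields a body that keeps only one stage and skips the sidecars.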
Poll status
    curl http://localhost:8000/api/v1/content/archive/JOB_ID/status

Response:

    {
      "job_id": "...",
      "status": "running",
      "progress": 0.42,
      "object_count": 1200,
      "bytes_packed": 34567890
    }

Statuses: queued, running, completed, failed. Progress is a fraction in the range [0, 1].
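A polling loop over this endpoint might look like the sketch below. It treats `completed` and `failed` as terminal, per the status list. The function name `wait_for_archive`, the injected `fetch_status` callable, and the 600-second timeout are assumptions made for this example; the 2-second default interval mirrors the interval the CLI wrapper section mentions.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_archive(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,  # arbitrary upper bound for this sketch
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, then return the last status body."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"archive job still {status['status']} after {timeout}s")
        sleep(interval)
```

Passing `fetch_status` in as a callable keeps the loop independent of any particular HTTP library; in practice it would wrap a GET on the status URL and parse the JSON body.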
Download
Once status == "completed":

    curl -O http://localhost:8000/api/v1/content/archive/JOB_ID/download

Returns the zip file bytes. The archive can be large, so downloads are chunked.
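Because the archive may not fit comfortably in memory, a client would normally write it to disk as it arrives. The following sketch shows one way to do that; `save_stream` and the 1 MiB chunk size are illustrative choices, not part of the API.

```python
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def save_stream(source: BinaryIO, dest_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a binary stream to dest_path chunk by chunk and return bytes written."""
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty read signals end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

The response object returned by `urllib.request.urlopen` exposes `read(n)`, so it can be passed directly as `source` with the download URL shown above.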
CLI wrapper
The CLI hides the polling loop:

    factflow storage download EXEC_ID --output-dir ./out/

Internally:

- POSTs /content/archive/prepare
- Polls /content/archive/{job}/status every 2 seconds until completed
- Downloads, extracts to --output-dir
- Deletes the server-side archive
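The extract step in the list above could be reproduced by hand roughly as follows. This is a sketch using Python's standard `zipfile` module under the assumption that the download has already been saved to disk; it is not taken from the CLI's source, and `extract_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, output_dir: str) -> list[str]:
    """Extract the downloaded zip into output_dir and return the file names."""
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        bad = archive.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        archive.extractall(target)
        return [info.filename for info in archive.infolist() if not info.is_dir()]
```

Running the integrity check before extraction means a truncated download fails loudly instead of leaving a partial tree in the output directory.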
Count (fast)
If you want to know how big an archive would be before requesting it:

    curl 'http://localhost:8000/api/v1/content/count?execution_id=EXEC_ID'

Returns the object count and total byte size. The cost is that of a storage listing, O(list), so it is fast.
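A pre-flight check built on the count endpoint could be sketched like this. The response field name `total_bytes` is a guess made for illustration (the page only says the endpoint returns an object count and a total byte size), so check the content schemas for the real key. `count_url` and `fits_limit` are likewise hypothetical helper names.

```python
from urllib.parse import urlencode

# Host and port copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1/content"


def count_url(execution_id: str, base_url: str = BASE_URL) -> str:
    """Build the /count URL with a properly encoded query string."""
    return f"{base_url}/count?{urlencode({'execution_id': execution_id})}"


def fits_limit(count_body: dict, max_bytes: int) -> bool:
    """Return True if the reported size is within max_bytes.

    'total_bytes' is an assumed field name, not confirmed by the API docs.
    """
    return count_body["total_bytes"] <= max_bytes
```

If `fits_limit` returns False, narrowing the request with `routes` or `stages` before calling prepare avoids waiting on a job that would be rejected.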
Lifecycle
Server-side archives are short-lived:
- TTL 24 hours by default
- Cleaned up automatically
- Re-request if you missed the window
Failure modes
- Execution has no storage: 404 No objects found
- Archive too large (beyond configured limit): 413 Payload too large. Scope with routes / stages.
- Storage provider error mid-archive: job transitions to failed with an error payload; retry is manual
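Client code can fold these three cases into a single decision point. The sketch below is only one way to organise it; the function name `classify_failure` and the returned labels are invented for this example, while the status codes and the `failed` job status come from the list above.

```python
def classify_failure(http_status: int = 200, job_status: str = "") -> str:
    """Map the documented failure modes to a short label for the caller."""
    if http_status == 404:
        return "no_objects"      # execution has nothing in storage
    if http_status == 413:
        return "too_large"       # narrow the request with routes or stages
    if job_status == "failed":
        return "job_failed"      # read the error payload; retry is manual
    return "ok"
```

A caller might react to `too_large` by re-issuing the prepare request with a narrower scope, and to `job_failed` by surfacing the error payload to the user.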
Related
- CLI: storage download
- Reference: API — content schemas
- Storage guide