Storage
Every pipeline writes to storage via `StorageProtocol`. Two providers ship: filesystem (dev and single-node prod) and MinIO / S3 (multi-node prod). The CLI and API work identically across both.
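The protocol's actual definition lives in factflow-infra; as a rough sketch, a provider might satisfy an interface like the following (the method names here are assumptions, not the real `StorageProtocol` signature):

```python
from typing import Protocol


class StorageProtocol(Protocol):
    """Assumed shape of a storage provider; method names are illustrative."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list(self, prefix: str = "") -> list[str]: ...


class InMemoryStorage:
    """Toy provider, used here only to show the protocol is satisfiable."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


store: StorageProtocol = InMemoryStorage()
store.put("executions/abc/route/stage/msg-1", b"payload")
```

Because both shipped providers sit behind the same protocol, everything below (browse, list, get, watch) behaves the same on filesystem and MinIO/S3.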
Browse the hierarchy
The default key convention is `executions/<exec-id>/<route>/<stage>/<msg-id>`. Browse by prefix:
```shell
factflow storage browse executions/EXEC_ID/
factflow storage browse executions/EXEC_ID/sitemap_scraper/
factflow storage browse executions/EXEC_ID/sitemap_scraper/web_scraper/
```

`browse` groups by prefix; `list` gives a flat listing.
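The grouping behavior of `browse` can be sketched as follows: it collapses keys under the given prefix down to their next path segment, whereas `list` would return every matching key (a Python illustration of the semantics, not the actual implementation):

```python
def browse(keys: list[str], prefix: str) -> set[str]:
    """Group keys under `prefix` by their next path segment, like `storage browse`."""
    groups: set[str] = set()
    for key in keys:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            head, sep, _ = rest.partition("/")
            groups.add(head + sep)  # keep a trailing "/" for browsable sub-prefixes
    return groups


keys = [
    "executions/E1/sitemap_scraper/web_scraper/m1",
    "executions/E1/sitemap_scraper/web_scraper/m2",
    "executions/E1/markdown_processor/render/m3",
]
print(sorted(browse(keys, "executions/E1/")))
# ['markdown_processor/', 'sitemap_scraper/']
```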
List by execution
```shell
factflow storage list --execution EXEC_ID
factflow storage list --execution EXEC_ID --limit 100
```

Read an object
```shell
factflow storage get KEY
factflow storage get KEY --output json
```

For binary content, pipe to a file:
```shell
factflow storage get executions/EXEC_ID/path/to/object > out.bin
```

Inspect metadata
Every object has a metadata sidecar (`.meta.json` on the filesystem provider, native object metadata on S3/MinIO):
```shell
factflow storage metadata KEY
```

Returns the full metadata blob: provenance, config hash, lineage reference, and adapter-specific extras.
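On the filesystem provider the sidecar sits right next to the object, so it can also be read directly; a minimal sketch (the sidecar field names shown are assumptions, not the real schema):

```python
import json
import tempfile
from pathlib import Path


def read_sidecar(storage_root: Path, key: str) -> dict:
    """Read the .meta.json sidecar written next to an object (filesystem provider)."""
    return json.loads((storage_root / (key + ".meta.json")).read_text())


# Demo: write a fake object plus sidecar, then read the sidecar back.
root = Path(tempfile.mkdtemp())
obj = root / "executions/E1/route/stage/m1"
obj.parent.mkdir(parents=True)
obj.write_bytes(b"payload")
# Field names here are illustrative, not the actual sidecar schema.
obj.with_name(obj.name + ".meta.json").write_text(
    json.dumps({"config_hash": "abc123", "lineage": "m0"})
)
meta = read_sidecar(root, "executions/E1/route/stage/m1")
print(meta["config_hash"])  # abc123
```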
Download execution output
Bulk-download every artefact for an execution:
```shell
factflow storage download EXEC_ID --output-dir ./out/
```

Preserves the storage key hierarchy under the output directory. Useful for handoff to external tooling.
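"Preserves the hierarchy" means each storage key becomes the same relative path under the output directory; as a one-line illustration (not the CLI's actual code):

```python
from pathlib import Path


def local_path(output_dir: str, key: str) -> Path:
    """Map a storage key to a local path, preserving the key hierarchy."""
    return Path(output_dir).joinpath(*key.split("/"))


p = local_path("./out", "executions/E1/sitemap_scraper/web_scraper/m1")
print(p.as_posix())  # out/executions/E1/sitemap_scraper/web_scraper/m1
```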
Watch live writes
During a running execution, stream writes as they happen:
```shell
factflow storage watch --execution EXEC_ID
```

Uses Server-Sent Events (SSE) under the hood. Exits when the execution completes or on Ctrl+C.
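An SSE stream is just newline-delimited `data:` lines over HTTP, so it is easy to consume from your own tooling too; a minimal parsing sketch (the event payload shape is an assumption):

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON payload per SSE event; a blank line marks the event boundary."""
    buf: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            yield json.loads("\n".join(buf))
            buf = []


# Fake stream; payload fields are illustrative.
stream = [
    'data: {"key": "executions/E1/route/stage/m1", "bytes": 512}',
    "",
    'data: {"key": "executions/E1/route/stage/m2", "bytes": 1024}',
    "",
]
for event in parse_sse(stream):
    print(event["key"])
```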
Filter to a specific route:
```shell
factflow storage watch --execution EXEC_ID --route markdown_processor
```

Via the API
- `GET /api/v1/storage/objects?prefix=...`: list objects under a prefix
- `GET /api/v1/storage/objects/{key}`: read object bytes
- `GET /api/v1/storage/objects/{key}/metadata`: read the metadata sidecar
- `POST /api/v1/content/archive/prepare` + `GET /api/v1/content/archive/{job}/status`: async bulk download (the `download` CLI command uses this)
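The prepare-then-poll flow behind bulk download can be sketched with the polling logic separated from HTTP, so it is easy to see the shape of the loop (the `job_id` and `status` fields are assumptions about the response payload):

```python
import time
from typing import Callable


def wait_for_archive(
    prepare: Callable[[], dict],
    get_status: Callable[[str], dict],
    poll_seconds: float = 1.0,
) -> dict:
    """POST .../archive/prepare, then poll .../archive/{job}/status until terminal."""
    job = prepare()  # e.g. POST /api/v1/content/archive/prepare
    while True:
        status = get_status(job["job_id"])  # e.g. GET /api/v1/content/archive/{job}/status
        if status["status"] in ("done", "failed"):
            return status
        time.sleep(poll_seconds)


# Fake transport standing in for real HTTP calls.
responses = iter([{"status": "running"}, {"status": "done", "url": "/archive/j1.tar"}])
result = wait_for_archive(
    prepare=lambda: {"job_id": "j1"},
    get_status=lambda job_id: next(responses),
    poll_seconds=0.0,
)
print(result["status"])  # done
```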
Retention
Storage is the replay source of truth: if you delete an object, any execution that depended on it can no longer replay the downstream stage.
Safe retention policy:
- Never expire objects referenced by executions you might replay (bounded by how far back you replay, usually 30–90 days)
- Bulk-prune with an execution-level delete flow (not yet automated in the CLI; use manual `rm` or bucket lifecycle rules)
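Until pruning is automated, the replay-window check is simple to script; a sketch (the 90-day default is just the upper bound of the window mentioned above):

```python
from datetime import datetime, timedelta, timezone


def is_prunable(finished_at: datetime, retention_days: int = 90) -> bool:
    """An execution's objects are safe to prune once it falls outside the replay window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    return finished_at < cutoff


old = datetime.now(timezone.utc) - timedelta(days=120)
recent = datetime.now(timezone.utc)
print(is_prunable(old))     # True
print(is_prunable(recent))  # False
```

The same window maps directly onto a bucket lifecycle rule on MinIO/S3 if you prefer to prune server-side.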
Configuration
Set the storage URI via config or environment variable:
```shell
FACTFLOW_STORAGE_URI=file://./output/storage   # filesystem
FACTFLOW_STORAGE_URI=minio://factflow-bucket   # MinIO / S3
```

See factflow-infra for the full settings shape.
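Provider selection from the URI scheme might look like this (a sketch; the real settings parsing lives in factflow-infra, and the `s3` alias is an assumption):

```python
from urllib.parse import urlsplit


def provider_for(storage_uri: str) -> str:
    """Pick a storage provider from the FACTFLOW_STORAGE_URI scheme (illustrative mapping)."""
    providers = {"file": "filesystem", "minio": "minio", "s3": "minio"}
    scheme = urlsplit(storage_uri).scheme
    try:
        return providers[scheme]
    except KeyError:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")


print(provider_for("file://./output/storage"))  # filesystem
print(provider_for("minio://factflow-bucket"))  # minio
```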
Related
- Concept: Storage model (protocol, sidecars, key convention)
- factflow-infra reference (provider implementations)
- Replay guide (storage → replay relationship)