
Storage

Every pipeline writes to storage via StorageProtocol. Two providers ship: filesystem (dev + single-node prod) and MinIO / S3 (multi-node prod). The CLI and API work identically across both.
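The provider-agnostic design can be illustrated with a structural protocol. The method names below are assumptions for illustration, not the actual factflow interface:

```python
from typing import Iterable, Protocol

class StorageProtocol(Protocol):
    """Assumed shape of the storage interface; method names are
    illustrative, not the real factflow API surface."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list(self, prefix: str = "") -> Iterable[str]: ...

class InMemoryStorage:
    """Toy provider satisfying the protocol, for illustration only."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "") -> list[str]:
        # Prefix filtering is what makes browse/list work identically
        # across providers.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

Because both providers satisfy the same protocol, the CLI and API never branch on which backend is configured.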

The default key convention is executions/<exec-id>/<route>/<stage>/<msg-id>. Browse by prefix:

```sh
factflow storage browse executions/EXEC_ID/
factflow storage browse executions/EXEC_ID/sitemap_scraper/
factflow storage browse executions/EXEC_ID/sitemap_scraper/web_scraper/
```

browse groups by prefix; list gives a flat listing.
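The grouping behaviour of browse can be sketched as a pure function over keys: take everything under a prefix and bucket it by the next path segment. This is illustrative logic, not the actual implementation:

```python
def browse(keys: list[str], prefix: str) -> dict[str, list[str]]:
    """Group keys one path level below `prefix`, mimicking how
    `factflow storage browse` presents the hierarchy (sketch only)."""
    groups: dict[str, list[str]] = {}
    for key in keys:
        if not key.startswith(prefix):
            continue
        # The first segment after the prefix becomes the group name.
        head = key[len(prefix):].split("/", 1)[0]
        groups.setdefault(head, []).append(key)
    return groups

keys = [
    "executions/e1/sitemap_scraper/web_scraper/m1",
    "executions/e1/sitemap_scraper/web_scraper/m2",
    "executions/e1/markdown_processor/render/m3",
]
groups = browse(keys, "executions/e1/")
```

A flat list is just the unbucketed version of the same prefix filter.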

```sh
factflow storage list --execution EXEC_ID
factflow storage list --execution EXEC_ID --limit 100
```

Fetch a single object with get:

```sh
factflow storage get KEY
factflow storage get KEY --output json
```

For binary content, pipe to a file:

```sh
factflow storage get executions/EXEC_ID/path/to/object > out.bin
```

Every object has a sidecar (.meta.json on filesystem, object metadata on S3/MinIO):

```sh
factflow storage metadata KEY
```

Returns the full metadata blob — provenance, config hash, lineage reference, adapter-specific extras.
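On the filesystem provider, reading the sidecar is a matter of resolving the neighbouring .meta.json file. A minimal sketch, assuming the sidecar sits next to the object and the field names shown are illustrative:

```python
import json
import tempfile
from pathlib import Path

def read_sidecar(object_path: Path) -> dict:
    """Load the filesystem provider's sidecar for an object;
    the `.meta.json` suffix matches the convention described above."""
    meta_path = object_path.with_name(object_path.name + ".meta.json")
    return json.loads(meta_path.read_text())

# Demo against a throwaway layout (field names are illustrative).
root = Path(tempfile.mkdtemp())
obj = root / "m1"
obj.write_bytes(b"payload")
(root / "m1.meta.json").write_text(json.dumps({"config_hash": "abc123"}))
meta = read_sidecar(obj)
```

On S3/MinIO the same blob lives in object metadata instead of a sibling file, which is why the CLI exposes it through one command.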

Bulk download every artefact for an execution:

```sh
factflow storage download EXEC_ID --output-dir ./out/
```

Preserves the storage key hierarchy under the output directory. Useful for handoff to external tooling.
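Preserving the hierarchy means each storage key maps one-to-one onto a relative file path. A sketch of that mirroring (the real command streams from the archive API rather than iterating a dict):

```python
import tempfile
from pathlib import Path

def download_all(objects: dict[str, bytes], out_dir: Path) -> None:
    """Mirror storage keys as a directory tree under out_dir,
    preserving the key hierarchy (illustrative sketch)."""
    for key, data in objects.items():
        dest = out_dir / key
        # Each "/" in the key becomes a directory level on disk.
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)

out = Path(tempfile.mkdtemp())
download_all({"executions/e1/route/stage/m1": b"payload"}, out)
```

External tooling can then walk the output directory exactly as it would browse the store.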

During a running execution, stream writes as they happen:

```sh
factflow storage watch --execution EXEC_ID
```

Uses Server-Sent Events (SSE) under the hood. Exits when the execution completes or on Ctrl+C.
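The core of an SSE consumer is a small line-oriented loop: accumulate data: fields until a blank line terminates the event. A self-contained sketch (the JSON payload shape here is an assumption, not the documented event format):

```python
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Minimal SSE event parser: collect `data:` lines, emit the
    joined payload when a blank line closes the event (sketch of
    what a watch-style client consumes)."""
    buf: list[str] = []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []

# Simulated stream; a real client would read these lines off HTTP.
stream = ['data: {"key": "executions/e1/route/stage/m1"}', ""]
events = list(parse_sse(stream))
```

Blank-line event framing is what lets the client process writes as they arrive rather than waiting for the response body to end.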

Filter to a specific route:

```sh
factflow storage watch --execution EXEC_ID --route markdown_processor
```
The CLI wraps a small HTTP API; the same objects are reachable directly:

  • GET /api/v1/storage/objects?prefix=... — list
  • GET /api/v1/storage/objects/{key} — read bytes
  • GET /api/v1/storage/objects/{key}/metadata — read sidecar
  • POST /api/v1/content/archive/prepare + GET /api/v1/content/archive/{job}/status — async bulk download (the download CLI uses this)
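A client hitting these endpoints builds URLs from the paths above. A sketch of constructing the list call (the base host is deployment-specific; only the endpoint path comes from the list above):

```python
from urllib.parse import urlencode

def list_url(base: str, prefix: str, limit=None) -> str:
    """Build the list-objects URL for the storage API; auth and
    base host are left to the deployment."""
    params = {"prefix": prefix}
    if limit is not None:
        params["limit"] = limit
    # urlencode percent-escapes the slashes in the prefix.
    return f"{base}/api/v1/storage/objects?{urlencode(params)}"

url = list_url("http://localhost:8000", "executions/e1/", limit=100)
```

The key in the read endpoints must be escaped the same way, since storage keys contain slashes.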

Storage is the replay source of truth. If you delete an object, any execution that depended on it can no longer replay the downstream stage.

Safe retention policy:

  • Never expire objects referenced by executions you might replay (bounded by how far back you replay — usually 30–90 days)
  • Bulk-prune at the execution level (not yet automated in the CLI; use manual rm or bucket lifecycle rules)
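The replay-window rule above reduces to a date comparison per execution. A sketch of selecting prunable execution IDs (function name and the finished-date input are illustrative; actual pruning stays manual):

```python
from datetime import date, timedelta

def prunable(executions: dict[str, date], today: date,
             keep_days: int = 90) -> list[str]:
    """Execution IDs safe to prune under the policy above: anything
    that finished more than `keep_days` ago is outside the replay
    window (sketch only)."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(eid for eid, finished in executions.items()
                  if finished < cutoff)

ids = prunable(
    {"e1": date(2024, 1, 1), "e2": date(2024, 6, 1)},
    today=date(2024, 7, 1),
    keep_days=90,
)
```

Deleting by whole execution prefix, rather than individual objects, is what keeps replayability all-or-nothing per execution.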

Set the storage URI via config / env:

```sh
FACTFLOW_STORAGE_URI=file://./output/storage   # filesystem
FACTFLOW_STORAGE_URI=minio://factflow-bucket   # MinIO / S3
```
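The URI scheme is what selects the provider. A minimal dispatch sketch (function name and return values are assumptions for illustration):

```python
from urllib.parse import urlparse

def pick_provider(uri: str) -> str:
    """Map a FACTFLOW_STORAGE_URI scheme to a provider name
    (illustrative dispatch, not the actual factory)."""
    scheme = urlparse(uri).scheme
    if scheme == "file":
        return "filesystem"
    if scheme in ("minio", "s3"):
        return "minio"
    raise ValueError(f"unsupported storage scheme: {scheme!r}")
```

Everything after the scheme (root path or bucket name) is provider-specific configuration.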

See factflow-infra for the full settings shape.