
Storage

Every pipeline writes to storage via StorageProtocol. Two providers ship: filesystem (dev + single-node prod) and MinIO / S3 (multi-node prod). The CLI and API work identically across both.
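The provider-agnostic design can be illustrated with a structural protocol. The method names below are assumptions for illustration, not the actual factflow interface:

```python
from typing import Iterable, Protocol

class StorageProtocol(Protocol):
    """Assumed shape of the storage interface; method names are
    illustrative, not the real factflow API surface."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list(self, prefix: str = "") -> Iterable[str]: ...

class InMemoryStorage:
    """Toy provider satisfying the protocol, for illustration only."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "") -> list[str]:
        # Prefix filtering is what makes browse/list work identically
        # across providers.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

Because both providers satisfy the same protocol, the CLI and API never branch on which backend is configured.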

The default key convention is executions/<exec-id>/<route>/<stage>/<msg-id>. Browse by prefix:

```sh
factflow storage browse executions/EXEC_ID/
factflow storage browse executions/EXEC_ID/sitemap_scraper/
factflow storage browse executions/EXEC_ID/sitemap_scraper/web_scraper/
```

browse groups by prefix; list gives a flat listing.
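The grouping behaviour of browse can be sketched as a pure function over keys: take everything under a prefix and bucket it by the next path segment. This is illustrative logic, not the actual implementation:

```python
def browse(keys: list[str], prefix: str) -> dict[str, list[str]]:
    """Group keys one path level below `prefix`, mimicking how
    `factflow storage browse` presents the hierarchy (sketch only)."""
    groups: dict[str, list[str]] = {}
    for key in keys:
        if not key.startswith(prefix):
            continue
        # The first segment after the prefix becomes the group name.
        head = key[len(prefix):].split("/", 1)[0]
        groups.setdefault(head, []).append(key)
    return groups

keys = [
    "executions/e1/sitemap_scraper/web_scraper/m1",
    "executions/e1/sitemap_scraper/web_scraper/m2",
    "executions/e1/markdown_processor/render/m3",
]
groups = browse(keys, "executions/e1/")
```

A flat list is just the unbucketed version of the same prefix filter.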

```sh
factflow storage list --execution EXEC_ID
factflow storage list --execution EXEC_ID --limit 100
```

Fetch a single object with get:

```sh
factflow storage get KEY
factflow storage get KEY --output json
```

For binary content, pipe to a file:

```sh
factflow storage get executions/EXEC_ID/path/to/object > out.bin
```

Every object has a sidecar (.meta.json on filesystem, object metadata on S3/MinIO):

```sh
factflow storage metadata KEY
```

Returns the full metadata blob — provenance, config hash, lineage reference, adapter-specific extras.
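On the filesystem provider, reading the sidecar is a matter of resolving the neighbouring .meta.json file. A minimal sketch, assuming the sidecar sits next to the object and the field names shown are illustrative:

```python
import json
import tempfile
from pathlib import Path

def read_sidecar(object_path: Path) -> dict:
    """Load the filesystem provider's sidecar for an object;
    the `.meta.json` suffix matches the convention described above."""
    meta_path = object_path.with_name(object_path.name + ".meta.json")
    return json.loads(meta_path.read_text())

# Demo against a throwaway layout (field names are illustrative).
root = Path(tempfile.mkdtemp())
obj = root / "m1"
obj.write_bytes(b"payload")
(root / "m1.meta.json").write_text(json.dumps({"config_hash": "abc123"}))
meta = read_sidecar(obj)
```

On S3/MinIO the same blob lives in object metadata instead of a sibling file, which is why the CLI exposes it through one command.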

Bulk download every artefact for an execution:

```sh
factflow storage download EXEC_ID --output-dir ./out/
```

Preserves the storage key hierarchy under the output directory. Useful for handoff to external tooling.
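Preserving the hierarchy means each storage key maps one-to-one onto a relative file path. A sketch of that mirroring (the real command streams from the archive API rather than iterating a dict):

```python
import tempfile
from pathlib import Path

def download_all(objects: dict[str, bytes], out_dir: Path) -> None:
    """Mirror storage keys as a directory tree under out_dir,
    preserving the key hierarchy (illustrative sketch)."""
    for key, data in objects.items():
        dest = out_dir / key
        # Each "/" in the key becomes a directory level on disk.
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)

out = Path(tempfile.mkdtemp())
download_all({"executions/e1/route/stage/m1": b"payload"}, out)
```

External tooling can then walk the output directory exactly as it would browse the store.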

During a running execution, stream writes as they happen:

```sh
factflow storage watch --execution EXEC_ID
```

Uses Server-Sent Events (SSE) under the hood. Exits when the execution completes or on Ctrl+C.
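The core of an SSE consumer is a small line-oriented loop: accumulate data: fields until a blank line terminates the event. A self-contained sketch (the JSON payload shape here is an assumption, not the documented event format):

```python
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Minimal SSE event parser: collect `data:` lines, emit the
    joined payload when a blank line closes the event (sketch of
    what a watch-style client consumes)."""
    buf: list[str] = []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []

# Simulated stream; a real client would read these lines off HTTP.
stream = ['data: {"key": "executions/e1/route/stage/m1"}', ""]
events = list(parse_sse(stream))
```

Blank-line event framing is what lets the client process writes as they arrive rather than waiting for the response body to end.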

Filter to a specific route:

```sh
factflow storage watch --execution EXEC_ID --route markdown_processor
```
The CLI wraps a small HTTP API; the same objects are reachable directly:

  • GET /api/v1/storage/objects?prefix=... — list
  • GET /api/v1/storage/objects/{key} — read bytes
  • GET /api/v1/storage/objects/{key}/metadata — read sidecar
  • POST /api/v1/content/archive/prepare + GET /api/v1/content/archive/{job}/status — async bulk download (the download CLI uses this)
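A client hitting these endpoints builds URLs from the paths above. A sketch of constructing the list call (the base host is deployment-specific; only the endpoint path comes from the list above):

```python
from urllib.parse import urlencode

def list_url(base: str, prefix: str, limit=None) -> str:
    """Build the list-objects URL for the storage API; auth and
    base host are left to the deployment."""
    params = {"prefix": prefix}
    if limit is not None:
        params["limit"] = limit
    # urlencode percent-escapes the slashes in the prefix.
    return f"{base}/api/v1/storage/objects?{urlencode(params)}"

url = list_url("http://localhost:8000", "executions/e1/", limit=100)
```

The key in the read endpoints must be escaped the same way, since storage keys contain slashes.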

Storage is the replay source of truth. If you delete an object, any execution that depended on it can no longer replay the downstream stage.

Safe retention policy:

  • Never expire objects referenced by executions you might replay (bounded by how far back you replay — usually 30–90 days)
  • Bulk-prune at the execution level (not yet automated in the CLI; use manual rm or bucket lifecycle rules)
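The replay-window rule above reduces to a date comparison per execution. A sketch of selecting prunable execution IDs (function name and the finished-date input are illustrative; actual pruning stays manual):

```python
from datetime import date, timedelta

def prunable(executions: dict[str, date], today: date,
             keep_days: int = 90) -> list[str]:
    """Execution IDs safe to prune under the policy above: anything
    that finished more than `keep_days` ago is outside the replay
    window (sketch only)."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(eid for eid, finished in executions.items()
                  if finished < cutoff)

ids = prunable(
    {"e1": date(2024, 1, 1), "e2": date(2024, 6, 1)},
    today=date(2024, 7, 1),
    keep_days=90,
)
```

Deleting by whole execution prefix, rather than individual objects, is what keeps replayability all-or-nothing per execution.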

Set the storage URI via config / env:

```sh
FACTFLOW_STORAGE_URI=file://./output/storage   # filesystem
FACTFLOW_STORAGE_URI=minio://factflow-bucket   # MinIO / S3
```
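The URI scheme is what selects the provider. A minimal dispatch sketch (function name and return values are assumptions for illustration):

```python
from urllib.parse import urlparse

def pick_provider(uri: str) -> str:
    """Map a FACTFLOW_STORAGE_URI scheme to a provider name
    (illustrative dispatch, not the actual factory)."""
    scheme = urlparse(uri).scheme
    if scheme == "file":
        return "filesystem"
    if scheme in ("minio", "s3"):
        return "minio"
    raise ValueError(f"unsupported storage scheme: {scheme!r}")
```

Everything after the scheme (root path or bucket name) is provider-specific configuration.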

See factflow-infra for the full settings shape.