
Lineage

Lineage records one row per message per adapter invocation: input hash, output hash, status, timestamps, and any error payload. Rows are written independently of the main pipeline flow, so a broken pipeline still records lineage up to the point of breakage.
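The Python sketch below wraps one adapter invocation to show the row shape and the independence property. The `LineageRow` fields, the `run_with_lineage` helper, the `record` callback, and the use of SHA-256 are all assumptions made for illustration. They are not factflow's actual schema or API.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class LineageRow:
    """Hypothetical shape of one lineage row: one message, one adapter invocation."""
    message_id: str
    adapter: str
    input_hash: str
    output_hash: Optional[str] = None
    status: str = "running"
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    error: Optional[dict] = None


def content_hash(payload: bytes) -> str:
    # Assumption: SHA-256 over the raw bytes; factflow's real hash may differ.
    return hashlib.sha256(payload).hexdigest()


def run_with_lineage(message_id: str, adapter: str, payload: bytes,
                     fn: Callable[[bytes], bytes],
                     record: Callable[[LineageRow], None]) -> bytes:
    """Run fn(payload) and record a lineage row whatever the outcome (hypothetical helper)."""
    row = LineageRow(message_id, adapter, content_hash(payload),
                     started_at=datetime.now(timezone.utc))
    try:
        out = fn(payload)
        row.output_hash = content_hash(out)
        row.status = "completed"
        return out
    except Exception as exc:
        row.status = "failed"
        row.error = {"class": type(exc).__name__, "message": str(exc)}
        raise  # the adapter's own error still reaches the pipeline
    finally:
        row.finished_at = datetime.now(timezone.utc)
        try:
            record(row)  # separate write path for the lineage row
        except Exception:
            pass  # a lineage store failure never fails the pipeline
```

Note the inner `try` around `record`. The adapter's exception still propagates, but a failure to write the lineage row is swallowed. A failed adapter call therefore still leaves a `failed` row behind, and a lineage outage does not break the pipeline.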

This guide is the operator’s lineage toolbox: how to find a failure, trace it to its root cause, and audit what ran.

Most recent failures across all executions:

```sh
factflow lineage failures
```

Scoped to one execution:

```sh
factflow lineage failures --execution EXEC_ID
```

Each row shows: lineage id, execution, route, adapter, timestamp, error class. Pick the one you want to drill into.
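When the failure list is long, grouping rows by route, adapter, and error class usually shows whether you are looking at one problem or several. The minimal sketch below assumes the rows have been loaded as dicts whose keys (`id`, `route`, `adapter`, `timestamp`, `error_class`) mirror the printed columns. Those key names are hypothetical. It also assumes timestamps are ISO-8601 strings in a single format, so they sort chronologically.

```python
from collections import Counter


def triage(failures):
    """Group failure rows by (route, adapter, error_class), most frequent first.

    Returns (group_key, count, latest_lineage_id) tuples; the latest row in each
    group is a reasonable first candidate to drill into.
    """
    def key(r):
        return (r["route"], r["adapter"], r["error_class"])

    counts = Counter(key(r) for r in failures)
    result = []
    for group, n in counts.most_common():  # ties keep first-seen order
        latest = max((r for r in failures if key(r) == group),
                     key=lambda r: r["timestamp"])
        result.append((group, n, latest["id"]))
    return result
```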

Given a lineage id, walk both directions (ancestors + descendants):

```sh
factflow lineage chain LINEAGE_ID
```

Ancestors let you see what input produced this message. Descendants let you see how far the failure cascaded.
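Conceptually, a chain query follows parent links up to the root and child links down through every descendant. The sketch below models that walk in memory. It assumes each row has at most one parent and that the graph is acyclic. The `parents` mapping is a hypothetical stand-in for the lineage table, not a description of how factflow stores or queries it.

```python
from collections import defaultdict, deque


def chain(parents, lineage_id):
    """Return (ancestors root-first, descendants breadth-first) for lineage_id.

    parents maps each lineage id to its parent id, or None for a root row.
    """
    # Walk upward: the last ancestor found is the root (the original input).
    ancestors = []
    current = parents[lineage_id]
    while current is not None:
        ancestors.append(current)
        current = parents[current]
    ancestors.reverse()

    # Invert the parent links once, then walk downward breadth-first.
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)

    descendants = []
    queue = deque(children[lineage_id])
    while queue:
        node = queue.popleft()
        descendants.append(node)
        queue.extend(children[node])
    return ancestors, descendants
```

The first entry in `ancestors` corresponds to the input message that the debugging steps later in this guide trace back to. The length of `descendants` is a direct measure of how far a failure cascaded.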

Only the direct children (the messages this one produced):

```sh
factflow lineage children LINEAGE_ID
```

A single row by id:

```sh
factflow lineage get LINEAGE_ID
```

All rows for an execution, optionally filtered by status or adapter:

```sh
factflow lineage list --execution EXEC_ID
factflow lineage list --execution EXEC_ID --status failed
factflow lineage list --execution EXEC_ID --adapter web_scraper
```

If an adapter processes messages in a batch (e.g., an embedding generator batching 100 texts at a time), every message in the batch shares a batch_id. To inspect one batch:

```sh
factflow lineage batch BATCH_ID
```

Useful when debugging partial batch failures.
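One way to read a batch is to split its rows by status and check whether some, but not all, of its messages failed. The sketch assumes rows are dicts with hypothetical `id`, `batch_id`, and `status` keys. It also assumes failed rows carry the status value `failed`, as the `--status failed` filter suggests.

```python
def summarize_batch(rows, batch_id):
    """Group one batch's message ids by status and flag a partial failure.

    A batch is "partial" when at least one message failed but not all of them did.
    """
    members = [r for r in rows if r["batch_id"] == batch_id]
    by_status = {}
    for r in members:
        by_status.setdefault(r["status"], []).append(r["id"])
    failed = by_status.get("failed", [])
    partial = bool(failed) and len(failed) < len(members)
    return by_status, partial
```

A partial failure tends to point at specific inputs. A batch in which every message failed more often points at something the whole batch shared, such as a single upstream call.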

Counts per route and adapter:

```sh
factflow lineage stats --execution EXEC_ID
```

Sample output (abbreviated):

```text
ROUTE            ADAPTER              COUNT  COMPLETED  FAILED
sitemap_scraper  sitemap_parser           1          1       0
sitemap_scraper  url_expander             1          1       0
sitemap_scraper  web_scraper             42         40       2
sitemap_scraper  web_content_storage     40         40       0
```

Two failures in web_scraper. That’s where to investigate.
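Two details in this table can be checked mechanically. The first is which adapters have a nonzero FAILED count. The second is whether COUNT equals COMPLETED plus FAILED. In the sample it does for every row, and web_content_storage received 40 messages, matching web_scraper's 40 completions. The sketch below parses the text layout shown above. It assumes whitespace-separated columns with exactly these headers, which may not hold for the real CLI output. It also assumes that rows counted but neither completed nor failed are still in flight.

```python
def adapters_with_failures(stats_text):
    """Return (route, adapter, failed, remaining) for each row with failures, worst first.

    remaining = COUNT - COMPLETED - FAILED, assumed to be rows still in flight.
    """
    lines = [line.split() for line in stats_text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    col = {name: i for i, name in enumerate(header)}
    hits = []
    for cols in body:
        count = int(cols[col["COUNT"]])
        completed = int(cols[col["COMPLETED"]])
        failed = int(cols[col["FAILED"]])
        if failed:
            hits.append((cols[col["ROUTE"]], cols[col["ADAPTER"]],
                         failed, count - completed - failed))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```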

A typical debugging loop for a failed execution:

  1. Run factflow execution get EXEC_ID and confirm status=failed.
  2. Run factflow lineage failures --execution EXEC_ID to find the failing rows.
  3. Pick one row’s id and run factflow lineage chain ID to see its parent chain.
  4. The chain’s root points back to an input message; find its storage key via metadata.input_key.
  5. Run factflow storage get INPUT_KEY to read the bytes that tripped the adapter.
  6. Reproduce the failure locally in a unit test.
  7. Fix, push, and redeploy.
  8. Run factflow execution replay EXEC_ID --from-stage FAILED_STAGE to rerun from the failed stage with the same input.
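Step 6 is easier once the bytes from step 5 are pinned as a fixture, because that turns the production failure into a permanent regression test. The `unittest` sketch below illustrates the idea. `extract_title` is a hypothetical stand-in for the failing adapter's logic, shown after a fix that tolerates pages with no `<title>`. `FAILING_INPUT` stands for the bytes you retrieved with `factflow storage get`.

```python
import unittest


def extract_title(html: bytes) -> str:
    """Hypothetical stand-in for the failing adapter's core logic, after the fix."""
    text = html.decode("utf-8", errors="replace")
    start = text.find("<title>")
    end = text.find("</title>", start)
    if start == -1 or end == -1:
        return ""  # the fix: pages without a <title> no longer raise
    return text[start + len("<title>"):end].strip()


# Placeholder for the exact bytes read with `factflow storage get INPUT_KEY`.
FAILING_INPUT = b"<html><head></head><body>no title here</body></html>"


class RegressionTest(unittest.TestCase):
    def test_failing_input_is_handled(self):
        self.assertEqual(extract_title(FAILING_INPUT), "")

    def test_normal_page_still_parses(self):
        self.assertEqual(extract_title(b"<title> Docs </title>"), "Docs")
```

Run it with `python -m unittest` against the unfixed code first, to confirm that it fails for the same reason the adapter did.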
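Most of these queries are also available over the REST API:

  • GET /api/v1/lineage?execution_id=X&status=failed (list with filters)
  • GET /api/v1/lineage/{id}/chain (ancestors and descendants)
  • GET /api/v1/lineage/{id}/children (direct children)
  • GET /api/v1/lineage/failures (recent failures)
  • GET /api/v1/lineage/stats?execution_id=X (counts per route and adapter)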
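A small client for these endpoints mostly needs to build the paths and query strings listed above. In this sketch, `BASE_URL` is a placeholder and authentication is omitted. `fetch_json` assumes the server returns JSON. None of those details come from this guide.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # hypothetical; point this at your factflow server


def lineage_url(path="", **params):
    """Build a /api/v1/lineage URL, dropping query parameters that are None."""
    url = f"{BASE_URL}/api/v1/lineage{path}"
    query = {k: v for k, v in params.items() if v is not None}
    return f"{url}?{urlencode(query)}" if query else url


def chain_url(lineage_id):
    # Percent-encode the id so it is safe as a single path segment.
    return lineage_url(f"/{quote(lineage_id, safe='')}/chain")


def fetch_json(url, timeout=10):
    """GET an endpoint and decode its body (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(lineage_url(execution_id="EXEC_ID", status="failed"))` mirrors `factflow lineage list --execution EXEC_ID --status failed`.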

Lineage maintains two guarantees:

  1. Commits independently. A lineage DB issue never fails the pipeline, and a pipeline crash still records lineage up to the crash.
  2. Pending children are counted. A parent row is only truly complete when its recorded children have also completed. This prevents false-positive “done” states in chain queries.
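The second guarantee can be expressed as a recursive check. This sketch uses hypothetical in-memory `statuses` and `children` mappings and applies the rule at every level. A parent therefore counts as done only when its whole recorded subtree has completed. Whether factflow checks only direct children or the full subtree is not specified here, so treat the recursion as an assumption.

```python
def truly_complete(statuses, children, lineage_id):
    """True only if this row completed and every recorded child is truly complete.

    statuses: lineage id -> status string; children: lineage id -> list of child ids.
    """
    if statuses[lineage_id] != "completed":
        return False
    return all(truly_complete(statuses, children, child)
               for child in children.get(lineage_id, []))
```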