Lineage

Lineage records one row per message per adapter invocation. Each row holds the input hash, output hash, status, timestamps, and error payload. Lineage is written independently of the main pipeline flow, so a broken pipeline still records lineage up to the point of breakage.
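As a rough mental model, a lineage row can be pictured as the record below. This is only a sketch: the field names, the status strings, and the use of SHA-256 for the hashes are assumptions made for illustration, not the actual factflow schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from hashlib import sha256
from typing import Optional


@dataclass(frozen=True)
class LineageRow:
    """One adapter invocation on one message (field names are assumptions)."""
    lineage_id: str
    execution_id: str
    route: str
    adapter: str
    input_hash: str
    output_hash: Optional[str]       # None when no output was produced
    status: str                      # assumed values: "completed", "failed"
    started_at: datetime
    finished_at: Optional[datetime]
    error: Optional[dict] = None     # error payload, set only on failure


def content_hash(payload: bytes) -> str:
    """Hash message bytes. SHA-256 is an assumption, not the documented algorithm."""
    return sha256(payload).hexdigest()


# A hypothetical failed invocation of the web_scraper adapter.
row = LineageRow(
    lineage_id="lin-1",
    execution_id="exec-1",
    route="sitemap_scraper",
    adapter="web_scraper",
    input_hash=content_hash(b"https://example.com/page"),
    output_hash=None,
    status="failed",
    started_at=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    finished_at=datetime(2024, 1, 1, 12, 0, 3, tzinfo=timezone.utc),
    error={"class": "TimeoutError", "message": "request timed out"},
)
```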

This guide is the operator’s lineage toolbox: how to find a failure, trace it to its root cause, and audit what ran.

Find a failure

Most recent failures across all executions:

factflow lineage failures

Scoped to one execution:

factflow lineage failures --execution EXEC_ID

Each row shows: lineage id, execution, route, adapter, timestamp, error class. Pick the one you want to drill into.

Trace a chain

Given a lineage id, walk both directions (ancestors + descendants):

factflow lineage chain LINEAGE_ID

Ancestors let you see what input produced this message. Descendants let you see how far the failure cascaded.
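To make the two directions concrete, here is a minimal in-memory sketch of a chain walk. It assumes each row records a single parent id, which may not match how factflow stores relationships; the row ids are invented.

```python
from collections import defaultdict, deque

# Hypothetical rows as (lineage_id, parent_id). A parent_id of None marks a root.
ROWS = [
    ("a", None),
    ("b", "a"),
    ("c", "b"),
    ("d", "b"),
    ("e", "d"),
]


def build_indexes(rows):
    """Index rows by parent and by child for fast walks in both directions."""
    parent_of = {}
    children_of = defaultdict(list)
    for lineage_id, parent_id in rows:
        parent_of[lineage_id] = parent_id
        if parent_id is not None:
            children_of[parent_id].append(lineage_id)
    return parent_of, children_of


def ancestors(lineage_id, parent_of):
    """Walk upward: nearest parent first, root last."""
    chain = []
    current = parent_of.get(lineage_id)
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain


def descendants(lineage_id, children_of):
    """Walk downward breadth-first: direct children first, then their children."""
    found = []
    queue = deque(children_of.get(lineage_id, []))
    while queue:
        current = queue.popleft()
        found.append(current)
        queue.extend(children_of.get(current, []))
    return found


parent_of, children_of = build_indexes(ROWS)
print(ancestors("b", parent_of))      # what input produced "b"
print(descendants("b", children_of))  # how far a failure at "b" could cascade
```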

Just children (messages this one produced):

factflow lineage children LINEAGE_ID

Just one row:

factflow lineage get LINEAGE_ID

List lineage for an execution

All rows, filterable:

factflow lineage list --execution EXEC_ID
factflow lineage list --execution EXEC_ID --status failed
factflow lineage list --execution EXEC_ID --adapter web_scraper

Batch view

If an adapter processed messages in a batch (e.g., an embedding generator batching 100 texts at a time), every message in that batch shares a batch_id. Inspect one batch:

factflow lineage batch BATCH_ID

Useful when debugging partial batch failures.
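A partial batch failure is one where some rows in the batch failed and others did not. The sketch below classifies a batch from its rows; the tuple layout, the status strings, and the outcome labels are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows as (lineage_id, batch_id, status).
BATCH_ROWS = [
    ("lin-1", "batch-7", "completed"),
    ("lin-2", "batch-7", "completed"),
    ("lin-3", "batch-7", "failed"),
    ("lin-4", "batch-8", "completed"),
]


def summarize_batch(rows, batch_id):
    """Count statuses within one batch and label the overall outcome."""
    statuses = Counter(status for _, b, status in rows if b == batch_id)
    total = sum(statuses.values())
    failed = statuses.get("failed", 0)
    if total == 0:
        outcome = "empty"
    elif failed == 0:
        outcome = "ok"
    elif failed == total:
        outcome = "failed"
    else:
        outcome = "partial"
    return {"total": total, "failed": failed, "outcome": outcome}


print(summarize_batch(BATCH_ROWS, "batch-7"))
```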

Summary stats

Counts per route and adapter:

factflow lineage stats --execution EXEC_ID

Sample output (abbreviated):

ROUTE                ADAPTER                COUNT   COMPLETED   FAILED
sitemap_scraper      sitemap_parser         1       1           0
sitemap_scraper      url_expander           1       1           0
sitemap_scraper      web_scraper            42      40          2
sitemap_scraper      web_content_storage    40      40          0

Two failures in web_scraper. That’s where to investigate.
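The stats table is a group-by over lineage rows. A minimal sketch of that aggregation, assuming rows reduce to (route, adapter, status) tuples and that statuses are the strings "completed" and "failed":

```python
from collections import defaultdict


def lineage_stats(rows):
    """Aggregate (route, adapter, status) tuples into per-adapter counts."""
    stats = defaultdict(lambda: {"count": 0, "completed": 0, "failed": 0})
    for route, adapter, status in rows:
        bucket = stats[(route, adapter)]
        bucket["count"] += 1
        if status in ("completed", "failed"):
            bucket[status] += 1
    return dict(stats)


def worst_adapter(stats):
    """Return the (route, adapter) key with the most failures, or None if there are none."""
    key = max(stats, key=lambda k: stats[k]["failed"], default=None)
    if key is None or stats[key]["failed"] == 0:
        return None
    return key


# Rows shaped to match the web_scraper and web_content_storage lines of the sample output.
rows = (
    [("sitemap_scraper", "web_scraper", "completed")] * 40
    + [("sitemap_scraper", "web_scraper", "failed")] * 2
    + [("sitemap_scraper", "web_content_storage", "completed")] * 40
)
stats = lineage_stats(rows)
print(worst_adapter(stats))
```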

Debugging workflow

1. Run factflow execution get EXEC_ID and confirm status=failed.
2. Run factflow lineage failures --execution EXEC_ID to find the failing rows.
3. Pick one row’s id and run factflow lineage chain ID to see the parent chain.
4. The chain’s root points back to an input message. Find its storage key via metadata.input_key.
5. Run factflow storage get INPUT_KEY to read the bytes that tripped the adapter.
6. Reproduce locally in a unit test.
7. Fix, push, redeploy.
8. Run factflow execution replay EXEC_ID --from-stage FAILED_STAGE to rerun with the same input.
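The command steps of this workflow can be collected into a small runbook helper. The sketch below only builds and prints the command lines listed above; it does not execute them, and the ids passed in are made-up placeholder values.

```python
import shlex


def debug_commands(exec_id, lineage_id, input_key, failed_stage):
    """Return the workflow's CLI steps, in order, as argv lists."""
    return [
        ["factflow", "execution", "get", exec_id],
        ["factflow", "lineage", "failures", "--execution", exec_id],
        ["factflow", "lineage", "chain", lineage_id],
        ["factflow", "storage", "get", input_key],
        ["factflow", "execution", "replay", exec_id, "--from-stage", failed_stage],
    ]


# Hypothetical values; in practice each comes from the output of the previous step.
commands = debug_commands("exec-1", "lin-42", "inputs/page-17", "web_scraper")
for argv in commands:
    print(shlex.join(argv))
```

Keeping the steps as argv lists rather than one shell string avoids quoting problems if a storage key ever contains spaces.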

Via the API

GET /api/v1/lineage?execution_id=X&status=failed: list with filters
GET /api/v1/lineage/{id}/chain: forward + backward chain
GET /api/v1/lineage/{id}/children: messages this row produced
GET /api/v1/lineage/failures: most recent failures
GET /api/v1/lineage/stats?execution_id=X: counts per route and adapter
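When scripting against these endpoints, it helps to build the URLs in one place. The helper below is a sketch: the base URL is an assumption (substitute your deployment's address), and it only constructs URLs without sending any request.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://localhost:8000"  # assumption: replace with your server address
LINEAGE_ROOT = "/api/v1/lineage"


def lineage_url(path="", **params):
    """Build a lineage API URL, dropping query parameters whose value is None."""
    url = urljoin(BASE_URL, LINEAGE_ROOT + path)
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{url}?{query}" if query else url


print(lineage_url(execution_id="exec-1", status="failed"))
print(lineage_url("/lin-42/chain"))
print(lineage_url("/stats", execution_id="exec-1"))
```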

Invariants

Two guarantees lineage maintains:

Commits independently. A lineage DB issue never fails the pipeline, and a pipeline crash still leaves lineage recorded up to the crash.
Pending children are counted. A parent row is only truly complete when its recorded children have also completed. This prevents false-positive “done” states in chain queries.
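The second guarantee can be read as a recursive check. The sketch below assumes the rule applies transitively (a child with a pending child of its own is also not complete) and uses invented status strings; the real implementation may track this differently, for example with counters.

```python
def is_truly_complete(lineage_id, status_of, children_of):
    """A row is complete only if it and all of its recorded children are complete."""
    if status_of[lineage_id] != "completed":
        return False
    return all(
        is_truly_complete(child, status_of, children_of)
        for child in children_of.get(lineage_id, [])
    )


# Hypothetical chain a -> b -> c, where c has not finished yet.
children_of = {"a": ["b"], "b": ["c"]}
status_of = {"a": "completed", "b": "completed", "c": "pending"}
print(is_truly_complete("a", status_of, children_of))  # "a" itself finished, but "c" is pending

# Once c completes, the whole chain reports complete.
finished = dict(status_of, c="completed")
print(is_truly_complete("a", finished, children_of))
```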

Concept: Lineage — the underlying model
factflow-lineage reference
Replay guide — how lineage feeds replay decisions