
Lineage and correlation

Lineage is the forensic record of what happened in a pipeline: one row per message per adapter invocation. Each row records the input hash, output hash, status, error payload (if the invocation failed), timestamps, and a correlation id. Rows are stored in PostgreSQL, queried via the API, and browsed via the CLI.

id            UUID
execution_id  UUID            -- which execution
route_id      string          -- which route
adapter_name  string          -- which adapter
parent_id     UUID|null       -- correlation: the previous row in this chain
status        enum            -- pending | completed | failed | cancelled
input_hash    string          -- hash of the inbound message
output_hash   string|null
error         JSON|null       -- exception class + message + traceback (if failed)
started_at    timestamp
completed_at  timestamp|null
batch_id      UUID|null       -- if part of a batch
metadata      JSON            -- adapter-specific extras

A message that flows through three adapters in a route produces three lineage rows, linked parent-to-child.
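To make the parent-to-child linking concrete, here is a minimal in-memory sketch of the row shape above. The `LineageRow` dataclass, the `run_route` helper, the `demo-route` name, and the choice of SHA-256 are all assumptions for illustration; the source only says "hash" and does not show Factflow's actual model.

```python
from __future__ import annotations

import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LineageRow:
    # Field names mirror the schema listing; the class itself is hypothetical.
    execution_id: uuid.UUID
    route_id: str
    adapter_name: str
    input_hash: str
    parent_id: Optional[uuid.UUID] = None
    status: str = "pending"
    output_hash: Optional[str] = None
    error: Optional[dict] = None
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: Optional[datetime] = None
    batch_id: Optional[uuid.UUID] = None
    metadata: dict = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def digest(payload: bytes) -> str:
    # SHA-256 is an assumption; any stable content hash would do here.
    return hashlib.sha256(payload).hexdigest()


def run_route(payload: bytes, adapters: list[str]) -> list[LineageRow]:
    """Simulate one message passing through each adapter in order."""
    execution_id = uuid.uuid4()
    rows: list[LineageRow] = []
    parent_id = None
    for name in adapters:
        row = LineageRow(
            execution_id=execution_id,
            route_id="demo-route",
            adapter_name=name,
            input_hash=digest(payload),
            parent_id=parent_id,
        )
        payload = payload + b"|" + name.encode()  # stand-in for real adapter work
        row.output_hash = digest(payload)
        row.status = "completed"
        row.completed_at = datetime.now(timezone.utc)
        rows.append(row)
        parent_id = row.id  # the next row points back at this one
    return rows


rows = run_route(b"hello", ["fetch", "parse", "store"])
```

In this sketch each row's `input_hash` equals the previous row's `output_hash`, which is one way a chain can be cross-checked in addition to following `parent_id`.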

Lineage is a DAG over messages. Every child row carries its parent’s id. This makes two queries trivial:

  • Forward trace — starting at message X, find every descendant (transitive closure over parent_id).
  • Backward trace — starting at a failed message, find the full ancestor chain up to the route’s initial message.

The factflow lineage chain <id> command returns both directions.
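The two traces can be sketched over a plain mapping of row id to parent id. This is an illustration of the traversal only, not how `factflow lineage chain` is implemented; against PostgreSQL the same walk would more likely be a recursive query over `parent_id`. The function names and the sample ids are hypothetical.

```python
from collections import defaultdict


def backward_trace(parents, start_id):
    """Return start_id followed by its ancestors, ending at the initial message."""
    chain = []
    current = start_id
    while current is not None:
        chain.append(current)
        current = parents[current]  # None marks the route's initial message
    return chain


def forward_trace(parents, start_id):
    """Return every descendant of start_id (transitive closure over parent_id)."""
    children = defaultdict(list)
    for row_id, parent_id in parents.items():
        if parent_id is not None:
            children[parent_id].append(row_id)
    found, stack = [], [start_id]
    while stack:
        node = stack.pop()
        for child in children[node]:
            found.append(child)
            stack.append(child)
    return found


# id -> parent_id; "b" fans out into two children.
parents = {"a": None, "b": "a", "c": "b", "d": "b"}
ancestors_of_d = backward_trace(parents, "d")
descendants_of_a = forward_trace(parents, "a")
```

Because each row carries a single `parent_id`, the backward trace is a simple walk, while the forward trace needs the inverted child index.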

Lineage writes happen out of band from the pipeline. The pipeline publishes an async write; lineage commits it on a separate connection. Consequence: a lineage DB hiccup never fails the pipeline, and a pipeline crash still records the lineage up to the point of crash.

This is why you can debug a broken pipeline by reading lineage even when the orchestrator is fatally stuck — the lineage trail is independent.
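The decoupling can be sketched with an `asyncio.Queue`: the pipeline only enqueues a row and moves on, while a separate writer task commits it. Everything here (the function names, the row shape, the simulated `ConnectionError`) is an assumption for illustration and says nothing about Factflow's actual transport.

```python
import asyncio


async def lineage_writer(queue, store, retry, fail_on=()):
    """Commit rows on its own task; errors stay inside this coroutine."""
    while True:
        row = await queue.get()
        try:
            if row["id"] in fail_on:
                raise ConnectionError("lineage DB unavailable")  # simulated hiccup
            store.append(row)
        except ConnectionError:
            retry.append(row)  # kept for later; never propagated to the pipeline
        finally:
            queue.task_done()


async def run_pipeline(messages, queue):
    results = []
    for msg in messages:
        results.append(msg.upper())  # stand-in for adapter work
        queue.put_nowait({"id": msg, "status": "completed"})  # publish and move on
    return results


async def main():
    queue = asyncio.Queue()
    store, retry = [], []
    writer = asyncio.create_task(lineage_writer(queue, store, retry, fail_on={"b"}))
    results = await run_pipeline(["a", "b", "c"], queue)
    await queue.join()  # only so the demo can inspect what was written
    writer.cancel()
    return results, store, retry


results, store, retry = asyncio.run(main())
```

The pipeline produces all three results even though the write for `"b"` failed, which is the property the paragraph above describes.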

Pending children race (and why it’s fixed)

Early versions of the lineage service had a race: a parent row could be marked completed before its children registered. A naive query “is this row done?” would return yes, but children were still pending.

Fix: children pre-register before the parent’s status transition. The pending_children count on the parent is incremented when a child is about to be written, decremented when the child finishes. A parent is only “truly” done when status=completed AND pending_children=0.

The API and CLI respect this. factflow lineage chain returns completion states that honour pending children.
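A minimal sketch of the completion rule, assuming a `pending_children` counter on the parent as described above. The `Row` class and helper names are hypothetical, and a real service would need these updates to be atomic (for example inside a database transaction), which this in-memory version ignores.

```python
from dataclasses import dataclass


@dataclass
class Row:
    status: str = "pending"
    pending_children: int = 0


def register_child(parent: Row) -> None:
    # Called when a child is about to be written, before the parent transitions.
    parent.pending_children += 1


def finish_child(parent: Row) -> None:
    # Called when the child finishes.
    parent.pending_children -= 1


def is_truly_done(row: Row) -> bool:
    return row.status == "completed" and row.pending_children == 0


parent = Row()
register_child(parent)            # child pre-registers first
parent.status = "completed"       # parent's own work finishes

naive_done = parent.status == "completed"   # the old, racy check
done_before = is_truly_done(parent)         # child still pending
finish_child(parent)
done_after = is_truly_done(parent)
```

The naive check reports completion while a child is still outstanding; the combined check does not.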

A failure to record lineage for one adapter never cascades. If lineage itself has issues (DB down, connection pool exhausted), the pipeline continues and rows are queued in an in-memory buffer until the DB returns. On catastrophic loss, the buffer is flushed to a structured log so operators can reconstruct the lineage after the fact.
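The buffering behaviour might look roughly like the sketch below. `BufferedLineageWriter`, its methods, and the use of JSON lines as the "structured log" are assumptions; the source does not specify the buffer's size limits, the log format, or what triggers the flush.

```python
import json
from collections import deque


class BufferedLineageWriter:
    """Hypothetical writer that never raises into the pipeline."""

    def __init__(self, write_row):
        self._write_row = write_row  # callable; raises ConnectionError when the DB is down
        self._buffer = deque()

    def record(self, row):
        self._buffer.append(row)
        self.drain()

    def drain(self):
        # Write rows in order; stop at the first failure and keep the rest buffered.
        while self._buffer:
            try:
                self._write_row(self._buffer[0])
            except ConnectionError:
                return
            self._buffer.popleft()

    def pending(self):
        return len(self._buffer)

    def dump_to_log(self, emit=print):
        # Last resort: one JSON object per line so rows can be reconstructed later.
        while self._buffer:
            emit(json.dumps(self._buffer.popleft(), sort_keys=True))


db = {"up": False, "rows": []}


def write_row(row):
    if not db["up"]:
        raise ConnectionError("lineage DB down")
    db["rows"].append(row)


writer = BufferedLineageWriter(write_row)
writer.record({"id": 1, "status": "completed"})
writer.record({"id": 2, "status": "failed"})
buffered_while_down = writer.pending()

db["up"] = True
writer.drain()                     # DB is back: buffered rows are committed

db["up"] = False
writer.record({"id": 3, "status": "completed"})
log_lines = []
writer.dump_to_log(log_lines.append)   # simulate the catastrophic-loss path
```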

Unlike systems where an observability failure halts the main flow, Factflow explicitly decouples the two.

  • GET /api/v1/lineage?execution_id=...&status=failed — filter + paginate
  • GET /api/v1/lineage/{id}/chain — forward + backward chain
  • GET /api/v1/lineage/{id}/children
  • GET /api/v1/lineage/failures — most recent failures across executions
  • GET /api/v1/lineage/stats — summary counters per execution
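As a small illustration of calling these endpoints, the sketch below builds request URLs with the standard library. The base URL is an assumption (the source does not say where the API is served), the helper names are hypothetical, and no pagination parameters are shown because the source does not name them.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment's address


def lineage_list_url(execution_id, status=None, base_url=BASE_URL):
    """URL for GET /api/v1/lineage filtered by execution and optional status."""
    params = {"execution_id": execution_id}
    if status is not None:
        params["status"] = status
    return f"{base_url}/api/v1/lineage?{urlencode(params)}"


def lineage_chain_url(row_id, base_url=BASE_URL):
    """URL for GET /api/v1/lineage/{id}/chain."""
    return f"{base_url}/api/v1/lineage/{quote(row_id, safe='')}/chain"


failed_url = lineage_list_url("1234", status="failed")
chain_url = lineage_chain_url("abcd")
```

The resulting URLs can be fetched with any HTTP client, for example `urllib.request.urlopen`; the response shape is not documented in this section, so it is left out here.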

The same queries are available from the CLI:

factflow lineage list --execution ID
factflow lineage chain MSG_ID
factflow lineage children MSG_ID
factflow lineage failures
factflow lineage stats --execution ID

A typical debugging workflow:

  1. An execution fails: factflow execution get ID shows status=failed.
  2. Run factflow lineage failures --execution ID to find the adapter that tripped it.
  3. Run factflow lineage chain MSG_ID to walk back to the original input.
  4. Run factflow storage get <key> to read the problematic input bytes.
  5. Fix the adapter locally, test it, and redeploy.
  6. Run factflow execution replay ID --from-stage <stage> to rerun from the broken stage with the same input.
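For teams that script this triage, the command sequence can be assembled programmatically. The sketch below only builds and prints the argument lists for the commands named in the steps; it does not run them, and the `triage_commands` helper and its placeholder arguments are hypothetical.

```python
import shlex


def triage_commands(execution_id, msg_id, storage_key, stage):
    """Return the CLI invocations from the workflow, in order, as argv lists."""
    return [
        ["factflow", "execution", "get", execution_id],
        ["factflow", "lineage", "failures", "--execution", execution_id],
        ["factflow", "lineage", "chain", msg_id],
        ["factflow", "storage", "get", storage_key],
        # The local fix, test, and redeploy step happens here, outside the CLI.
        ["factflow", "execution", "replay", execution_id, "--from-stage", stage],
    ]


commands = triage_commands("EXEC_ID", "MSG_ID", "KEY", "STAGE")
for argv in commands:
    print(shlex.join(argv))  # shell-safe rendering of each command
```

Keeping the commands as argument lists means they can be passed to `subprocess.run` without shell quoting issues once real ids are substituted.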