Lineage and correlation

Lineage is the forensic record of what happened in a pipeline: one row per message per adapter invocation. Each row holds the input hash, output hash, status, error payload (if failed), timestamps, and correlation id. Rows are stored in PostgreSQL, queried via the API, and browsed via the CLI.

What a lineage row captures

id              UUID
execution_id    UUID       -- which execution
route_id        string     -- which route
adapter_name    string     -- which adapter
parent_id       UUID|null  -- correlation: the previous row in this chain
status          enum       -- pending | completed | failed | cancelled
input_hash      string     -- hash of the inbound message
output_hash     string|null
error           JSON|null  -- exception class + message + traceback (if failed)
started_at      timestamp
completed_at    timestamp|null
batch_id        UUID|null  -- if part of a batch
metadata        JSON       -- adapter-specific extras

A message that flows through three adapters in a route produces three lineage rows, linked parent-to-child.
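
As a rough illustration of the row shape and the parent-to-child linking, here is a minimal in-memory sketch. The LineageRow class, the run_route helper, the "demo-route" id, and the choice of SHA-256 are all assumptions made for the example; they are not Factflow's actual types or hashing scheme.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, List, Optional


class Status(str, Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class LineageRow:
    """Hypothetical mirror of the fields listed above."""
    execution_id: uuid.UUID
    route_id: str
    adapter_name: str
    input_hash: str
    parent_id: Optional[uuid.UUID] = None
    status: Status = Status.PENDING
    output_hash: Optional[str] = None
    error: Optional[Dict[str, Any]] = None
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: Optional[datetime] = None
    batch_id: Optional[uuid.UUID] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def digest(payload: bytes) -> str:
    # SHA-256 is an assumption; the schema only says "hash".
    return hashlib.sha256(payload).hexdigest()


def run_route(payload: bytes, adapters: List[str]) -> List[LineageRow]:
    """Pass one message through each adapter in turn, one row per invocation."""
    execution_id = uuid.uuid4()
    rows: List[LineageRow] = []
    parent_id = None
    for name in adapters:
        row = LineageRow(execution_id, "demo-route", name, digest(payload),
                         parent_id=parent_id)
        payload = payload + b"|" + name.encode()  # stand-in for real adapter work
        row.output_hash = digest(payload)
        row.status = Status.COMPLETED
        row.completed_at = datetime.now(timezone.utc)
        rows.append(row)
        parent_id = row.id  # the next row links back to this one
    return rows


rows = run_route(b"hello", ["fetch", "parse", "store"])
```

Three adapters yield three rows; the first has no parent, and each later row's parent_id is the id of the row before it.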

Correlation: parent_id as the backbone

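Lineage is a DAG over messages. Every child row carries its parent’s id in parent_id; because a row has at most one parent, the graph is a set of trees, each rooted at a route’s initial message. This makes two queries straightforward: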

Forward trace — starting at message X, find every descendant (transitive closure over parent_id).
Backward trace — starting at a failed message, find the full ancestor chain up to the route’s initial message.

The factflow lineage chain <id> command returns both directions.
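
The two traces can be sketched over a plain mapping of row id to parent_id. This is only an in-memory illustration with made-up ids; how the lineage service runs these queries against PostgreSQL is not described here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional

# Hypothetical in-memory view of lineage: row id -> parent_id (None for the root).
Parents = Dict[str, Optional[str]]


def backward_trace(parents: Parents, row_id: str) -> List[str]:
    """Ancestor chain from row_id up to the route's initial message."""
    chain = []
    current = parents[row_id]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def forward_trace(parents: Parents, row_id: str) -> List[str]:
    """Every descendant of row_id, breadth-first (transitive closure over parent_id)."""
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)
    found, queue = [], deque([row_id])
    while queue:
        for child in children[queue.popleft()]:
            found.append(child)
            queue.append(child)
    return found


parents = {"a": None, "b": "a", "c": "b", "d": "b", "e": "d"}
print(backward_trace(parents, "e"))  # ['d', 'b', 'a']
print(forward_trace(parents, "b"))   # ['c', 'd', 'e']
```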

Commits independently

Lineage writes happen out of band from the pipeline. The pipeline publishes an async write; lineage commits it on a separate connection. Consequence: a lineage DB hiccup never fails the pipeline, and after a pipeline crash the lineage remains recorded up to the point of the crash.

This is why you can debug a broken pipeline by reading lineage even when the orchestrator is fatally stuck — the lineage trail is independent.
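
The out-of-band pattern can be sketched with an asyncio queue: the pipeline publishes an event and moves on, while a separate task commits it. Everything here (the queue, the lineage_writer task, the simulated hiccup, the committed list standing in for the table) is an assumption for illustration, not the service's actual mechanism.

```python
import asyncio

committed = []  # stand-in for the lineage table


async def lineage_writer(queue: asyncio.Queue) -> None:
    """Drains lineage events on its own; errors raised here stay here."""
    while True:
        event = await queue.get()
        try:
            if event is None:  # sentinel: shut down
                return
            if event.get("poison"):
                raise ConnectionError("simulated lineage DB hiccup")
            committed.append(event)
        except ConnectionError:
            pass  # a real writer would retry or buffer; the pipeline never sees this
        finally:
            queue.task_done()


async def pipeline(queue: asyncio.Queue) -> list:
    results = []
    for n in range(3):
        results.append(n * 2)  # the actual work
        # Publish the lineage event without waiting for it to be committed.
        queue.put_nowait({"step": n, "poison": n == 1})
    return results


async def main() -> list:
    queue = asyncio.Queue()
    writer = asyncio.create_task(lineage_writer(queue))
    results = await pipeline(queue)
    queue.put_nowait(None)
    await writer
    return results


results = asyncio.run(main())
```

The pipeline returns all three results even though the write for step 1 failed; only steps 0 and 2 end up in committed.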

Pending children race (and why it’s fixed)

Early versions of the lineage service had a race: a parent row could be marked completed before its children had registered. A naive “is this row done?” query would then return yes while children were still pending.

Fix: children pre-register before the parent’s status transition. The pending_children count on the parent is incremented when a child is about to be written, decremented when the child finishes. A parent is only “truly” done when status=completed AND pending_children=0.

The API and CLI respect this. factflow lineage chain returns completion states that honour pending children.
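
A minimal sketch of the completion rule, assuming a simplified Row with only the two relevant fields. In a real database the increment and decrement would need to be atomic updates; that part is left out here.

```python
from dataclasses import dataclass


@dataclass
class Row:
    status: str = "pending"
    pending_children: int = 0

    def is_truly_done(self) -> bool:
        # Done only when completed AND no child is still outstanding.
        return self.status == "completed" and self.pending_children == 0


def register_child(parent: Row) -> Row:
    """Pre-register: bump the parent's counter before the child row is written."""
    parent.pending_children += 1
    return Row()


def finish_child(parent: Row, child: Row, status: str = "completed") -> None:
    """Record the child's final status and release the parent's counter."""
    child.status = status
    parent.pending_children -= 1


parent = Row()
child = register_child(parent)   # happens before the parent's status transition
parent.status = "completed"
print(parent.is_truly_done())    # False: one child still pending
finish_child(parent, child)
print(parent.is_truly_done())    # True
```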

Failure isolation

A failure to record lineage for one adapter never cascades. If lineage itself has issues (DB down, pool exhausted), the pipeline continues: rows are queued in an in-memory buffer until the DB returns. On catastrophic loss, the buffer is flushed to a structured log so operators can reconstruct the lineage after the fact.

Contrast this with systems where observability failures halt the main flow; Factflow explicitly decouples the two.
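
The buffering behaviour might look roughly like the sketch below. BufferedLineageWriter, its commit callback, the max_buffered cap used as the "catastrophic" trigger, and the log event name are all hypothetical; the real thresholds and log format are not specified in this document.

```python
import json
import logging
from collections import deque

log = logging.getLogger("lineage.buffer")


class BufferedLineageWriter:
    """Hypothetical sketch: record() never raises into the pipeline."""

    def __init__(self, commit, max_buffered=1000):
        self._commit = commit      # callable that writes one row; may raise
        self._buffer = deque()
        self._max = max_buffered   # assumed cap; beyond it rows spill to the log

    def record(self, row: dict) -> None:
        self._buffer.append(row)
        self.drain()
        if len(self._buffer) > self._max:
            self.spill()

    def drain(self) -> None:
        """Commit buffered rows in order; stop at the first failure."""
        while self._buffer:
            try:
                self._commit(self._buffer[0])
            except Exception:
                return  # DB still unavailable; keep rows for the next attempt
            self._buffer.popleft()

    def spill(self) -> None:
        """Last resort: emit each buffered row as one JSON log line."""
        while self._buffer:
            row = self._buffer.popleft()
            log.error(json.dumps({"event": "lineage_spill", "row": row}))

    def pending(self) -> int:
        return len(self._buffer)


db = {"up": False}
table = []


def commit(row):
    if not db["up"]:
        raise ConnectionError("db down")
    table.append(row)


writer = BufferedLineageWriter(commit)
writer.record({"id": 1})   # DB down: buffered, no exception reaches the caller
writer.record({"id": 2})
db["up"] = True
writer.record({"id": 3})   # DB back: all three rows commit in order
```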

Querying lineage

Over the API

GET /api/v1/lineage?execution_id=...&status=failed — filter + paginate
GET /api/v1/lineage/{id}/chain — forward + backward chain
GET /api/v1/lineage/{id}/children
GET /api/v1/lineage/failures — most recent failures across executions
GET /api/v1/lineage/stats — summary counters per execution
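
A small client sketch for the endpoints above, using only the standard library. The base URL is a placeholder, and the assumption that responses are JSON (as well as anything about authentication or pagination parameters) is not confirmed by this document.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # hypothetical; use your deployment's address


def lineage_url(base: str, **filters: str) -> str:
    """Build a GET /api/v1/lineage URL from query filters such as execution_id."""
    query = urlencode(filters)
    return f"{base}/api/v1/lineage" + (f"?{query}" if query else "")


def chain_url(base: str, row_id: str) -> str:
    """URL for the forward + backward chain of one lineage row."""
    return f"{base}/api/v1/lineage/{row_id}/chain"


def fetch_json(url: str):
    # Assumes the API returns JSON; the response shape is not specified here.
    with urlopen(url) as resp:
        return json.load(resp)


url = lineage_url(BASE_URL, execution_id="1234", status="failed")
print(url)  # http://localhost:8000/api/v1/lineage?execution_id=1234&status=failed
```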

Over the CLI

factflow lineage list --execution ID
factflow lineage chain MSG_ID
factflow lineage children MSG_ID
factflow lineage failures
factflow lineage stats --execution ID

Typical debugging flow

1. An execution fails: factflow execution get ID shows status=failed
2. Find the adapter that tripped it: factflow lineage failures --execution ID
3. Walk back to the original input: factflow lineage chain MSG_ID
4. Read the problematic input bytes: factflow storage get <key>
5. Fix the adapter locally, test it, and redeploy
6. Rerun from the broken stage with the same input: factflow execution replay ID --from-stage <stage>
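
For a runbook, the CLI steps of this flow can be collected as argument lists and printed (or executed) in order. The helper names and the sample values below are placeholders; in practice the message id, storage key, and stage come from the output of the earlier steps, and the replay should only run after the adapter fix is deployed.

```python
import shlex
import subprocess


def debug_commands(execution_id, msg_id, storage_key, stage):
    """The CLI steps from the flow above, as argv lists, in order."""
    return [
        ["factflow", "execution", "get", execution_id],
        ["factflow", "lineage", "failures", "--execution", execution_id],
        ["factflow", "lineage", "chain", msg_id],
        ["factflow", "storage", "get", storage_key],
        ["factflow", "execution", "replay", execution_id, "--from-stage", stage],
    ]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print("$", shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


# Placeholder values for illustration only.
commands = debug_commands("EXEC_ID", "MSG_ID", "some/key", "parse")
for argv in commands:
    run(argv)  # dry run: prints each command without executing it
```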

factflow-lineage reference — the service and repository
Lineage guide — CLI-first debugging walkthrough
Replay — uses lineage + storage to re-run stages