Overview

Factflow is a YAML-configured pipeline orchestration platform for AI data processing. You declare pipelines as YAML, the server turns them into running Python adapter chains, and messages flow between stages through a message broker while the lineage service records every hop.

This page is the big picture. Each subsystem has its own Guide that covers both how it works and how to operate it.

Three moving parts, one invariant:

flowchart LR
Y[YAML config] --> O[Orchestrator]
O --> R1[Route processor]
O --> R2[Route processor]
R1 -->|messages on queue| R2
R1 --> S[(Storage)]
R1 --> L[(Lineage)]
R2 --> S
R2 --> L

Pipeline = YAML + orchestrator + routes wired through a queue, with every step recorded to storage and lineage.

  • Declarative. A pipeline is a YAML file. Operators iterate on configs without a deploy.
  • Queue-scoped per execution. Multiple executions share one broker without cross-talk.
  • Commit-independent lineage. Lineage records every message without ever blocking the main flow.
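
To make the third property concrete, here is a minimal sketch of failure isolation between a pipeline write and a lineage write. It is not Factflow's implementation: `record_lineage_safely`, `run_step`, and `broken_lineage` are hypothetical names, and the in-memory lists stand in for real storage and lineage backends.

```python
import logging
from typing import Callable

logger = logging.getLogger("lineage_sketch")


def record_lineage_safely(write: Callable[[dict], None], row: dict) -> bool:
    """Try to write a lineage row; report failure instead of raising."""
    try:
        write(row)
        return True
    except Exception:
        # A lineage failure is logged, never propagated to the pipeline step.
        logger.exception("lineage write failed for %s", row.get("message_id"))
        return False


def run_step(payload: str, artefacts: list, write_lineage: Callable[[dict], None]) -> str:
    """Hypothetical pipeline step: commit the artefact, then record lineage."""
    result = payload.upper()
    artefacts.append(result)  # the pipeline write commits on its own
    record_lineage_safely(write_lineage, {"message_id": payload, "output": result})
    return result


def broken_lineage(row: dict) -> None:
    """Simulates an unavailable lineage store."""
    raise ConnectionError("lineage store unavailable")


artefacts: list = []
lineage_rows: list = []
ok_result = run_step("a", artefacts, lineage_rows.append)
failed_lineage_result = run_step("b", artefacts, broken_lineage)
```

In this sketch the second step still produces and stores its artefact even though its lineage write fails, which is the decoupling the bullet describes.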

Four things Factflow is not, to set expectations:

  • Not a generic task queue (Celery, RQ) — pipelines are graphs, not one-off tasks.
  • Not a workflow engine like Airflow — the unit of work is a message through an adapter, not a DAG tick.
  • Not a pile of LLM scripts — LLM calls are abstracted, rate-limited, and routed by config.
  • Not schedule-driven — executions are reactive, not polled.

Factflow is a uv workspace of 16 Python packages organised into three tiers. Packages import only from the same tier or below, with one exception: workflow packages never import each other.

flowchart TB
subgraph Workflows["WORKFLOWS (8) — pipeline adapters"]
  WS[factflow-webscraper]
  MD[factflow-markdown]
  EM[factflow-embeddings]
  BO[factflow-boost]
  TR[factflow-translator]
  KN[factflow-knowledge]
  SP[factflow-sharepoint]
  RP[factflow-replay]
end
subgraph Shared["SHARED SERVICES (7)"]
  SRV[factflow-server]
  EX[factflow-execution]
  EN[factflow-engine]
  INF[factflow-infra]
  LLM[factflow-llm]
  LIN[factflow-lineage]
  FN[factflow-foundation]
end
subgraph Core["CORE PROTOCOLS (1)"]
  PR[factflow-protocols]
end
Workflows --> Shared
Shared --> Core
style Core fill:#e6f2ff,stroke:#2563eb,color:#111
style Shared fill:#f0fdf4,stroke:#16a34a,color:#111
style Workflows fill:#fef3c7,stroke:#d97706,color:#111
Three tiers. Workflows depend on shared services, which depend on the core protocol package. No reverse imports, no cross-workflow imports.
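
The import rule can be written down as a small check. The sketch below is illustrative only: the package names come from the diagram and use the flat `factflow_<name>` import form, while `PACKAGE_TIER`, `TIER_RANK`, and `import_allowed` are hypothetical helpers, not part of Factflow.

```python
TIER_RANK = {"core": 0, "shared": 1, "workflows": 2}

WORKFLOWS = ["webscraper", "markdown", "embeddings", "boost",
             "translator", "knowledge", "sharepoint", "replay"]
SHARED = ["server", "execution", "engine", "infra", "llm", "lineage", "foundation"]
CORE = ["protocols"]

# Map every importable package name to its tier.
PACKAGE_TIER = {
    f"factflow_{name}": tier
    for tier, names in (("workflows", WORKFLOWS), ("shared", SHARED), ("core", CORE))
    for name in names
}


def import_allowed(importer: str, imported: str) -> bool:
    """Return True if `importer` may import `imported` under the tier rule."""
    if importer == imported:
        return True
    src, dst = PACKAGE_TIER[importer], PACKAGE_TIER[imported]
    if src == dst == "workflows":
        return False  # no cross-workflow imports
    return TIER_RANK[dst] <= TIER_RANK[src]  # same tier or below only
```

A check like this could run in CI against the import graph; how Factflow actually enforces the rule is not stated here.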

factflow-protocols — zero factflow workspace dependencies. Defines abstract contracts: PipelineAdapter, QueueProtocol, StorageProtocol, LLMClientProtocol, EmbeddingClientProtocol, plus supporting types. Every other package implements one or more of these.
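
As a rough illustration of what an abstract contract of this kind can look like, here is a sketch built on `typing.Protocol`. Only the names `PipelineAdapter`, `AdapterResult`, and the awaited `process(ctx)` call appear on this page; the `outputs` field, the dict-shaped context, and `UppercaseAdapter` are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class AdapterResult:
    """Assumed shape: the real type's fields are not documented on this page."""
    outputs: list = field(default_factory=list)


@runtime_checkable
class PipelineAdapter(Protocol):
    """Structural contract: anything with a matching `process` method qualifies."""

    async def process(self, ctx: dict) -> AdapterResult: ...


class UppercaseAdapter:
    """Hypothetical adapter; it satisfies the protocol without inheriting from it."""

    async def process(self, ctx: dict) -> AdapterResult:
        return AdapterResult(outputs=[{"text": ctx["text"].upper()}])


adapter: Any = UppercaseAdapter()
result = asyncio.run(adapter.process({"text": "hello"}))
```

Structural typing is what lets a protocols package stay free of workspace dependencies: implementers only need to match the shape, not import a base class.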

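Shared services are the concrete packages that implement the protocols and run the system: factflow-server, factflow-execution, factflow-engine, factflow-infra, factflow-llm, factflow-lineage, and factflow-foundation.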

Each workflow package is a self-contained, domain-specific feature package with its own adapters. Workflows never import each other; they compose via queues.

Each workflow has its own Guides section (e.g. Workflow: Webscraper) covering the adapters it provides and how they compose.

What actually happens when you run factflow config run <id>:

sequenceDiagram
actor Op as Operator
participant CLI as factflow CLI
participant API as factflow-server
participant Mgr as OrchestratorManager
participant Orch as PipelineOrchestrator
participant P as ReactiveRouteProcessor
participant Q as QueueProtocol (ExecScoped)
participant A as PipelineAdapter
participant Store as StorageProtocol
participant Lin as LineageService
Op->>CLI: factflow config run ID
CLI->>API: POST /executions
API->>Mgr: start(execution)
Mgr->>Orch: new PipelineOrchestrator
Orch->>P: start route processors
P->>Q: subscribe exec:EXEC:route.in
Orch->>Q: publish init_message
Q->>P: deliver message
P->>A: await adapter.process(ctx)
A->>Store: write artefact
A->>Lin: record lineage row
A-->>P: AdapterResult
P->>Q: publish to next route
P-->>Q: ACK original message
Orch->>API: completion signal
API-->>CLI: SSE status=completed
One execution end-to-end. Dive deeper in Architecture → Execution flow.
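
The inner loop of the diagram (deliver, process, publish, ack) can be approximated with `asyncio.Queue`. This is a toy model under stated assumptions, not the real `ReactiveRouteProcessor`: the `None` shutdown sentinel, the `acked` and `failed` lists, and the `shout` adapter are all invented for the example.

```python
import asyncio
from typing import Awaitable, Callable

Adapter = Callable[[str], Awaitable[str]]


async def route_processor(inbox: asyncio.Queue, outbox: asyncio.Queue,
                          adapter: Adapter, acked: list, failed: list) -> None:
    """Consume one route: process each message, publish downstream, then ack."""
    while True:
        message = await inbox.get()
        if message is None:  # hypothetical shutdown sentinel
            return
        try:
            result = await adapter(message)
        except Exception:
            failed.append(message)  # adapter raised, so the message is not acked
            continue
        await outbox.put(result)    # publish to the next route
        acked.append(message)       # ack only after the adapter has returned


async def shout(message: str) -> str:
    """Toy adapter: rejects empty input, uppercases everything else."""
    if not message:
        raise ValueError("empty message")
    return message.upper()


async def main() -> tuple:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    acked, failed = [], []
    for item in ("init", "", "next", None):
        inbox.put_nowait(item)
    await route_processor(inbox, outbox, shout, acked, failed)
    published = [outbox.get_nowait() for _ in range(outbox.qsize())]
    return published, acked, failed


published, acked, failed = asyncio.run(main())
```

What a real broker does with an un-acked message (redelivery, dead-lettering) is broker-specific and not modelled here.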

Every arrow above is covered in more depth on its own page.

The non-obvious design decisions that shape day-to-day use:

  • Flat package namespace. Every package imports as factflow_<name> (no dotted factflow.X). Chosen for IDE tooling.
  • Protocol / Provider / Adapter naming. A Protocol is an abstract contract, a Provider is an infrastructure implementation, and an Adapter is a domain adapter. The three-way split keeps each package’s blast radius small.
  • ExecutionScopedQueue over per-execution brokers. One broker, isolated namespaces — operational simplicity at zero correctness cost.
  • Config snapshot is immutable per execution. In-flight executions are unaffected by config edits. Replay resolves routes from the parent execution’s snapshot, not the global directory.
  • Handler-return ack. Adapters return AdapterResult or raise; queue ack happens only after return. No fire-and-forget.
  • Commit-independent lineage. Lineage writes decouple from pipeline writes. Either can fail without affecting the other.
  • Reactive, not polled. No central scheduler. Processors consume-and-ack. Natural backpressure.
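
The ExecutionScopedQueue decision is easiest to see in code. Below is a minimal in-memory sketch, assuming the `exec:<execution id>:<route>.in` naming that appears in the sequence diagram; `InMemoryBroker` and the method names `publish` and `consume` are stand-ins, not the real `QueueProtocol` surface.

```python
from collections import deque


class InMemoryBroker:
    """Stand-in for one shared message broker holding named FIFO queues."""

    def __init__(self) -> None:
        self.queues: dict = {}

    def publish(self, name: str, message: str) -> None:
        self.queues.setdefault(name, deque()).append(message)

    def consume(self, name: str):
        queue = self.queues.get(name)
        return queue.popleft() if queue else None


class ExecutionScopedQueue:
    """Wraps the shared broker and prefixes every queue name with one execution id."""

    def __init__(self, broker: InMemoryBroker, execution_id: str) -> None:
        self._broker = broker
        self._prefix = f"exec:{execution_id}:"

    def _name(self, route: str) -> str:
        return f"{self._prefix}{route}.in"

    def publish(self, route: str, message: str) -> None:
        self._broker.publish(self._name(route), message)

    def consume(self, route: str):
        return self._broker.consume(self._name(route))


broker = InMemoryBroker()
run_a = ExecutionScopedQueue(broker, "a1")
run_b = ExecutionScopedQueue(broker, "b2")
run_a.publish("scrape", "page-1")

queue_names = sorted(broker.queues)
seen_by_b = run_b.consume("scrape")  # other execution sees nothing
seen_by_a = run_a.consume("scrape")
```

Both executions talk to the same broker object, yet neither can read the other's messages because their queue names never collide.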