# Boost workflow

Closed pipeline for processing Boost.AI export folders. It is not composed with other workflows; it produces a standalone catalogue.
## Canonical pipeline

`enumerator` → `filter` → `norwegian_filter` → `deduplicate` → `clustering` → `catalog` → `storage_writer` → `renderer`

## Minimal pipeline

A shipped example lives at `backend/config/pipelines/dnb/boost-export.yaml`. Adapt it for your export folder.
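The shipped file's schema is not reproduced here, but as a rough, hypothetical sketch of what a pipeline config wiring up the canonical stage chain might look like (stage names come from the stage table; every key and path below is assumed, not taken from the real file):

```yaml
# Hypothetical sketch only — the real schema of
# backend/config/pipelines/dnb/boost-export.yaml may differ.
pipeline:
  name: boost-export
  stages:
    - enumerator:
        export_dir: /data/boost-exports/FIX   # point at your export folder
    - filter
    - norwegian_filter
    - deduplicate
    - clustering
    - catalog
    - storage_writer
    - renderer
```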
## What the chain does

| Stage | Purpose |
|---|---|
| `enumerator` | Walk the export folder; emit one message per conversation file |
| `filter` | Drop conversations matching exclusion rules (e.g. operator-only, system messages) |
| `norwegian_filter` | Language-detect; scope to Norwegian content |
| `deduplicate` | MinHash-based near-duplicate removal (uses `datasketch`) |
| `clustering` | Group similar conversations (scikit-learn) |
| `catalog` | Build the structured catalogue rows |
| `storage_writer` | Persist the catalogue |
| `renderer` | Produce shareable formats (HTML, CSV, tree view) |
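The idea behind the `deduplicate` stage can be illustrated with a small self-contained sketch. This is not the `datasketch`-backed implementation the pipeline uses, just the underlying MinHash technique: two conversations that share most of their tokens will share most of their signature slots.

```python
import hashlib

def minhash_signature(tokens, num_perm=64):
    """MinHash signature: for each of num_perm seeded hash functions,
    keep the minimum hash value over all tokens in the set."""
    return [
        min(
            int.from_bytes(hashlib.sha1(f"{seed}:{tok}".encode()).digest()[:8], "big")
            for tok in tokens
        )
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots estimates Jaccard similarity of the token sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Near-duplicates share most signature slots; unrelated texts share almost none.
near_dup = estimated_jaccard(
    minhash_signature("how do i reset my password".split()),
    minhash_signature("how can i reset my password".split()),
)
unrelated = estimated_jaccard(
    minhash_signature("how do i reset my password".split()),
    minhash_signature("opening hours for the oslo branch".split()),
)
```

A dedupe pass then drops any conversation whose estimated similarity to an already-kept one exceeds a threshold.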
## Canonical origin tag

`extract_chatbot_origin()` normalises every Boost export folder name into a canonical `boost:{NAME}` origin tag. It handles multiple naming conventions and legacy aliases (e.g. `DNBSERVICEDESK` → `FIX`).
```python
from factflow_boost import extract_chatbot_origin

extract_chatbot_origin("FIX 2026-04-20.fullExport")  # → "boost:FIX"
```
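To make the behaviour concrete, here is a hypothetical, simplified re-implementation. The real `factflow_boost` code handles more naming conventions and a larger alias table than this sketch; only the `DNBSERVICEDESK` → `FIX` alias and the `boost:{NAME}` format come from the docs above.

```python
import re

# Example legacy alias taken from the docs; the real table is larger.
LEGACY_ALIASES = {"DNBSERVICEDESK": "FIX"}

def extract_chatbot_origin_sketch(folder_name: str) -> str:
    """Simplified sketch: take the leading name token of the export
    folder, resolve legacy aliases, and prefix with 'boost:'."""
    # "FIX 2026-04-20.fullExport" -> "FIX"
    name = re.split(r"[ .]", folder_name.strip(), maxsplit=1)[0].upper()
    return f"boost:{LEGACY_ALIASES.get(name, name)}"

print(extract_chatbot_origin_sketch("FIX 2026-04-20.fullExport"))       # boost:FIX
print(extract_chatbot_origin_sketch("DNBSERVICEDESK 2024.fullExport"))  # boost:FIX
```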
## Test scenario

The test-cli harness includes Boost (scenario `s3`):

```sh
scripts/test-cli/run.sh s3
```

This runs end-to-end against fixture data.
## Downstream

Boost catalogues typically feed the knowledge workflow for concept detection and consolidation, though the Boost pipeline is self-contained and usable standalone.