
Boost workflow

Closed pipeline for processing Boost.AI export folders. Not composed with other workflows — produces a standalone catalogue.

enumerator → filter → norwegian_filter → deduplicate → clustering → catalog → storage_writer → renderer

A shipped example lives at backend/config/pipelines/dnb/boost-export.yaml. Adapt for your export folder.
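To give a sense of what adapting it involves, here is a hypothetical sketch of such a config. The stage names match the chain above, but the field names and overall schema are illustrative, not the shipped format:

```yaml
# Illustrative only; see backend/config/pipelines/dnb/boost-export.yaml
# for the real schema.
pipeline:
  stages:
    - enumerator
    - filter
    - norwegian_filter
    - deduplicate
    - clustering
    - catalog
    - storage_writer
    - renderer
```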

| Stage | Purpose |
| --- | --- |
| enumerator | Walk the export folder; emit one message per conversation file |
| filter | Drop conversations matching exclusion rules (e.g. operator-only, system messages) |
| norwegian_filter | Language-detect; scope to Norwegian content |
| deduplicate | MinHash-based near-duplicate removal (uses datasketch) |
| clustering | Group similar conversations (scikit-learn) |
| catalog | Build the structured catalogue rows |
| storage_writer | Persist the catalogue |
| renderer | Produce shareable formats (HTML, CSV, tree view) |
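The deduplicate stage relies on datasketch's MinHash in production. The stdlib-only sketch below (hash scheme and `num_perm` are illustrative choices, not the pipeline's settings) shows why matching signature positions estimate Jaccard similarity between token sets:

```python
import hashlib

def _hash(token: str, seed: int) -> int:
    # Seeded 64-bit hash via blake2b's salt parameter (padded to 16 bytes).
    salt = seed.to_bytes(4, "big").ljust(16, b"\x00")
    return int.from_bytes(
        hashlib.blake2b(token.encode(), digest_size=8, salt=salt).digest(), "big"
    )

def minhash_signature(tokens: set[str], num_perm: int = 64) -> list[int]:
    # One minimum per simulated permutation; for two sets, the probability
    # that a given position matches equals their Jaccard similarity.
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Near-duplicate conversations share most tokens, so their signatures agree
# on most positions; a similarity threshold then decides what to drop.
a = minhash_signature({"hei", "jeg", "vil", "bytte", "pin", "kode"})
b = minhash_signature({"hei", "jeg", "vil", "gjerne", "bytte", "pin", "kode"})
c = minhash_signature({"completely", "different", "topic"})
```

The real stage pairs MinHash with LSH indexing so candidates are found without comparing every pair of conversations.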

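The clustering stage groups conversations with scikit-learn; as a rough intuition for what "group similar conversations" means, here is a stdlib-only greedy grouping sketch (tokenisation and threshold are illustrative, not the stage's actual algorithm):

```python
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_cluster(texts: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Assign each text to the first cluster whose representative is similar enough."""
    clusters: list[dict] = []
    for text in texts:
        tokens = set(text.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster["rep"]) >= threshold:
                cluster["members"].append(text)
                break
        else:
            # No existing cluster is close enough; start a new one.
            clusters.append({"rep": tokens, "members": [text]})
    return [cluster["members"] for cluster in clusters]
```

For example, two phrasings of the same PIN question land in one cluster while an unrelated account question starts its own.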
extract_chatbot_origin() normalises every Boost export folder name into a canonical boost:{NAME} origin tag, handling multiple naming conventions and legacy aliases (e.g. DNBSERVICEDESKFIX).

```python
from factflow_boost import extract_chatbot_origin

extract_chatbot_origin("FIX 2026-04-20.fullExport")  # → "boost:FIX"
```
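The shipped normalisation logic might look roughly like the sketch below. This is not the actual implementation; the alias table in particular is assumed (the source only names DNBSERVICEDESKFIX as a legacy alias, not what it maps to):

```python
import re

# Assumed legacy-alias map; the real table lives in factflow_boost.
ALIASES = {"DNBSERVICEDESKFIX": "FIX"}

def extract_chatbot_origin(folder_name: str) -> str:
    """Sketch: drop the export suffix, keep the leading name token, map aliases."""
    name = re.sub(r"\.fullExport$", "", folder_name)
    name = name.split()[0].upper()  # leading token before the export date
    return f"boost:{ALIASES.get(name, name)}"
```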

The test-cli harness includes Boost (scenario s3):

```shell
scripts/test-cli/run.sh s3
```

Runs end-to-end against fixture data.

Boost catalogues typically feed the knowledge workflow for concept detection + consolidation — though the Boost pipeline is self-contained and usable standalone.