# Boost workflow

Closed pipeline for processing Boost.AI export folders. It is not composed with other workflows; it produces a standalone catalogue.
## Canonical pipeline

`enumerator` → `filter` → `norwegian_filter` → `deduplicate` → `clustering` → `catalog` → `storage_writer` → `renderer`

## Minimal pipeline

A shipped example lives at `backend/config/pipelines/dnb/boost-export.yaml`. Adapt it for your export folder.
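The shipped file's schema is not reproduced here, but as a rough, hypothetical sketch of what a pipeline config wiring up the canonical stage chain might look like (stage names come from the stage table; every key and path below is assumed, not taken from the real file):

```yaml
# Hypothetical sketch only — the real schema of
# backend/config/pipelines/dnb/boost-export.yaml may differ.
pipeline:
  name: boost-export
  stages:
    - enumerator:
        export_dir: /data/boost-exports/FIX   # point at your export folder
    - filter
    - norwegian_filter
    - deduplicate
    - clustering
    - catalog
    - storage_writer
    - renderer
```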
## What the chain does

| Stage | Purpose |
|---|---|
| `enumerator` | Walk the export folder; emit one message per conversation file |
| `filter` | Drop conversations matching exclusion rules (e.g. operator-only, system messages) |
| `norwegian_filter` | Language-detect; scope to Norwegian content |
| `deduplicate` | MinHash-based near-duplicate removal (uses `datasketch`) |
| `clustering` | Group similar conversations (scikit-learn) |
| `catalog` | Build the structured catalogue rows |
| `storage_writer` | Persist the catalogue |
| `renderer` | Produce shareable formats (HTML, CSV, tree view) |
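The idea behind the `deduplicate` stage can be illustrated with a small self-contained sketch. This is not the `datasketch`-backed implementation the pipeline uses, just the underlying MinHash technique: two conversations that share most of their tokens will share most of their signature slots.

```python
import hashlib

def minhash_signature(tokens, num_perm=64):
    """MinHash signature: for each of num_perm seeded hash functions,
    keep the minimum hash value over all tokens in the set."""
    return [
        min(
            int.from_bytes(hashlib.sha1(f"{seed}:{tok}".encode()).digest()[:8], "big")
            for tok in tokens
        )
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots estimates Jaccard similarity of the token sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Near-duplicates share most signature slots; unrelated texts share almost none.
near_dup = estimated_jaccard(
    minhash_signature("how do i reset my password".split()),
    minhash_signature("how can i reset my password".split()),
)
unrelated = estimated_jaccard(
    minhash_signature("how do i reset my password".split()),
    minhash_signature("opening hours for the oslo branch".split()),
)
```

A dedupe pass then drops any conversation whose estimated similarity to an already-kept one exceeds a threshold.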
## Canonical origin tag

`extract_chatbot_origin()` normalises every Boost export folder name into a canonical `boost:{NAME}` origin tag. It handles multiple naming conventions and legacy aliases (e.g. `DNBSERVICEDESK` → `FIX`).
```python
from factflow_boost import extract_chatbot_origin

extract_chatbot_origin("FIX 2026-04-20.fullExport")  # → "boost:FIX"
```
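To make the behaviour concrete, here is a hypothetical, simplified re-implementation. The real `factflow_boost` code handles more naming conventions and a larger alias table than this sketch; only the `DNBSERVICEDESK` → `FIX` alias and the `boost:{NAME}` format come from the docs above.

```python
import re

# Example legacy alias taken from the docs; the real table is larger.
LEGACY_ALIASES = {"DNBSERVICEDESK": "FIX"}

def extract_chatbot_origin_sketch(folder_name: str) -> str:
    """Simplified sketch: take the leading name token of the export
    folder, resolve legacy aliases, and prefix with 'boost:'."""
    # "FIX 2026-04-20.fullExport" -> "FIX"
    name = re.split(r"[ .]", folder_name.strip(), maxsplit=1)[0].upper()
    return f"boost:{LEGACY_ALIASES.get(name, name)}"

print(extract_chatbot_origin_sketch("FIX 2026-04-20.fullExport"))       # boost:FIX
print(extract_chatbot_origin_sketch("DNBSERVICEDESK 2024.fullExport"))  # boost:FIX
```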
## Test scenario

The test-cli harness includes Boost (scenario `s3`):

```sh
scripts/test-cli/run.sh s3
```

This runs end-to-end against fixture data.
## Downstream

Boost catalogues typically feed the knowledge workflow for concept detection and consolidation, though the Boost pipeline is self-contained and usable standalone.