
factflow-boost

Boost.AI chatbot export processing — ingestion, deduplication, clustering, and cataloguing of Boost conversation data for downstream knowledge consolidation.

factflow-boost runs as a batch pipeline against a Boost export folder. It is not typically composed with other workflows: it is a closed pipeline that produces a structured catalogue.

Boost.AI exports are folder trees named by chatbot and date. extract_chatbot_origin() normalises every naming convention to a canonical boost:{NAME} origin tag. The CHATBOT_ALIASES dict maps legacy names (e.g. DNBSERVICEDESKFIX) to current conventions.
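As a hedged illustration of that normalisation (the real `extract_chatbot_origin()` and `CHATBOT_ALIASES` ship with the package; the alias entry and folder naming below are assumptions for demonstration only):

```python
# Illustrative sketch only: the real extract_chatbot_origin() and
# CHATBOT_ALIASES live in factflow_boost. The alias entry and the
# "{chatbot}_{date}" folder convention here are assumptions.
CHATBOT_ALIASES = {"DNBSERVICEDESKFIX": "servicedesk"}  # legacy name -> current convention

def extract_chatbot_origin(folder_name: str) -> str:
    """Map an export folder like 'DNBServiceDeskFix_2024-01-31' to 'boost:servicedesk'."""
    chatbot = folder_name.split("_")[0]  # folder trees are named by chatbot and date
    chatbot = CHATBOT_ALIASES.get(chatbot.upper(), chatbot.lower())
    return f"boost:{chatbot}"
```

Unaliased names fall through to a lowercased canonical form, so every origin tag has the same `boost:{name}` shape regardless of which naming convention the export used.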

The pipeline runs through:

enumerator → filter → norwegian_filter → deduplicate → clustering → catalog → storage_writer → renderer

Each stage is a discrete adapter under factflow_boost.boost_processor. Post-processing routines (factflow_boost.boost_routines) render the catalogue to shareable formats.
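Conceptually, the stage sequence above reduces to function composition over the record stream. A minimal sketch, assuming each adapter can be modelled as a callable over an iterable of records (the real adapters are discovered and wired by factflow-engine; these names are illustrative only):

```python
from functools import reduce

def run_pipeline(records, stages):
    """Thread records through each stage in order.

    Sketch only: factflow-engine handles the real adapter discovery and
    wiring; here a 'stage' is just any callable from records to records.
    """
    return reduce(lambda recs, stage: stage(recs), stages, records)

# Hypothetical usage, mirroring the stage order above:
# run_pipeline(export_rows, [enumerator, filter_, norwegian_filter,
#                            deduplicate, clustering, catalog,
#                            storage_writer, renderer])
```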

  • Closed pipeline. Boost ingestion is complete — it doesn’t feed other workflow packages today. That makes it a good sandbox for new adapter patterns before promoting them elsewhere.
  • Fuzzy deduplication. rapidfuzz + datasketch MinHash identify near-duplicate conversations; clustering groups them for downstream triage.
  • Language-specific filtering. norwegian_filter uses a language detector to scope processing to Norwegian content (the primary Boost use case at DNB).

Top-level __init__.py exports:

from factflow_boost import extract_chatbot_origin, CHATBOT_ALIASES

Adapters and internal processors are not re-exported at the package root; they’re registered via the engine’s discovery mechanism from their subpackage (factflow_boost.boost_processor).

  • factflow_boost.boost_processor — the pipeline adapters (batch_processor, catalog, clustering, deduplicate, enumerator, filter, norwegian_filter, storage_writer, filtered_collector, stored_keys_fanout, …)
  • factflow_boost.boost_routines — post-pipeline rendering (renderer, parser, models)
  • Runtime: datasketch (MinHash), rapidfuzz (string similarity), scikit-learn + scipy (clustering)
  • Workspace: factflow-protocols, factflow-foundation, factflow-engine, factflow-llm
  • External services: storage provider (for the export folder + written catalogue)

Tests live under backend/packages/workflows/factflow-boost/tests/. The test-cli harness includes a Boost scenario (s3); run it with scripts/test-cli/run.sh s3.