factflow-embeddings
Vector embedding pipeline — generate embeddings for text segments, store to PostgreSQL (pgvector) or MessagePack, query via full-text or vector search.
Tier and role
- Tier: workflow
- Import name: factflow_embeddings
- Source: backend/packages/workflows/factflow-embeddings/
The EmbeddingService and storage providers are also consumed directly by the server’s chat + search endpoints (not just via pipeline adapters).
Context
Typical upstream producer: factflow-markdown emitting segments. Typical pipeline:
… → segment_publisher → embedding_generator → embedding storage
At runtime the server’s search endpoints (/api/v1/search/*) query the same EmbeddingVectorRepository and SegmentContentRepository that the pipeline writes to.
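The shared-repository arrangement above can be illustrated with a small in-memory sketch: one object is written to by the pipeline side and read by the search side. `InMemoryVectorStore`, its `write` and `search` methods, and the segment IDs are hypothetical names for illustration only; they are not the `EmbeddingVectorRepository` API, and cosine similarity is assumed here as the ranking metric.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector repository; not the factflow API."""

    rows: dict[str, list[float]] = field(default_factory=dict)

    def write(self, segment_id: str, vector: list[float]) -> None:
        # Pipeline side: the embedding stage stores one vector per segment.
        self.rows[segment_id] = vector

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Server side: rank stored vectors by cosine similarity to the query.
        scored = [(sid, cosine_similarity(query, vec)) for sid, vec in self.rows.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.write("seg-1", [1.0, 0.0])
store.write("seg-2", [0.0, 1.0])
store.write("seg-3", [0.7, 0.7])

# The same store instance answers queries for the search side.
hits = store.search([1.0, 0.1], top_k=2)
print([segment_id for segment_id, _ in hits])
```

In the real package the store is PostgreSQL with pgvector rather than a Python dict, so the ranking would run inside the database.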
Rationale
- Multi-model support. EmbeddingConfig binds a logical embedding slot to a specific model; pipelines can write to multiple slots, and queries can target one or fan out across all.
- Two storage backends. PostgreSQLStorageProvider (pgvector) is the default; MessagePackStorageProvider exists for local experimentation without a DB. Swap via config.
- Full-text + vector on the same content. SegmentContentRepository gives FTS over the text; EmbeddingVectorRepository gives vector search over the embedding. The server hybrid-search endpoint combines both.
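This page does not say how the hybrid-search endpoint merges full-text and vector results. One common technique for fusing two ranked lists is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the server's actual method. The function name, the constant `k = 60`, and the segment IDs are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each input list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, segment_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked well in both lists score highest.
            scores[segment_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, as if returned by an FTS query and a vector query.
fts_hits = ["seg-2", "seg-1", "seg-4"]
vector_hits = ["seg-1", "seg-3", "seg-2"]

fused = reciprocal_rank_fusion([fts_hits, vector_hits])
print([segment_id for segment_id, _ in fused])
```

RRF uses only rank positions, so it sidesteps the problem that FTS scores and vector distances are on different scales.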
Public API
Adapter (type: in YAML)
from factflow_embeddings import EmbeddingGeneratorAdapter  # type: embedding_generator

Config
from factflow_embeddings import (
    EmbeddingConfig,
    EmbeddingGeneratorConfig,
    ModelConfig,
    EmbeddingRequest,
)

Models
from factflow_embeddings import (
    EmbeddingVector,
    EmbeddingMetrics,
    SegmentContent,
    FTSResult,
    SearchLanguage,
)

Service + repositories
from factflow_embeddings import (
    EmbeddingService,
    EmbeddingVectorRepository,
    SegmentContentRepository,
)

Storage providers
from factflow_embeddings import (
    EmbeddingStorageProvider,  # protocol
    PostgreSQLStorageProvider,  # default (pgvector)
    MessagePackStorageProvider,  # local dev
    StorageProviderFactory,
)

Dependencies
- Runtime: msgpack (for MessagePack provider). Embedding generation delegates to factflow-llm.
- Workspace: factflow-protocols, factflow-foundation, factflow-engine, factflow-llm
- External services: PostgreSQL 18 with pgvector extension
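The split between a database-backed provider and a file-backed provider for local work can be sketched as a protocol with a small factory. Everything below is hypothetical: `SketchStorageProvider`, `MemoryProvider`, `JsonFileProvider`, `make_provider`, and the `provider` / `path` config keys are invented for illustration and do not describe the real `EmbeddingStorageProvider` or `StorageProviderFactory`. JSON from the standard library stands in for MessagePack so the example needs no third-party packages.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Protocol


class SketchStorageProvider(Protocol):
    """Hypothetical protocol; the real EmbeddingStorageProvider may differ."""

    def save(self, segment_id: str, vector: list[float]) -> None: ...

    def load(self, segment_id: str) -> Optional[list[float]]: ...


class MemoryProvider:
    """Keeps vectors in a dict; nothing survives the process."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def save(self, segment_id: str, vector: list[float]) -> None:
        self._rows[segment_id] = list(vector)

    def load(self, segment_id: str) -> Optional[list[float]]:
        return self._rows.get(segment_id)


class JsonFileProvider:
    """File-backed provider for local experiments (JSON here, not MessagePack)."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def _read(self) -> dict[str, list[float]]:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())

    def save(self, segment_id: str, vector: list[float]) -> None:
        rows = self._read()
        rows[segment_id] = list(vector)
        self._path.write_text(json.dumps(rows))

    def load(self, segment_id: str) -> Optional[list[float]]:
        return self._read().get(segment_id)


def make_provider(config: dict) -> SketchStorageProvider:
    # Assumed config shape: {"provider": "memory"} or {"provider": "file", "path": "..."}.
    kind = config.get("provider", "memory")
    if kind == "memory":
        return MemoryProvider()
    if kind == "file":
        return JsonFileProvider(Path(config["path"]))
    raise ValueError(f"unknown provider: {kind}")


with tempfile.TemporaryDirectory() as tmp:
    provider = make_provider({"provider": "file", "path": f"{tmp}/vectors.json"})
    provider.save("seg-1", [0.1, 0.2])
    round_trip = provider.load("seg-1")
```

Because callers depend only on the protocol, switching backends is a config change rather than a code change, which is the property the "swap via config" note relies on.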
Testing
Tests at backend/packages/workflows/factflow-embeddings/tests/. Database tests use Testcontainers. See .claude/skills/backend/database-testing/ for pgvector patterns.
Related
- factflow-llm: provides the embedding client
- factflow-markdown: upstream source of segments
- factflow-server: chat + search endpoints query the repositories here