factflow-embeddings

Vector embedding pipeline: generates embeddings for text segments, stores them in PostgreSQL (pgvector) or MessagePack, and serves queries via full-text or vector search.

The EmbeddingService and storage providers are also consumed directly by the server’s chat and search endpoints, not only through pipeline adapters.

Typical upstream producer: factflow-markdown emitting segments. Typical pipeline:

… → segment_publisher → embedding_generator → embedding storage

At runtime the server’s search endpoints (/api/v1/search/*) query the same EmbeddingVectorRepository and SegmentContentRepository that the pipeline writes to.

  • Multi-model support. EmbeddingConfig binds a logical embedding slot to a specific model; pipelines can write to multiple slots, and queries can target one or fan out across all.
  • Two storage backends. PostgreSQLStorageProvider (pgvector) is the default; MessagePackStorageProvider exists for local experimentation without a DB. Swap via config.
  • Full-text + vector on the same content. SegmentContentRepository gives FTS over the text; EmbeddingVectorRepository gives vector search over the embedding. The server hybrid-search endpoint combines both.
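The last bullet says the hybrid-search endpoint combines full-text and vector results, but this page does not say how. As an illustration only, here is a minimal sketch of one common fusion technique, reciprocal rank fusion (RRF). The function name, the segment IDs, and the constant k=60 are assumptions for the example, not part of the factflow-embeddings API, and the server may combine results differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of segment IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, segment_id in enumerate(ranking, start=1):
            scores[segment_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))


# Hypothetical result lists, best match first.
fts_hits = ["seg-a", "seg-b", "seg-c"]      # e.g. from full-text search
vector_hits = ["seg-b", "seg-d", "seg-a"]   # e.g. from vector search

fused = reciprocal_rank_fusion([fts_hits, vector_hits])
print(fused)
```

Segments that appear in both lists accumulate score from each, so they tend to rise above segments found by only one search mode. RRF uses ranks only, which sidesteps the problem that FTS scores and vector distances are on different scales.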

from factflow_embeddings import EmbeddingGeneratorAdapter  # pipeline adapter type: embedding_generator
from factflow_embeddings import (
    EmbeddingConfig,
    EmbeddingGeneratorConfig,
    ModelConfig,
    EmbeddingRequest,
)
from factflow_embeddings import (
    EmbeddingVector,
    EmbeddingMetrics,
    SegmentContent,
    FTSResult,
    SearchLanguage,
)
from factflow_embeddings import (
    EmbeddingService,
    EmbeddingVectorRepository,
    SegmentContentRepository,
)
from factflow_embeddings import (
    EmbeddingStorageProvider,    # protocol
    PostgreSQLStorageProvider,   # default, pgvector
    MessagePackStorageProvider,  # local dev
    StorageProviderFactory,
)

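
The storage exports above pair a protocol with two providers and a factory. The real signatures are not documented on this page, so the sketch below shows only the general pattern, using hypothetical stand-ins: VectorStore, InMemoryStore, JsonFileStore, make_store, and the config keys are all invented for illustration. JSON stands in for MessagePack so the example needs only the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical stand-in for a storage-provider protocol."""

    def save(self, segment_id: str, vector: list[float]) -> None: ...

    def load(self, segment_id: str) -> list[float]: ...


class InMemoryStore:
    """Keeps vectors in a dict; nothing survives the process."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def save(self, segment_id: str, vector: list[float]) -> None:
        self._vectors[segment_id] = list(vector)

    def load(self, segment_id: str) -> list[float]:
        return self._vectors[segment_id]


class JsonFileStore:
    """Writes one file per segment; JSON stands in for MessagePack here."""

    def __init__(self, directory: str) -> None:
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def save(self, segment_id: str, vector: list[float]) -> None:
        (self._dir / f"{segment_id}.json").write_text(json.dumps(vector))

    def load(self, segment_id: str) -> list[float]:
        return json.loads((self._dir / f"{segment_id}.json").read_text())


def make_store(config: dict) -> VectorStore:
    """Pick a backend from config (hypothetical keys: backend, directory)."""
    backend = config.get("backend", "memory")
    if backend == "memory":
        return InMemoryStore()
    if backend == "file":
        return JsonFileStore(config["directory"])
    raise ValueError(f"unknown storage backend: {backend!r}")


# Calling code depends only on the protocol, so swapping backends is a config change.
with tempfile.TemporaryDirectory() as tmp:
    for config in ({"backend": "memory"}, {"backend": "file", "directory": tmp}):
        store = make_store(config)
        store.save("seg-1", [0.1, 0.2, 0.3])
        assert store.load("seg-1") == [0.1, 0.2, 0.3]
```

Because callers see only the protocol, the same pipeline or endpoint code can run against either backend, which is the property the "swap via config" note relies on.
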
  • Runtime: msgpack (for MessagePack provider). Embedding generation delegates to factflow-llm.
  • Workspace: factflow-protocols, factflow-foundation, factflow-engine, factflow-llm
  • External services: PostgreSQL 18 with pgvector extension

Tests live in backend/packages/workflows/factflow-embeddings/tests/. Database tests use Testcontainers. See .claude/skills/backend/database-testing/ for pgvector patterns.