
Chat and search

Factflow ships 12 chat endpoints and 6 search endpoints. Together they expose the indexed content (segments + embeddings + full-text) to end users as a RAG-backed chat interface or a programmatic search API. The moving parts:

  • Segments and embeddings produced by pipelines (typically factflow-markdown, then factflow-embeddings)
  • Stored in PostgreSQL with pgvector for vector similarity plus full-text search (FTS)
  • factflow-server exposes the HTTP surface; internally it calls factflow-embeddings.EmbeddingService and SegmentContentRepository

POST /api/v1/search — semantic

Pure vector similarity: embed the query with the same model the pipeline used, then return the top-k nearest neighbours.

curl -X POST http://localhost:8000/api/v1/search \
-H 'Content-Type: application/json' \
-d '{"query": "how do I configure retries", "limit": 10}'

CLI:

factflow search semantic "how do I configure retries" --limit 10

POST /api/v1/search/hybrid — vector + FTS


Combines semantic similarity with keyword ranking. Better for queries with specific terms (product names, error codes) plus intent.

factflow search hybrid "retry circuit breaker configuration"

POST /api/v1/search/rrf — reciprocal rank fusion


Multi-signal ranking — semantic + FTS + optionally other signals — combined via RRF. The default for the chat endpoint.
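RRF scores each candidate as the sum of 1/(k + rank) over the signals it appears in, rewarding documents that rank well in several lists. A minimal sketch in Python (the constant k = 60 is the conventional default; function and variable names are illustrative, not factflow's actual implementation):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of segment ids via reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, seg_id in enumerate(ranking, start=1):
            # each list contributes 1/(k + rank) for the segments it contains
            scores[seg_id] = scores.get(seg_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

semantic = ["s3", "s1", "s7"]   # ids ranked by vector similarity
fts = ["s1", "s9", "s3"]        # ids ranked by full-text relevance
fused = rrf_fuse([semantic, fts])
```

Note how "s1" wins despite topping only one list: appearing near the top of both signals beats a single first place.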

Query across multiple embedding slots (e.g., English + Norwegian models, or general + domain-specific). Returns merged results with per-model scores.

factflow search multi-model "transaksjon avvist" --models no-large,en-large
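The merge can be pictured as grouping per-model hits by segment while keeping each model's score. A hedged Python sketch (the ranking rule, best score across models, is an assumption for illustration; factflow-server's real merge strategy may differ):

```python
def merge_multi_model(results_by_model: dict[str, list[tuple[str, float]]]) -> list[dict]:
    """Merge per-model (segment_id, score) hits into one list with per-model scores."""
    merged: dict[str, dict] = {}
    for model, hits in results_by_model.items():
        for seg_id, score in hits:
            entry = merged.setdefault(seg_id, {"segment_id": seg_id, "scores": {}})
            entry["scores"][model] = score
    # rank by the best score any model gave the segment (illustrative choice)
    return sorted(merged.values(), key=lambda e: max(e["scores"].values()), reverse=True)

hits = {
    "no-large": [("s1", 0.91), ("s2", 0.80)],
    "en-large": [("s2", 0.88), ("s3", 0.75)],
}
merged = merge_multi_model(hits)
```

Segments found by both models ("s2" above) carry a score per model, which is what lets a UI show why a result surfaced.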

GET /api/v1/search/sources, GET /api/v1/search/capabilities


Enumerate what’s searchable (which origins are indexed, which embedding slots exist) — use these to build dynamic UIs.

Chat is stateful: every conversation is a thread with a stored history. Messages are RAG-grounded — the server retrieves relevant segments, builds a prompt, calls the LLM, and returns the response plus the sources it used.
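The grounding step amounts to assembling a prompt from the retrieved segments before calling the LLM. A hypothetical illustration in Python (the prompt format and citation markers are invented for this sketch, not the server's actual template):

```python
def build_prompt(question: str, segments: list[dict]) -> str:
    """Assemble a RAG-grounded prompt: numbered sources, then the question."""
    context = "\n\n".join(
        f"[{i}] {seg['content']}" for i, seg in enumerate(segments, start=1)
    )
    return (
        "Answer using only the sources below. Cite them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

segments = [{"segment_id": "s1", "content": "Retries are configured via retry_policy."}]
prompt = build_prompt("How do I configure retries?", segments)
```

The same segment list is then returned alongside the LLM response, which is how the sources array described later stays in sync with the answer.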

POST /api/v1/chat/threads — start a thread

curl -X POST http://localhost:8000/api/v1/chat/threads \
-H 'Content-Type: application/json' \
-d '{"title": "Onboarding questions"}'

Returns thread_id.

POST /api/v1/chat/threads/{id}/messages — ask a question

curl -X POST http://localhost:8000/api/v1/chat/threads/THREAD_ID/messages \
-H 'Content-Type: application/json' \
-d '{"content": "How do I set up a replay for a failed execution?"}'

The response streams the LLM output (SSE by default, or a single JSON body if the request sends Accept: application/json), along with the source segments used.

Companion endpoints list existing threads (optionally filtered by user) and return a thread's full conversation history.

GET /api/v1/chat/capabilities + GET /api/v1/chat/sources


Same purpose as their search counterparts: tell a UI what's available.

Every chat response includes a sources array referencing the segments that informed the answer. Each source carries:

  • segment_id — primary key
  • content — the excerpt
  • metadata — origin, URL, title, page
  • score — relevance score assigned during retrieval

Frontends use this to build “cited sources” UIs with links back to the original content.
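One way a frontend might turn that array into citation links. A Python sketch (the example data is made up; the field names follow the list above):

```python
def render_sources(sources: list[dict]) -> list[str]:
    """Format the response's sources array as markdown citation links."""
    lines = []
    for i, src in enumerate(sources, start=1):
        meta = src.get("metadata", {})
        title = meta.get("title") or src["segment_id"]  # fall back to the id
        url = meta.get("url")
        lines.append(f"[{i}] [{title}]({url})" if url else f"[{i}] {title}")
    return lines

sources = [
    {"segment_id": "abc", "score": 0.92,
     "metadata": {"title": "Retry guide", "url": "https://docs.example/retries"}},
]
rendered = render_sources(sources)
```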

Chat threads are owned by a user ID. The server pulls this from the auth context (typically DNB SSO headers forwarded by the ingress). Unauthenticated requests either fail or create anonymous threads, depending on the deployment configuration.