
Chat Stream

POST
/api/v1/chat/stream

Stream a RAG chat response using Server-Sent Events (SSE).

The stream emits events in this order:

  1. sources event with retrieved context attributions
  2. Multiple text events with generated content chunks
  3. done event when generation is complete

Event format:

event: sources
data: {"sources": [...]}

event: text
data: {"content": "..."}

event: done
data: {"completed": true}

Args:
    request: Chat request with query, history, and configuration.
    service: Chat service (injected).

Returns:
    StreamingResponse with SSE events.

Raises:
    HTTPException: 503 if the chat service is not configured.
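The event format above can be consumed with a small parser. The sketch below assumes the stream has already been read into a string and that each event block carries exactly one `event:` line and one `data:` line (a simplification of the full SSE wire format):

```python
import json


def parse_sse(raw: str):
    """Split a raw SSE payload into (event, data) pairs.

    Minimal parser matching the stream shape documented above;
    blocks are separated by blank lines, data lines carry JSON.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events


sample = (
    'event: sources\ndata: {"sources": []}\n\n'
    'event: text\ndata: {"content": "Hello"}\n\n'
    'event: text\ndata: {"content": " world"}\n\n'
    'event: done\ndata: {"completed": true}\n'
)

for name, payload in parse_sse(sample):
    print(name, payload)
```

A real client would feed this parser incrementally from the HTTP response body rather than from a pre-read string.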

ChatCompletionRequest (object)

Request model for chat completion.

query (string, required, 1–10000 characters)
    User query text.
history (array of ChatMessageModel, at most 50 items)
    Conversation history.

    ChatMessageModel (object): a message in the conversation history.
        role (string, required; allowed values: user, assistant, system)
            Message role.
        content (string, required, at least 1 character)
            Message content.
context_mode (string, default: rag; allowed values: rag, attachment, hybrid, document, none)
    How to retrieve context.
rag_config (RAGConfigModel | null)
    Configuration for RAG context retrieval.

    RAGConfigModel (object)
        sources (array of DataSourceConfigModel)
            Data sources to search.

            DataSourceConfigModel (object): configuration for a single data source in RAG retrieval.
                name (string, required)
                    Descriptive name for the source.
                model_name (string, required)
                    Embedding model name.
                limit (integer, default: 5, range 1–50)
                    Max results from this source.
                similarity_threshold (number, default: 0.7, at most 1)
                    Min similarity.
                weight (number, default: 1, at most 2)
                    Weight for merging results.
                metadata_filters (object | null; additional properties of any type)

        retrieval_strategy (string, default: merge; allowed values: merge, sequential, ensemble)
            Strategy for combining results from multiple sources.
        max_total_results (integer, default: 10, range 1–50)
            Max total context items.
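A rag_config body that stays within the documented constraints might look like the following sketch; the source name is illustrative, not a value the API is known to accept:

```python
rag_config = {
    "sources": [
        {
            "name": "product-docs",        # descriptive name (illustrative)
            "model_name": "openai-small",  # embedding model name
            "limit": 5,                    # 1-50, default 5
            "similarity_threshold": 0.7,   # at most 1, default 0.7
            "weight": 1.0,                 # at most 2, default 1
            "metadata_filters": None,      # optional object with arbitrary keys
        }
    ],
    "retrieval_strategy": "merge",  # merge | sequential | ensemble
    "max_total_results": 10,        # 1-50, default 10
}
```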
attachment_config (AttachmentConfigModel | null)
    Configuration for attachment-based context retrieval.

    AttachmentConfigModel (object)
        attachments (array of AttachmentModel, at most 20 items)
            Explicit document attachments (max 20).

            AttachmentModel (object): a document attachment for explicit context selection.
                storage_key (string, required, at least 1 character)
                    Storage path to the document.
                title (string | null)
                max_tokens (integer | null, at least 1)
                include_full_content (boolean, default: true)
                    Include full document content (vs summary/excerpts).

document_paths (array of string | null, at most 20 items)
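For attachment-based context, a body within the documented limits could be sketched as follows; the storage key and title are hypothetical examples, not known paths:

```python
attachment_config = {
    "attachments": [
        {
            "storage_key": "docs/guide.pdf",  # storage path (illustrative)
            "title": "User Guide",            # optional display title
            "max_tokens": 2000,               # optional cap, must be >= 1
            "include_full_content": True,     # full content vs summary/excerpts
        }
    ]
}
```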
embedding_model (string, default: openai-small)
    Embedding model for the query (used when no rag_config is specified).

llm_model (string, default: gpt-4o-mini)
    LLM model for generation.

system_prompt (string | null)

temperature (number, default: 0.7, at most 2)
    Generation temperature.

max_tokens (integer, default: 1024, range 1–8192)
    Max tokens in the response.

conversation_id (string | null)
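Putting the top-level fields together, a minimal request body might look like this sketch; only the field names and constraints come from the schema above, while the query text, history, and the commented httpx call (including the host) are assumptions:

```python
payload = {
    "query": "How do I rotate my API key?",  # 1-10000 characters
    "history": [                             # up to 50 messages
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    "context_mode": "rag",        # rag | attachment | hybrid | document | none
    "embedding_model": "openai-small",
    "llm_model": "gpt-4o-mini",
    "temperature": 0.7,           # at most 2
    "max_tokens": 1024,           # 1-8192
}

# Hypothetical call; adjust the base URL to your deployment:
# import httpx
# with httpx.stream("POST", "https://HOST/api/v1/chat/stream", json=payload) as r:
#     for line in r.iter_lines():
#         print(line)
```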

Successful Response

Validation Error

HTTPValidationError (object)
    detail (array of ValidationError)

    ValidationError (object)
        loc (array, required)
            Location of the error.
        msg (string, required)
            Error message.
        type (string, required)
            Error type.
        input (any)
            The input that failed validation.
        ctx (object)
            Additional error context.
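For reference, a validation-error body following the HTTPValidationError shape could look like the sketch below; the specific msg and type strings are illustrative, not guaranteed outputs of this API:

```python
error_422 = {
    "detail": [
        {
            "loc": ["body", "query"],  # where validation failed
            "msg": "String should have at least 1 character",  # illustrative
            "type": "string_too_short",                        # illustrative
        }
    ]
}

# Render each error as "location: message" for logging or display:
for err in error_422["detail"]:
    print(f"{'.'.join(map(str, err['loc']))}: {err['msg']}")
```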