
Chat Stream

POST
/api/v1/chat/stream

Stream a RAG chat response using Server-Sent Events (SSE).

The stream emits events in this order:

  1. sources event with retrieved context attributions
  2. Multiple text events with generated content chunks
  3. done event when generation is complete

Event format:

event: sources
data: {"sources": [...]}

event: text
data: {"content": "..."}

event: done
data: {"completed": true}

Args:
    request: Chat request with query, history, and configuration.
    service: Chat service (injected).

Returns:
    StreamingResponse with SSE events.

Raises:
    HTTPException: 503 if the chat service is not configured.
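The event format above can be consumed with a small parser. The sketch below assumes the stream has already been read into a string and that each event block carries exactly one `event:` line and one `data:` line (a simplification of the full SSE wire format):

```python
import json


def parse_sse(raw: str):
    """Split a raw SSE payload into (event, data) pairs.

    Minimal parser matching the stream shape documented above;
    blocks are separated by blank lines, data lines carry JSON.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events


sample = (
    'event: sources\ndata: {"sources": []}\n\n'
    'event: text\ndata: {"content": "Hello"}\n\n'
    'event: text\ndata: {"content": " world"}\n\n'
    'event: done\ndata: {"completed": true}\n'
)

for name, payload in parse_sse(sample):
    print(name, payload)
```

A real client would feed this parser incrementally from the HTTP response body rather than from a pre-read string.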

ChatCompletionRequest (object)

Request model for chat completion.

query (string, required, 1–10000 characters)
    User query text.
history (array of ChatMessageModel, at most 50 items)
    Conversation history.

    ChatMessageModel (object): a message in the conversation history.
        role (string, required; allowed values: user, assistant, system)
            Message role.
        content (string, required, at least 1 character)
            Message content.
context_mode (string, default: rag; allowed values: rag, attachment, hybrid, document, none)
    How to retrieve context.
rag_config (RAGConfigModel | null)
    Configuration for RAG context retrieval.

    RAGConfigModel (object)
        sources (array of DataSourceConfigModel)
            Data sources to search.

            DataSourceConfigModel (object): configuration for a single data source in RAG retrieval.
                name (string, required)
                    Descriptive name for the source.
                model_name (string, required)
                    Embedding model name.
                limit (integer, default: 5, range 1–50)
                    Max results from this source.
                similarity_threshold (number, default: 0.7, at most 1)
                    Min similarity.
                weight (number, default: 1, at most 2)
                    Weight for merging results.
                metadata_filters (object | null; additional properties of any type)

        retrieval_strategy (string, default: merge; allowed values: merge, sequential, ensemble)
            Strategy for combining results from multiple sources.
        max_total_results (integer, default: 10, range 1–50)
            Max total context items.
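A rag_config body that stays within the documented constraints might look like the following sketch; the source name is illustrative, not a value the API is known to accept:

```python
rag_config = {
    "sources": [
        {
            "name": "product-docs",        # descriptive name (illustrative)
            "model_name": "openai-small",  # embedding model name
            "limit": 5,                    # 1-50, default 5
            "similarity_threshold": 0.7,   # at most 1, default 0.7
            "weight": 1.0,                 # at most 2, default 1
            "metadata_filters": None,      # optional object with arbitrary keys
        }
    ],
    "retrieval_strategy": "merge",  # merge | sequential | ensemble
    "max_total_results": 10,        # 1-50, default 10
}
```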
attachment_config (AttachmentConfigModel | null)
    Configuration for attachment-based context retrieval.

    AttachmentConfigModel (object)
        attachments (array of AttachmentModel, at most 20 items)
            Explicit document attachments (max 20).

            AttachmentModel (object): a document attachment for explicit context selection.
                storage_key (string, required, at least 1 character)
                    Storage path to the document.
                title (string | null)
                max_tokens (integer | null, at least 1)
                include_full_content (boolean, default: true)
                    Include full document content (vs summary/excerpts).

document_paths (array of string | null, at most 20 items)
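For attachment-based context, a body within the documented limits could be sketched as follows; the storage key and title are hypothetical examples, not known paths:

```python
attachment_config = {
    "attachments": [
        {
            "storage_key": "docs/guide.pdf",  # storage path (illustrative)
            "title": "User Guide",            # optional display title
            "max_tokens": 2000,               # optional cap, must be >= 1
            "include_full_content": True,     # full content vs summary/excerpts
        }
    ]
}
```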
embedding_model (string, default: openai-small)
    Embedding model for the query (used when no rag_config is specified).

llm_model (string, default: gpt-4o-mini)
    LLM model for generation.

system_prompt (string | null)

temperature (number, default: 0.7, at most 2)
    Generation temperature.

max_tokens (integer, default: 1024, range 1–8192)
    Max tokens in the response.

conversation_id (string | null)
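Putting the top-level fields together, a minimal request body might look like this sketch; only the field names and constraints come from the schema above, while the query text, history, and the commented httpx call (including the host) are assumptions:

```python
payload = {
    "query": "How do I rotate my API key?",  # 1-10000 characters
    "history": [                             # up to 50 messages
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    "context_mode": "rag",        # rag | attachment | hybrid | document | none
    "embedding_model": "openai-small",
    "llm_model": "gpt-4o-mini",
    "temperature": 0.7,           # at most 2
    "max_tokens": 1024,           # 1-8192
}

# Hypothetical call; adjust the base URL to your deployment:
# import httpx
# with httpx.stream("POST", "https://HOST/api/v1/chat/stream", json=payload) as r:
#     for line in r.iter_lines():
#         print(line)
```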

Successful Response

Validation Error

HTTPValidationError (object)
    detail (array of ValidationError)

    ValidationError (object)
        loc (array, required)
            Location of the error.
        msg (string, required)
            Error message.
        type (string, required)
            Error type.
        input (any)
            The input that failed validation.
        ctx (object)
            Additional error context.
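For reference, a validation-error body following the HTTPValidationError shape could look like the sketch below; the specific msg and type strings are illustrative, not guaranteed outputs of this API:

```python
error_422 = {
    "detail": [
        {
            "loc": ["body", "query"],  # where validation failed
            "msg": "String should have at least 1 character",  # illustrative
            "type": "string_too_short",                        # illustrative
        }
    ]
}

# Render each error as "location: message" for logging or display:
for err in error_422["detail"]:
    print(f"{'.'.join(map(str, err['loc']))}: {err['msg']}")
```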