Chat Complete
Execute RAG chat with context retrieval and LLM generation.
This endpoint retrieves relevant context from the knowledge base, builds a prompt with the context, and generates a response using the specified LLM model.
Retrieval strategies:
- merge: Search all sources in parallel, merge by weighted similarity
- sequential: Search sources in order until enough results are found
- ensemble: Use RRF (Reciprocal Rank Fusion) to combine vector and keyword search
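The ensemble strategy's Reciprocal Rank Fusion can be sketched as follows. This is a minimal illustration of the general RRF technique, not this service's actual implementation; the constant `k=60` is the conventional default from the RRF literature.

```python
def rrf_merge(vector_ranking, keyword_ranking, k=60):
    """Combine two ranked lists of document IDs via Reciprocal Rank Fusion.

    Each document's score is the sum of 1 / (k + rank) over every ranking
    it appears in; documents with higher combined scores rank first.
    """
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Documents appearing high in both rankings win:
merged = rrf_merge(["a", "b", "c"], ["b", "c", "d"])  # → ["b", "c", "a", "d"]
```

Because RRF only uses ranks, it needs no score normalization between the vector and keyword searches, which is why it is a common choice for combining heterogeneous retrievers.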
**Args:**

- `request` — chat request with query, history, and configuration
- `service` — chat service (injected)

**Returns:** `ChatCompletionResponse` with generated text and source attributions

**Raises:** `HTTPException` — 503 if the service is not configured, 500 on a generation error
## Request Body (required)

Request model for chat completion.
`object` with the following properties:

- User query text
- Conversation history — each item is a message in the conversation history (`object`):
  - Message role
  - Message content
- How to retrieve context — configuration for RAG context retrieval (`object`):
  - Data sources to search — each item is the configuration for a single data source in RAG retrieval (`object`):
    - Descriptive name for the source
    - Embedding model name
    - Max results from this source
    - Min similarity
    - Weight for merging results
  - Strategy for combining results from multiple sources
  - Max total context items
- Configuration for attachment-based context retrieval (`object`):
  - Include full document content (vs. summary/excerpts)
- Embedding model for the query (used when no `rag_config` is specified)
- LLM model for generation
- Generation temperature
- Max tokens in the response
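A request body matching the schema above might look like this. The field names below are illustrative assumptions inferred from the descriptions (only `rag_config` is named in this page); consult the service's OpenAPI schema for the exact keys.

```python
# Hypothetical request payload; key names are assumptions based on the
# schema descriptions, not confirmed field names.
payload = {
    "query": "What does the retention policy say about backups?",  # user query text
    "history": [                                   # conversation history
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    "rag_config": {                                # how to retrieve context
        "sources": [
            {
                "name": "policies",                # descriptive name for the source
                "embedding_model": "text-embedding-3-small",
                "max_results": 5,                  # max results from this source
                "min_similarity": 0.2,             # min similarity
                "weight": 1.0,                     # weight for merging results
            }
        ],
        "strategy": "merge",                       # merge | sequential | ensemble
        "max_context_items": 10,                   # max total context items
    },
    "model": "gpt-4o-mini",                        # LLM model for generation
    "temperature": 0.2,                            # generation temperature
    "max_tokens": 512,                             # max tokens in response
}
```

With multiple entries in `sources`, the chosen `strategy` controls whether they are searched in parallel and merged by weighted similarity, searched in order, or fused with RRF.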
## Responses

**Successful Response**
Response model for chat completion.

`object` with the following properties:

- Generated response text
- Source attributions for retrieved context — each item is a source attribution for a retrieved context item, carrying information about the document used in document or attachment mode:
  - Document mode uses the `path` field.
  - Attachment mode uses the `storage_key` and `index` fields; the index is 1-based, matching the LLM's `[1]`, `[2]` references in responses.
- Observability metadata for RAG search operations (`object`):
  - Retrieval strategy used
  - Total candidates evaluated
  - Time spent retrieving context
  - Time spent generating the response
  - Total request time
- LLM model that generated the response
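Because attachment indices are 1-based and match the bracketed citations in the generated text, a client can resolve a `[1]`-style reference directly against the source list. The response shape below is a hedged sketch: only `path`, `storage_key`, and `index` are named on this page, so the other key names are assumptions.

```python
# Hypothetical response shape; only `path`, `storage_key`, and `index`
# are documented field names — the rest are illustrative assumptions.
response = {
    "text": "Backups are retained for 30 days [1].",
    "sources": [
        {"storage_key": "attachments/policy.pdf", "index": 1},
    ],
    "metadata": {"strategy": "ensemble", "total_candidates": 12},
}


def resolve_citation(resp, n):
    """Map an LLM citation like [1] back to its source attribution.

    Attachment indices are 1-based, matching the bracketed references
    in the generated text; returns None for an unknown index.
    """
    for src in resp["sources"]:
        if src.get("index") == n:
            return src
    return None
```

For example, `resolve_citation(response, 1)` returns the attribution whose `storage_key` points at the cited attachment, letting a UI render the citation as a link.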
**Validation Error**