
Chat Complete

POST
/api/v1/chat/complete

Execute RAG chat with context retrieval and LLM generation.

This endpoint retrieves relevant context from the knowledge base, builds a prompt with the context, and generates a response using the specified LLM model.

Retrieval strategies:

  • merge: Search all sources in parallel, merge by weighted similarity
  • sequential: Search sources in order until enough results are found
  • ensemble: Use reciprocal rank fusion (RRF) to combine vector and keyword search results
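
To make the ensemble strategy concrete, here is a minimal sketch of reciprocal rank fusion (RRF) over two ranked lists. It is an illustration under assumptions, not the service's implementation: the function name, the segment IDs, and the constant k=60 (a commonly used default) are all hypothetical, and the value the service actually uses is not documented here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, segment_id in enumerate(ranking, start=1):
            scores[segment_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))


vector_hits = ["seg-a", "seg-b", "seg-c"]   # hypothetical vector-search order
keyword_hits = ["seg-b", "seg-d", "seg-a"]  # hypothetical keyword-search order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['seg-b', 'seg-a', 'seg-d', 'seg-c']
```

Items that rank well in both lists (seg-b, seg-a) rise above items found by only one search, which is the property that makes RRF useful when vector and keyword scores are not directly comparable.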

Args:

  • request: Chat request with query, history, and configuration
  • service: Chat service (injected)

Returns: ChatCompletionResponse with generated text and source attributions

Raises: HTTPException with status 503 if the service is not configured, or 500 on a generation error
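
A minimal client sketch for this endpoint, using only the Python standard library. The base URL is a placeholder, and the sketch assumes the endpoint accepts a JSON body without authentication; neither is stated on this page. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment


def build_chat_request(query, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/chat/complete."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/v1/chat/complete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(query, base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(query, base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Map the two documented error statuses to clearer messages.
        if exc.code == 503:
            raise RuntimeError("chat service is not configured") from exc
        if exc.code == 500:
            raise RuntimeError("generation failed on the server") from exc
        raise
```

Calling `complete("What is our refund policy?")` against a running deployment should return a dict shaped like ChatCompletionResponse; all other request fields fall back to their documented defaults.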

ChatCompletionRequest

Request model for chat completion.

object
query
required
Query

User query text

string
>= 1 characters <= 10000 characters
history
History

Conversation history

Array<object>
<= 50 items
ChatMessageModel

A message in the conversation history.

object
role
required
Role

Message role

string
Allowed values: user assistant system
content
required
Content

Message content

string
>= 1 characters
context_mode
Context Mode

How to retrieve context

string
default: rag
Allowed values: rag attachment hybrid document none
rag_config
Any of:
RAGConfigModel

Configuration for RAG context retrieval.

object
sources
Sources

Data sources to search

Array<object>
DataSourceConfigModel

Configuration for a single data source in RAG retrieval.

object
name
required
Name

Descriptive name for the source

string
model_name
required
Model Name

Embedding model name

string
limit
Limit

Max results from this source

integer
default: 5 >= 1 <= 50
similarity_threshold
Similarity Threshold

Min similarity

number
default: 0.7 <= 1
weight
Weight

Weight for merging results

number
default: 1 <= 2
metadata_filters
Any of:
object
key
additional properties
any
retrieval_strategy
Retrieval Strategy

Strategy for combining results from multiple sources

string
default: merge
Allowed values: merge sequential ensemble
max_total_results
Max Total Results

Max total context items

integer
default: 10 >= 1 <= 50
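
The sketch below shows one way the rag_config fields could interact under the merge strategy: filter each source by similarity_threshold, cap it at limit, scale by weight, then keep the top max_total_results. The exact formula the service uses is not documented here; multiplying similarity by weight is an assumption, and the function name, source names, and scores are made up for illustration.

```python
def merge_results(results_by_source, sources, max_total_results=10):
    """Merge per-source hits into one ranked list.

    results_by_source maps a source name to (segment_id, similarity) pairs.
    sources is a list of dicts shaped like DataSourceConfigModel.
    """
    merged = []
    for source in sources:
        threshold = source.get("similarity_threshold", 0.7)  # documented default
        weight = source.get("weight", 1.0)                    # documented default
        limit = source.get("limit", 5)                        # documented default
        hits = [h for h in results_by_source.get(source["name"], []) if h[1] >= threshold]
        hits.sort(key=lambda h: h[1], reverse=True)
        for segment_id, similarity in hits[:limit]:
            # Assumption: the weighted score is similarity * weight.
            merged.append((segment_id, similarity * weight))
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged[:max_total_results]


sources = [
    {"name": "docs", "model_name": "openai-small", "weight": 1.0},
    {"name": "wiki", "model_name": "openai-small", "weight": 1.5, "limit": 1},
]
hits = {
    "docs": [("d1", 0.9), ("d2", 0.6)],    # d2 falls below the 0.7 threshold
    "wiki": [("w1", 0.8), ("w2", 0.75)],   # w2 is cut by limit=1
}
ranked = merge_results(hits, sources, max_total_results=2)
print([segment_id for segment_id, _ in ranked])  # ['w1', 'd1']
```

The `sources` list above has the same shape as the sources array in a rag_config, so it can be sent as `{"sources": sources, "retrieval_strategy": "merge", "max_total_results": 2}`.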
attachment_config
Any of:
AttachmentConfigModel

Configuration for attachment-based context retrieval.

object
attachments
Attachments

Explicit document attachments (max 20)

Array<object>
<= 20 items
AttachmentModel

A document attachment for explicit context selection.

object
storage_key
required
Storage Key

Storage path to the document

string
>= 1 characters
title
Any of:
string
max_tokens
Any of:
integer
>= 1
include_full_content
Include Full Content

Include full document content (vs summary/excerpts)

boolean
default: true
document_paths
Any of:
Array<string>
<= 20 items
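
As an illustration of the attachment fields, the sketch below builds a request body with explicit attachments and checks the documented limits client-side. It assumes that context_mode "attachment" is the mode that reads attachment_config, which the field names suggest but this page does not state; the helper name and storage keys are hypothetical.

```python
MAX_ATTACHMENTS = 20  # documented limit on the attachments array


def build_attachment_request(query, storage_keys, max_tokens=None):
    """Build a request body that passes documents as explicit attachments."""
    if len(storage_keys) > MAX_ATTACHMENTS:
        raise ValueError(f"at most {MAX_ATTACHMENTS} attachments are allowed")
    attachments = []
    for key in storage_keys:
        if not key:
            raise ValueError("storage_key must be at least 1 character")
        attachment = {"storage_key": key}
        if max_tokens is not None:
            attachment["max_tokens"] = max_tokens  # optional per-attachment cap
        attachments.append(attachment)
    return {
        "query": query,
        "context_mode": "attachment",  # assumption: pairs with attachment_config
        "attachment_config": {"attachments": attachments},
    }


body = build_attachment_request(
    "Summarize the onboarding steps.",
    ["docs/handbook.pdf"],  # hypothetical storage key
    max_tokens=500,
)
```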
embedding_model
Embedding Model

Embedding model for the query (used when no rag_config is specified)

string
default: openai-small
llm_model
Llm Model

LLM model for generation

string
default: gpt-4o-mini
system_prompt
Any of:
string
temperature
Temperature

Generation temperature

number
default: 0.7 <= 2
max_tokens
Max Tokens

Max tokens in response

integer
default: 1024 >= 1 <= 8192
conversation_id
Any of:
string
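
The constraints listed for ChatCompletionRequest can be checked before a request is sent. The sketch below mirrors only the limits stated above (query length, history size, allowed roles and context modes, temperature and max_tokens bounds); the function name is hypothetical, and the server remains the authority on validation.

```python
ALLOWED_ROLES = {"user", "assistant", "system"}
ALLOWED_CONTEXT_MODES = {"rag", "attachment", "hybrid", "document", "none"}


def check_request(body):
    """Return a list of problems found in a ChatCompletionRequest-shaped dict."""
    problems = []
    query = body.get("query")
    if not isinstance(query, str) or not 1 <= len(query) <= 10000:
        problems.append("query must be a string of 1 to 10000 characters")
    history = body.get("history", [])
    if len(history) > 50:
        problems.append("history may hold at most 50 messages")
    for i, message in enumerate(history):
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"history[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
        if not message.get("content"):
            problems.append(f"history[{i}].content must not be empty")
    if body.get("context_mode", "rag") not in ALLOWED_CONTEXT_MODES:
        problems.append("context_mode is not an allowed value")
    if body.get("temperature", 0.7) > 2:
        problems.append("temperature must be at most 2")
    if not 1 <= body.get("max_tokens", 1024) <= 8192:
        problems.append("max_tokens must be between 1 and 8192")
    return problems


ok = check_request({"query": "hi"})
bad = check_request({
    "query": "",
    "temperature": 3,
    "history": [{"role": "bot", "content": "x"}],
})
print(len(ok), len(bad))  # 0 3
```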

Successful Response

ChatCompletionResponse

Response model for chat completion.

object
response
required
Response

Generated response text

string
sources
Sources

Source attributions for retrieved context

Array<object>
SourceAttributionModel

Source attribution for a retrieved context item.

object
segment_id
required
Segment Id
string
storage_key
required
Any of:
string
similarity
required
Similarity
number
model_name
required
Model Name
string
snippet
required
Any of:
string
metadata
required
Any of:
object
key
additional properties
any
documents_used
Any of:
Array<object>
DocumentInfo

Information about a document used in document or attachment mode.

In document mode, the path field is used. In attachment mode, the storage_key and index fields are used. The index is 1-based and matches the [1], [2] reference markers the LLM emits in its responses.

object
path
Any of:
string
storage_key
Any of:
string
index
Any of:
integer
>= 1
title
Any of:
string
content_length
Content Length

Content length in characters

integer
0
content_preview
Any of:
string
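
Because index is 1-based and matches the [1], [2] markers in the response text, a client can resolve citations back to documents. A minimal sketch, assuming markers appear as a bare integer in square brackets; the function name and the sample documents are made up for illustration.

```python
import re


def cited_documents(response_text, documents_used):
    """Return the documents cited as [n] in the text, in order of first mention."""
    by_index = {
        doc["index"]: doc for doc in documents_used if doc.get("index") is not None
    }
    cited = []
    for match in re.finditer(r"\[(\d+)\]", response_text):
        doc = by_index.get(int(match.group(1)))
        if doc is not None and doc not in cited:
            cited.append(doc)
    return cited


docs = [
    {"index": 1, "storage_key": "docs/a.pdf", "title": "A"},  # illustrative data
    {"index": 2, "storage_key": "docs/b.pdf", "title": "B"},
]
text = "See the setup guide [2] and the overview [1]. More in [2]."
titles = [doc["title"] for doc in cited_documents(text, docs)]
print(titles)  # ['B', 'A']
```

Markers that do not correspond to any entry in documents_used, such as [7] here, are ignored rather than raising an error.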
search_metadata
Any of:
SearchMetadata

Observability metadata for RAG search operations.

object
strategy
required
Strategy

Retrieval strategy used

string
candidates_evaluated
Candidates Evaluated

Total candidates evaluated

integer
0
embedding_model
Any of:
string
index_type
Any of:
string
vector_weight
Any of:
number
keyword_weight
Any of:
number
language
Any of:
string
retrieval_time_ms
required
Retrieval Time Ms

Time spent retrieving context, in milliseconds

number
generation_time_ms
required
Generation Time Ms

Time spent generating the response, in milliseconds

number
total_time_ms
required
Total Time Ms

Total request time, in milliseconds

number
model_used
required
Model Used

LLM model that generated the response

string
conversation_id
Any of:
string
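
A short sketch of reading a ChatCompletionResponse: it reports the model, how the total time splits between retrieval and generation, and the similarity of each source. The function name is hypothetical and the sample values are invented for illustration, not taken from a real response.

```python
def summarize(response):
    """Return a short human-readable summary of a ChatCompletionResponse dict."""
    lines = [f"model: {response['model_used']}"]
    total = response["total_time_ms"]
    for label in ("retrieval", "generation"):
        spent = response[f"{label}_time_ms"]
        share = 100.0 * spent / total if total else 0.0
        lines.append(f"{label}: {spent:.0f} ms ({share:.0f}% of total)")
    for source in response.get("sources", []):
        lines.append(
            f"source {source['segment_id']}: similarity {source['similarity']:.2f}"
        )
    return "\n".join(lines)


sample = {  # illustrative values only
    "response": "Refunds are issued within 14 days.",
    "model_used": "gpt-4o-mini",
    "retrieval_time_ms": 50.0,
    "generation_time_ms": 150.0,
    "total_time_ms": 200.0,
    "sources": [{"segment_id": "seg-1", "similarity": 0.8234}],
}
print(summarize(sample))
```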

Validation Error

HTTPValidationError
object
detail
Detail
Array<object>
ValidationError
object
loc
required
Location
Array
msg
required
Message
string
type
required
Error Type
string
input
Input
ctx
Context
object
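
The HTTPValidationError body can be flattened into one line per problem for logging or display. A minimal sketch based on the loc, msg, and type fields listed above; the function name is hypothetical and the sample payload is illustrative.

```python
def format_validation_errors(payload):
    """Turn an HTTPValidationError body into one readable line per error."""
    lines = []
    for error in payload.get("detail", []):
        # loc mixes strings and integers, e.g. ["body", "history", 0, "role"].
        location = ".".join(str(part) for part in error["loc"])
        lines.append(f"{location}: {error['msg']} ({error['type']})")
    return lines


sample_error = {  # illustrative payload
    "detail": [
        {"loc": ["body", "query"], "msg": "Field required", "type": "missing"},
        {"loc": ["body", "history", 0, "role"], "msg": "Invalid role", "type": "enum"},
    ]
}
for line in format_validation_errors(sample_error):
    print(line)
```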