Chat Complete
Execute RAG chat with context retrieval and LLM generation.
This endpoint retrieves relevant context from the knowledge base, builds a prompt with the context, and generates a response using the specified LLM model.
Retrieval strategies:
- merge: Search all sources in parallel, merge by weighted similarity
- sequential: Search sources in order until enough results are found
- ensemble: Use RRF (Reciprocal Rank Fusion) to combine vector and keyword search
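The ensemble strategy's Reciprocal Rank Fusion can be sketched as follows. This is a minimal illustration of the general RRF technique, not this service's actual implementation; the constant `k=60` is the conventional default from the RRF literature.

```python
def rrf_merge(vector_ranking, keyword_ranking, k=60):
    """Combine two ranked lists of document IDs via Reciprocal Rank Fusion.

    Each document's score is the sum of 1 / (k + rank) over every ranking
    it appears in; documents with higher combined scores rank first.
    """
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Documents appearing high in both rankings win:
merged = rrf_merge(["a", "b", "c"], ["b", "c", "d"])  # → ["b", "c", "a", "d"]
```

Because RRF only uses ranks, it needs no score normalization between the vector and keyword searches, which is why it is a common choice for combining heterogeneous retrievers.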
**Args:**

- `request` — chat request with query, history, and configuration
- `service` — chat service (injected)

**Returns:** `ChatCompletionResponse` with generated text and source attributions

**Raises:** `HTTPException` — 503 if the service is not configured, 500 on a generation error
## Request Body (required)

Request model for chat completion.
`object` with the following properties:

- User query text
- Conversation history — each item is a message in the conversation history (`object`):
  - Message role
  - Message content
- How to retrieve context — configuration for RAG context retrieval (`object`):
  - Data sources to search — each item is the configuration for a single data source in RAG retrieval (`object`):
    - Descriptive name for the source
    - Embedding model name
    - Max results from this source
    - Min similarity
    - Weight for merging results
  - Strategy for combining results from multiple sources
  - Max total context items
- Configuration for attachment-based context retrieval (`object`):
  - Include full document content (vs. summary/excerpts)
- Embedding model for the query (used when no `rag_config` is specified)
- LLM model for generation
- Generation temperature
- Max tokens in the response
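A request body matching the schema above might look like this. The field names below are illustrative assumptions inferred from the descriptions (only `rag_config` is named in this page); consult the service's OpenAPI schema for the exact keys.

```python
# Hypothetical request payload; key names are assumptions based on the
# schema descriptions, not confirmed field names.
payload = {
    "query": "What does the retention policy say about backups?",  # user query text
    "history": [                                   # conversation history
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    "rag_config": {                                # how to retrieve context
        "sources": [
            {
                "name": "policies",                # descriptive name for the source
                "embedding_model": "text-embedding-3-small",
                "max_results": 5,                  # max results from this source
                "min_similarity": 0.2,             # min similarity
                "weight": 1.0,                     # weight for merging results
            }
        ],
        "strategy": "merge",                       # merge | sequential | ensemble
        "max_context_items": 10,                   # max total context items
    },
    "model": "gpt-4o-mini",                        # LLM model for generation
    "temperature": 0.2,                            # generation temperature
    "max_tokens": 512,                             # max tokens in response
}
```

With multiple entries in `sources`, the chosen `strategy` controls whether they are searched in parallel and merged by weighted similarity, searched in order, or fused with RRF.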
## Responses

**Successful Response**
Response model for chat completion.

`object` with the following properties:

- Generated response text
- Source attributions for retrieved context — each item is a source attribution for a retrieved context item, carrying information about the document used in document or attachment mode:
  - Document mode uses the `path` field.
  - Attachment mode uses the `storage_key` and `index` fields; the index is 1-based, matching the LLM's `[1]`, `[2]` references in responses.
- Observability metadata for RAG search operations (`object`):
  - Retrieval strategy used
  - Total candidates evaluated
  - Time spent retrieving context
  - Time spent generating the response
  - Total request time
- LLM model that generated the response
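Because attachment indices are 1-based and match the bracketed citations in the generated text, a client can resolve a `[1]`-style reference directly against the source list. The response shape below is a hedged sketch: only `path`, `storage_key`, and `index` are named on this page, so the other key names are assumptions.

```python
# Hypothetical response shape; only `path`, `storage_key`, and `index`
# are documented field names — the rest are illustrative assumptions.
response = {
    "text": "Backups are retained for 30 days [1].",
    "sources": [
        {"storage_key": "attachments/policy.pdf", "index": 1},
    ],
    "metadata": {"strategy": "ensemble", "total_candidates": 12},
}


def resolve_citation(resp, n):
    """Map an LLM citation like [1] back to its source attribution.

    Attachment indices are 1-based, matching the bracketed references
    in the generated text; returns None for an unknown index.
    """
    for src in resp["sources"]:
        if src.get("index") == n:
            return src
    return None
```

For example, `resolve_citation(response, 1)` returns the attribution whose `storage_key` points at the cited attachment, letting a UI render the citation as a link.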
**Validation Error**