
LLM clients

Every adapter that calls a language model talks to an LLMClientProtocol or EmbeddingClientProtocol. The concrete implementation is chosen at construction time by factflow-llm based on config. Adapter authors never import a specific provider.

Three practical reasons:

  • Cost optimisation — swap from Claude Opus to Claude Sonnet via a config change, no redeploy
  • Vendor redundancy — if OpenAI is down, route to Bedrock without touching code
  • Testability — production code depends on a protocol; tests inject a mock satisfying the same protocol
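The testability point can be sketched with a minimal example. The protocol body and return type here are assumptions (the real LLMClientProtocol in factflow-llm may carry more methods and return a richer response object); only the complete(messages=...) call shape is taken from the docs below.

```python
import asyncio
from typing import Any, Protocol


class LLMClientProtocol(Protocol):
    """Structural sketch of the chat-side protocol (simplified)."""

    async def complete(self, messages: list[dict[str, Any]]) -> str: ...


class StubClient:
    """Test double that satisfies the protocol without network access."""

    async def complete(self, messages: list[dict[str, Any]]) -> str:
        return "canned response"


async def summarise(client: LLMClientProtocol, text: str) -> str:
    # Production code depends only on the protocol, never on a provider class.
    return await client.complete(messages=[{"role": "user", "content": text}])
```

Because Protocol uses structural typing, StubClient needs no inheritance — any object with a matching complete method passes the type check.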

| Provider    | Chat                            | Embeddings                 | Notes                            |
|-------------|---------------------------------|----------------------------|----------------------------------|
| OpenAI      | OpenAIClient, AzureOpenAIClient |                            |                                  |
| Anthropic   | AnthropicClient                 |                            |                                  |
| Bedrock     |                                 | BedrockEmbeddingClient     | via Titan                        |
| HuggingFace |                                 | HuggingFaceEmbeddingClient | via sentence-transformers, local |

Adding a provider is one file implementing LLMClientProtocol / EmbeddingClientProtocol plus registration in the factory.

Optional dependencies are gated by _AVAILABLE flags (ANTHROPIC_AVAILABLE, BEDROCK_AVAILABLE, SENTENCE_TRANSFORMERS_AVAILABLE). The factory skips providers whose libraries aren’t installed.
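The discovery-flag pattern can be sketched as follows. The flag names are taken from the docs; the try_import implementation via find_spec is an assumption about how such a helper is typically written.

```python
import importlib.util


def try_import(module_name: str) -> bool:
    """Report whether an optional dependency is importable,
    without actually importing it (sketch of the discovery-flag pattern)."""
    return importlib.util.find_spec(module_name) is not None


# Flags mirror the ones named above; the factory can consult them
# to skip providers whose libraries aren't installed.
ANTHROPIC_AVAILABLE = try_import("anthropic")
SENTENCE_TRANSFORMERS_AVAILABLE = try_import("sentence_transformers")
```

Using find_spec rather than a bare import keeps startup cheap: the module is located but not executed until a client actually needs it.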

```python
from factflow_llm import LLMClientFactory
from factflow_llm.settings import LLMConfig

config: LLMConfig = ...  # loaded from app config
factory = LLMClientFactory(config)

chat = factory.create_completion_client(provider_name="default")
emb = factory.create_embedding_client(provider_name="openai-embed")

response = await chat.complete(messages=[...])
vectors = await emb.embed(texts=[...])
```

Clients are cached per provider profile. First call constructs; subsequent calls reuse.
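The caching behaviour amounts to per-profile memoisation. This sketch uses illustrative names (ClientCache, build) that are not factflow's internals; it only demonstrates the construct-once, reuse-after semantics described above.

```python
from typing import Callable, Dict


class ClientCache:
    """Per-profile memoisation (illustrative, not factflow's actual code)."""

    def __init__(self, build: Callable[[str], object]) -> None:
        self._build = build
        self._clients: Dict[str, object] = {}

    def get(self, profile: str) -> object:
        # First call constructs; subsequent calls reuse the same instance.
        if profile not in self._clients:
            self._clients[profile] = self._build(profile)
        return self._clients[profile]
```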

Production LLM providers enforce rate limits. Hitting the limit produces a 429 / RateLimitError; sustained violations can degrade your account's standing with the provider.

Factflow wraps every client in RateLimitedClient, which uses AIMD (additive increase / multiplicative decrease):

  • Additive increase — on sustained success, gradually raise the per-second token + request budget
  • Multiplicative decrease — on a rate-limit signal, halve the budget and back off
  • Recovery — after cooldown, start probing again

Effect: throughput climbs until you hit the ceiling, backs off cleanly, finds equilibrium, and adapts automatically when the provider’s rate limit changes without notice.
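The three AIMD rules above can be sketched as a small budget controller. Parameter names and defaults here are illustrative, not factflow's actual RateLimitConfig fields.

```python
class AIMDBudget:
    """Additive-increase / multiplicative-decrease budget (sketch)."""

    def __init__(self, initial: float = 10.0, step: float = 1.0,
                 factor: float = 0.5, floor: float = 1.0,
                 ceiling: float = 1000.0) -> None:
        self.budget = initial    # allowed requests per second
        self.step = step         # additive increase per success window
        self.factor = factor     # multiplicative decrease on a 429
        self.floor = floor       # never back off below this
        self.ceiling = ceiling   # never probe above this

    def on_success(self) -> None:
        # Additive increase: probe for headroom slowly.
        self.budget = min(self.ceiling, self.budget + self.step)

    def on_rate_limit(self) -> None:
        # Multiplicative decrease: back off sharply when the provider pushes back.
        self.budget = max(self.floor, self.budget * self.factor)
```

The asymmetry is the point: linear growth keeps the probe gentle, while halving on a 429 sheds load fast enough to stop a limit-violation spiral.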

Configuration: RateLimitConfig on each provider profile. Sensible defaults ship; tune only if your workload is unusual.

Every provider exception is classified:

| Classification | Meaning                                   | Caller behaviour              |
|----------------|-------------------------------------------|-------------------------------|
| RETRYABLE      | Transient network or server issue         | Retry with backoff            |
| TERMINAL       | Bad request, invalid prompt, auth failure | Don't retry; propagate        |
| RATE_LIMITED   | 429 or provider backoff signal            | Back off per the rate limiter |

Adapter authors rarely implement custom classification — classify_llm_error(exc) does the right thing across providers.
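A toy classifier by HTTP status shows the shape of the mapping. This is not the real classify_llm_error, which also inspects provider-specific exception types; the enum values mirror the table above.

```python
from enum import Enum


class ErrorClass(Enum):
    RETRYABLE = "retryable"
    TERMINAL = "terminal"
    RATE_LIMITED = "rate_limited"


def classify_by_status(status: int) -> ErrorClass:
    """Classify an HTTP status into the three buckets (toy version)."""
    if status == 429:
        return ErrorClass.RATE_LIMITED    # provider backoff signal
    if 500 <= status < 600:
        return ErrorClass.RETRYABLE       # transient server issue
    return ErrorClass.TERMINAL            # bad request, auth failure, etc.
```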

There is no global “default model”. Each pipeline’s config specifies what it wants:

- type: "llm_translator"
config:
provider: "default" # points at the provider profile
model: "claude-sonnet-4-6" # explicit model id
max_tokens: 4096

A pipeline wanting GPT-4o and a pipeline wanting Claude Opus coexist without conflict.

To add a new provider:

  1. Implement LLMClientProtocol (and/or EmbeddingClientProtocol) in a new file under factflow-llm/src/factflow_llm/
  2. Add a discovery flag (MYPROVIDER_AVAILABLE = try_import("myprovider"))
  3. Register in LLMClientFactory._create_client_for_provider
  4. Add provider-specific settings to LLMProviderConfig (or extend via the extra dict pattern)
  5. Add a test that constructs the client without credentials — should fail fast, not hang
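Step 5 can look like the following. The client class, exception type, and environment-variable name are all hypothetical; the point is that missing credentials raise at construction time rather than hanging on a deferred network call.

```python
from typing import Optional


class MissingCredentialsError(ValueError):
    """Hypothetical error raised at construction when credentials are absent."""


class MyProviderClient:
    """Hypothetical provider client illustrating the fail-fast pattern."""

    def __init__(self, api_key: Optional[str]) -> None:
        if not api_key:
            # Fail fast: validate credentials now, never hang on first use.
            raise MissingCredentialsError("MYPROVIDER_API_KEY is not set")
        self.api_key = api_key


def test_constructs_fail_fast_without_credentials() -> None:
    try:
        MyProviderClient(api_key=None)
    except MissingCredentialsError:
        return  # expected: constructor refused immediately
    raise AssertionError("client should not construct without credentials")
```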