LLM clients
Every adapter that calls a language model talks to an LLMClientProtocol or EmbeddingClientProtocol. The concrete implementation is chosen at construction time by factflow-llm based on config. Adapter authors never import a specific provider.
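A minimal sketch of what such a protocol can look like. The method names `complete` and `embed` and their signatures are assumptions for illustration; the real contracts live in factflow-protocols:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMClientProtocol(Protocol):
    """Chat-completion contract (hypothetical signature)."""

    async def complete(self, messages: list[dict]) -> str: ...


@runtime_checkable
class EmbeddingClientProtocol(Protocol):
    """Embedding contract (hypothetical signature)."""

    async def embed(self, texts: list[str]) -> list[list[float]]: ...


class FakeChatClient:
    """A test double: satisfies the protocol structurally, no inheritance needed."""

    async def complete(self, messages: list[dict]) -> str:
        return "canned response"


# Structural typing: the fake passes as an LLM client without subclassing.
assert isinstance(FakeChatClient(), LLMClientProtocol)
```

Because the protocols are structural, tests can inject trivial fakes like `FakeChatClient` anywhere production code expects a client.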
Why abstract?
Three practical reasons:
- Cost optimisation — swap from Claude Opus to Claude Sonnet via a config change, no redeploy
- Vendor redundancy — if OpenAI is down, route to Bedrock without touching code
- Testability — production code depends on a protocol; tests inject a mock satisfying the same protocol
Supported providers
| Provider | Chat | Embeddings | Notes |
|---|---|---|---|
| OpenAI | ✓ | ✓ | OpenAIClient, AzureOpenAIClient |
| Anthropic | ✓ | — | AnthropicClient |
| Bedrock | — | ✓ | BedrockEmbeddingClient via Titan |
| HuggingFace | — | ✓ | HuggingFaceEmbeddingClient via sentence-transformers, local |
Adding a provider is one file implementing LLMClientProtocol / EmbeddingClientProtocol plus registration in the factory.
Optional dependencies are gated by _AVAILABLE flags (ANTHROPIC_AVAILABLE, BEDROCK_AVAILABLE, SENTENCE_TRANSFORMERS_AVAILABLE). The factory skips providers whose libraries aren’t installed.
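The gating pattern is standard; a sketch assuming a helper named `try_import` (the actual module may structure this differently):

```python
import importlib.util


def try_import(module_name: str) -> bool:
    """Return True if the optional dependency is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Flags the factory consults before registering each provider.
ANTHROPIC_AVAILABLE = try_import("anthropic")
BEDROCK_AVAILABLE = try_import("boto3")
SENTENCE_TRANSFORMERS_AVAILABLE = try_import("sentence_transformers")
```

Using `find_spec` rather than a bare `import` keeps startup cheap: the heavy SDK is only imported when the factory actually constructs that provider's client.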
The factory
```python
from factflow_llm import LLMClientFactory
from factflow_llm.settings import LLMConfig

config: LLMConfig = ...  # loaded from app config
factory = LLMClientFactory(config)

chat = factory.create_completion_client(provider_name="default")
emb = factory.create_embedding_client(provider_name="openai-embed")

response = await chat.complete(messages=[...])
vectors = await emb.embed(texts=[...])
```

Clients are cached per provider profile. The first call constructs the client; subsequent calls reuse it.
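Per-profile caching can be sketched like this (hypothetical internals, not the actual factory code):

```python
class CachingFactory:
    """Illustrative: one client instance per provider profile name."""

    def __init__(self, build):
        self._build = build   # callable: profile name -> new client
        self._cache = {}

    def create_completion_client(self, provider_name: str):
        # First call for a profile constructs; later calls hit the cache.
        if provider_name not in self._cache:
            self._cache[provider_name] = self._build(provider_name)
        return self._cache[provider_name]


factory = CachingFactory(build=lambda name: object())
a = factory.create_completion_client("default")
b = factory.create_completion_client("default")
assert a is b  # same instance reused for the same profile
```

Caching matters here because clients carry connection pools and rate-limiter state; constructing one per call would reset both.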
Adaptive rate limiting
Production LLM providers enforce rate limits. Hitting the limit produces 429 / RateLimitError responses; sustained overruns can lead to account degradation.
Factflow wraps every client in RateLimitedClient, which uses AIMD (additive increase / multiplicative decrease):
- Additive increase — on sustained success, gradually raise the per-second token + request budget
- Multiplicative decrease — on a rate-limit signal, halve the budget and back off
- Recovery — after cooldown, start probing again
The effect: throughput climbs until it hits the provider’s ceiling, backs off cleanly, settles at an equilibrium, and re-adapts automatically when the provider changes its limits without notice.
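The AIMD loop can be illustrated with a toy request-per-second budget (class name, attribute names, and numbers are made up for the sketch; the real RateLimitedClient also tracks token budgets):

```python
class AIMDBudget:
    """Toy additive-increase / multiplicative-decrease controller."""

    def __init__(self, start=10.0, step=1.0, floor=1.0, ceiling=1000.0):
        self.rate = start      # currently allowed requests per second
        self.step = step       # additive increase per success window
        self.floor = floor     # never back off below this
        self.ceiling = ceiling # never probe above this

    def on_success(self):
        # Additive increase: creep toward the provider's ceiling.
        self.rate = min(self.ceiling, self.rate + self.step)

    def on_rate_limit(self):
        # Multiplicative decrease: halve the budget on a 429 signal.
        self.rate = max(self.floor, self.rate / 2)


budget = AIMDBudget(start=10.0)
for _ in range(5):
    budget.on_success()   # climb: 10.0 -> 15.0
budget.on_rate_limit()    # back off: 15.0 -> 7.5
```

This is the same control law TCP congestion control uses: slow additive probing finds the ceiling, and the multiplicative cut on failure keeps the system stable around it.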
Configuration: RateLimitConfig on each provider profile. Sensible defaults ship; tune only if your workload is unusual.
Error classification
Every provider exception is classified:
| Classification | Meaning | Caller behaviour |
|---|---|---|
| RETRYABLE | Transient network or server issue | Retry with backoff |
| TERMINAL | Bad request, invalid prompt, auth failure | Don’t retry; propagate |
| RATE_LIMITED | 429 or provider backoff signal | Back off per the rate limiter |
Adapter authors rarely implement custom classification — classify_llm_error(exc) does the right thing across providers.
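A sketch of the classification idea using a hypothetical HTTP-status mapping (the real classify_llm_error also inspects provider-specific SDK exception types and network errors):

```python
from enum import Enum


class ErrorClass(Enum):
    RETRYABLE = "retryable"
    TERMINAL = "terminal"
    RATE_LIMITED = "rate_limited"


def classify_status(status_code: int) -> ErrorClass:
    """Illustrative mapping from HTTP status to error class."""
    if status_code == 429:
        return ErrorClass.RATE_LIMITED   # provider backoff signal
    if 500 <= status_code < 600:
        return ErrorClass.RETRYABLE      # transient server-side issue
    return ErrorClass.TERMINAL           # 4xx: caller bug, don't retry
```

The key design point is that the caller branches on the classification, never on the provider-specific exception type, so retry logic stays provider-agnostic.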
Model selection
There is no global “default model”. Each pipeline’s config specifies what it wants:
```yaml
- type: "llm_translator"
  config:
    provider: "default"         # points at the provider profile
    model: "claude-sonnet-4-6"  # explicit model id
    max_tokens: 4096
```

A pipeline wanting GPT-4o and a pipeline wanting Claude Opus coexist without conflict.
Adding a new provider
- Implement LLMClientProtocol (and/or EmbeddingClientProtocol) in a new file under factflow-llm/src/factflow_llm/
- Add a discovery flag (MYPROVIDER_AVAILABLE = try_import("myprovider"))
- Register in LLMClientFactory._create_client_for_provider
- Add provider-specific settings to LLMProviderConfig (or extend via the extra dict pattern)
- Add a test that constructs the client without credentials — should fail fast, not hang
Related
- factflow-llm reference — every public export
- factflow-protocols reference — the abstract contracts
- LLM configuration guide — env vars, config YAML, verification