
factflow-llm

LLM and embedding client infrastructure — provider factory, adaptive rate limiting, and error classification. Used by every adapter that calls a language model.

Adapter authors construct clients via LLMClientFactory, receive an LLMClientProtocol or EmbeddingClientProtocol instance, and call it. The adapter doesn’t know which provider backs the call — that’s the factory’s decision based on config.

Five concrete providers wired behind one factory:

  • OpenAI (chat + embeddings)
  • Azure OpenAI (chat + embeddings, OpenAI-compatible deployment)
  • Anthropic (chat)
  • Bedrock (embeddings via Titan)
  • HuggingFace (embeddings via sentence-transformers, local)

The _AVAILABLE flags (ANTHROPIC_AVAILABLE, BEDROCK_AVAILABLE, SENTENCE_TRANSFORMERS_AVAILABLE) let the factory skip providers whose optional dependency isn’t installed, without failing at import time.

  • Adaptive rate limiting. AdaptiveRateLimiter uses AIMD (additive increase / multiplicative decrease) to probe provider quotas rather than respecting a hardcoded RPS. Adapter throughput rises until it hits a 429, then backs off proportionally.
  • Error classification. classify_llm_error maps provider-specific exceptions to a taxonomy (retryable, terminal, rate_limited) so adapter retry logic doesn’t need provider-specific try/except trees.
  • Lazy client creation. The factory caches clients per provider profile; first call constructs, subsequent calls reuse.
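The AIMD loop behind the rate limiter can be sketched in a few lines. This is an illustrative model only, not the actual AdaptiveRateLimiter implementation; the class, field, and method names below are invented:

```python
class AIMDLimiter:
    """Toy AIMD rate controller: add on success, multiply down on 429."""

    def __init__(self, rate: float = 1.0, increase: float = 0.5,
                 decrease: float = 0.5, max_rate: float = 100.0):
        self.rate = rate          # current requests-per-second estimate
        self.increase = increase  # additive step after each success
        self.decrease = decrease  # multiplicative factor after a 429
        self.max_rate = max_rate

    def on_success(self) -> None:
        # Additive increase: keep probing for more quota headroom.
        self.rate = min(self.rate + self.increase, self.max_rate)

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: back off proportionally on 429.
        self.rate = max(self.rate * self.decrease, 0.1)


limiter = AIMDLimiter()
for _ in range(4):
    limiter.on_success()
print(limiter.rate)   # 3.0
limiter.on_rate_limited()
print(limiter.rate)   # 1.5
```

The key property is that throughput converges toward the provider's real quota without any hardcoded RPS value.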

Every symbol listed here is in factflow_llm.__all__ (or is the top-level LLMClientFactory).

```python
from factflow_llm import LLMClientFactory
from factflow_llm.settings import LLMConfig

config: LLMConfig = ...
factory = LLMClientFactory(config)
chat = factory.create_completion_client(provider_name="default")
emb = factory.create_embedding_client(provider_name="openai-embed")
```

Clients (direct constructors, rarely used)

```python
from factflow_llm import (
    BaseLLMClient,
    OpenAIClient,
    AzureOpenAIClient,
    AnthropicClient,
    BedrockEmbeddingClient,
    HuggingFaceEmbeddingClient,
)
```

Prefer the factory. Direct construction is used only in unit tests.

```python
from factflow_llm import (
    AdaptiveRateLimiter,
    AIMDMetrics,
    RateLimitConfig,
    RateLimitSignal,
    RateLimitedClient,           # wraps a BaseLLMClient with rate-limiting
    RateLimitedEmbeddingClient,  # wraps an embedding client
)
```

```python
from factflow_llm import (
    LLMErrorClassification,  # enum: RETRYABLE / TERMINAL / RATE_LIMITED
    classify_llm_error,      # exception → classification
    get_error_metadata,      # extract retry-after, error code, etc.
    is_fatal_llm_error,      # quick boolean check
)
```
```python
from factflow_llm import (
    Message, Role, StreamChunk,
    CompletionResponse,
    EmbeddingResponse,
)
from factflow_llm import LLMClientProtocol, EmbeddingClientProtocol
```

These are the same protocols defined in factflow-protocols, re-exported for convenience.
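Structurally, a protocol like this is what lets adapters stay provider-agnostic. The sketch below is illustrative: the `complete` method name is an assumption, not necessarily the real contract:

```python
from typing import Protocol


class ChatClient(Protocol):
    """Illustrative structural contract (method name is assumed)."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    # Satisfies ChatClient structurally; no inheritance required.
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def run_adapter(client: ChatClient, prompt: str) -> str:
    # The adapter sees only the protocol, never the concrete provider.
    return client.complete(prompt)


print(run_adapter(EchoClient(), "hi"))  # echo: hi
```

Because the contract is structural, any provider client with the right method shape satisfies it, which is what allows the factory to swap providers behind the same call site.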

```python
from factflow_llm.settings import (
    LLMConfig,          # root config
    LLMProviderConfig,  # per-provider profile
    ModelType,          # enum: CHAT / EMBEDDING
)
```

```python
from factflow_llm import (
    ANTHROPIC_AVAILABLE,
    BEDROCK_AVAILABLE,
    SENTENCE_TRANSFORMERS_AVAILABLE,
)
```

Runtime checks so the factory can skip a provider whose optional deps aren’t installed.
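The flag pattern itself is the standard try/except-ImportError idiom; a minimal sketch (the `providers` helper is invented for illustration):

```python
# Probe an optional dependency at import time without failing hard.
try:
    import anthropic  # noqa: F401
    ANTHROPIC_AVAILABLE = True
except ImportError:
    ANTHROPIC_AVAILABLE = False


def providers(available_flags: dict[str, bool]) -> list[str]:
    # A factory can then enumerate only the providers whose flag is True.
    return [name for name, ok in available_flags.items() if ok]
```

The import either succeeds or is swallowed, so importing the package never fails just because an optional provider SDK is missing.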

  • Runtime: openai>=2.26.0, anthropic>=0.84.0, tiktoken>=0.12.0. Bedrock and HuggingFace deps are optional (surfaced via the _AVAILABLE flags).
  • Workspace: factflow-protocols, factflow-foundation
  • External services: depends on which providers are configured; credentials injected via env / config.

Tests at backend/packages/factflow-llm/tests/. See the llm-unit-testing skill for mocking patterns — MockLLMClient is not a library class; tests use unittest.mock directly.
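In that spirit, a test stands up a plain mock in place of any provider client. The adapter function and the `complete` method name below are assumptions for illustration, not library API:

```python
from unittest import mock


def summarize(client, text: str) -> str:
    # Hypothetical adapter function that depends only on the client's
    # (assumed) complete() method, never on a concrete provider class.
    return client.complete(f"Summarize: {text}")


def test_summarize_calls_client():
    client = mock.Mock()
    client.complete.return_value = "short summary"

    assert summarize(client, "long document") == "short summary"
    client.complete.assert_called_once_with("Summarize: long document")


test_summarize_calls_client()
```

Since `mock.Mock()` satisfies any structural contract, no dedicated MockLLMClient class is needed.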

  • factflow-protocols — the LLMClientProtocol / EmbeddingClientProtocol contracts
  • Rule: .claude/rules/llm-conventions.md — factory, rate limiting, error classification invariants
  • Workflow packages that consume LLM clients — embedding adapters in factflow-embeddings, extraction / chat adapters across the rest