
factflow-llm

LLM and embedding client infrastructure — provider factory, adaptive rate limiting, and error classification. Used by every adapter that calls a language model.

Adapter authors construct clients via LLMClientFactory, receive an LLMClientProtocol or EmbeddingClientProtocol instance, and call it. The adapter doesn’t know which provider backs the call — that’s the factory’s decision based on config.

Five concrete providers wired behind one factory:

  • OpenAI (chat + embeddings)
  • Azure OpenAI (chat + embeddings, OpenAI-compatible deployment)
  • Anthropic (chat)
  • Bedrock (embeddings via Titan)
  • HuggingFace (embeddings via sentence-transformers, local)

The _AVAILABLE flags (ANTHROPIC_AVAILABLE, BEDROCK_AVAILABLE, SENTENCE_TRANSFORMERS_AVAILABLE) let the factory skip providers whose optional dependency isn’t installed, without failing at import time.

  • Adaptive rate limiting. AdaptiveRateLimiter uses AIMD (additive increase / multiplicative decrease) to probe provider quotas rather than respecting a hardcoded RPS. Adapter throughput rises until it hits a 429, then backs off proportionally.
  • Error classification. classify_llm_error maps provider-specific exceptions to a taxonomy (retryable, terminal, rate_limited) so adapter retry logic doesn’t need provider-specific try/except trees.
  • Lazy client creation. The factory caches clients per provider profile; first call constructs, subsequent calls reuse.
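The AIMD loop behind the rate limiter can be sketched in a few lines. This is an illustrative model only, not the actual AdaptiveRateLimiter implementation; the class, field, and method names below are invented:

```python
class AIMDLimiter:
    """Toy AIMD rate controller: add on success, multiply down on 429."""

    def __init__(self, rate: float = 1.0, increase: float = 0.5,
                 decrease: float = 0.5, max_rate: float = 100.0):
        self.rate = rate          # current requests-per-second estimate
        self.increase = increase  # additive step after each success
        self.decrease = decrease  # multiplicative factor after a 429
        self.max_rate = max_rate

    def on_success(self) -> None:
        # Additive increase: keep probing for more quota headroom.
        self.rate = min(self.rate + self.increase, self.max_rate)

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: back off proportionally on 429.
        self.rate = max(self.rate * self.decrease, 0.1)


limiter = AIMDLimiter()
for _ in range(4):
    limiter.on_success()
print(limiter.rate)   # 3.0
limiter.on_rate_limited()
print(limiter.rate)   # 1.5
```

The key property is that throughput converges toward the provider's real quota without any hardcoded RPS value.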

Every symbol listed here is in factflow_llm.__all__ (or is the top-level LLMClientFactory).

```python
from factflow_llm import LLMClientFactory
from factflow_llm.settings import LLMConfig

config: LLMConfig = ...
factory = LLMClientFactory(config)
chat = factory.create_completion_client(provider_name="default")
emb = factory.create_embedding_client(provider_name="openai-embed")
```

Clients (direct constructors, rarely used)

```python
from factflow_llm import (
    BaseLLMClient,
    OpenAIClient,
    AzureOpenAIClient,
    AnthropicClient,
    BedrockEmbeddingClient,
    HuggingFaceEmbeddingClient,
)
```

Prefer the factory. Direct construction is used only in unit tests.

```python
from factflow_llm import (
    AdaptiveRateLimiter,
    AIMDMetrics,
    RateLimitConfig,
    RateLimitSignal,
    RateLimitedClient,           # wraps a BaseLLMClient with rate-limiting
    RateLimitedEmbeddingClient,  # wraps an embedding client
)
```

```python
from factflow_llm import (
    LLMErrorClassification,  # enum: RETRYABLE / TERMINAL / RATE_LIMITED
    classify_llm_error,      # exception → classification
    get_error_metadata,      # extract retry-after, error code, etc.
    is_fatal_llm_error,      # quick boolean check
)
```
```python
from factflow_llm import (
    Message, Role, StreamChunk,
    CompletionResponse,
    EmbeddingResponse,
)
from factflow_llm import LLMClientProtocol, EmbeddingClientProtocol
```

These are the same protocols defined in factflow-protocols, re-exported for convenience.
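Structurally, a protocol like this is what lets adapters stay provider-agnostic. The sketch below is illustrative: the `complete` method name is an assumption, not necessarily the real contract:

```python
from typing import Protocol


class ChatClient(Protocol):
    """Illustrative structural contract (method name is assumed)."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    # Satisfies ChatClient structurally; no inheritance required.
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def run_adapter(client: ChatClient, prompt: str) -> str:
    # The adapter sees only the protocol, never the concrete provider.
    return client.complete(prompt)


print(run_adapter(EchoClient(), "hi"))  # echo: hi
```

Because the contract is structural, any provider client with the right method shape satisfies it, which is what allows the factory to swap providers behind the same call site.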

```python
from factflow_llm.settings import (
    LLMConfig,          # root config
    LLMProviderConfig,  # per-provider profile
    ModelType,          # enum: CHAT / EMBEDDING
)
```

```python
from factflow_llm import (
    ANTHROPIC_AVAILABLE,
    BEDROCK_AVAILABLE,
    SENTENCE_TRANSFORMERS_AVAILABLE,
)
```

Runtime checks so the factory can skip a provider whose optional deps aren’t installed.
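The flag pattern itself is the standard try/except-ImportError idiom; a minimal sketch (the `providers` helper is invented for illustration):

```python
# Probe an optional dependency at import time without failing hard.
try:
    import anthropic  # noqa: F401
    ANTHROPIC_AVAILABLE = True
except ImportError:
    ANTHROPIC_AVAILABLE = False


def providers(available_flags: dict[str, bool]) -> list[str]:
    # A factory can then enumerate only the providers whose flag is True.
    return [name for name, ok in available_flags.items() if ok]
```

The import either succeeds or is swallowed, so importing the package never fails just because an optional provider SDK is missing.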

  • Runtime: openai>=2.26.0, anthropic>=0.84.0, tiktoken>=0.12.0. Bedrock and HuggingFace deps are optional (surfaced via the _AVAILABLE flags).
  • Workspace: factflow-protocols, factflow-foundation
  • External services: depends on which providers are configured; credentials injected via env / config.

Tests at backend/packages/factflow-llm/tests/. See the llm-unit-testing skill for mocking patterns — MockLLMClient is not a library class; tests use unittest.mock directly.
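In that spirit, a test stands up a plain mock in place of any provider client. The adapter function and the `complete` method name below are assumptions for illustration, not library API:

```python
from unittest import mock


def summarize(client, text: str) -> str:
    # Hypothetical adapter function that depends only on the client's
    # (assumed) complete() method, never on a concrete provider class.
    return client.complete(f"Summarize: {text}")


def test_summarize_calls_client():
    client = mock.Mock()
    client.complete.return_value = "short summary"

    assert summarize(client, "long document") == "short summary"
    client.complete.assert_called_once_with("Summarize: long document")


test_summarize_calls_client()
```

Since `mock.Mock()` satisfies any structural contract, no dedicated MockLLMClient class is needed.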

  • factflow-protocols — the LLMClientProtocol / EmbeddingClientProtocol contracts
  • Rule: .claude/rules/llm-conventions.md — factory, rate limiting, error classification invariants
  • Workflow packages that consume LLM clients — embedding adapters in factflow-embeddings, extraction / chat adapters across the rest