LLM Integration¶
Enterprise-grade LLM integration with reliability patterns
Connect to 18+ LLM providers with intelligent fallbacks and monitoring across 400+ modules
Enterprise ML/AI
Part of 237 enterprise modules with 14 ML/AI infrastructure features including model gateway, prompt management, and A/B testing. See Enterprise Documentation.
Quick Navigation¶
- Supported Models - OpenAI, Anthropic, Azure, and more
- Reliability - Circuit breakers and retry logic
- Caching - Response caching for performance
- Examples - Integration patterns
Overview¶
The llms module provides enterprise-grade integration with Large Language Models (LLMs) featuring reliability patterns, monitoring, and intelligent fallback mechanisms. It abstracts away the complexity of model management while providing advanced features like circuit breakers, retry logic, response caching, and performance tracking.
Enterprise ML/AI
The framework includes 14 ML/AI infrastructure modules with LLM gateway, prompt management, model versioning, and A/B testing.
High-Level Design (HLD)¶
LLM Integration Architecture¶
graph TB
subgraph "Application Layer"
APP[Agent Application]
API[LLM Manager API]
end
subgraph "LLM Manager Core"
MGR[LLM Manager]
CACHE[Response Cache]
FALLBACK[Fallback Controller]
METRICS[Metrics Collector]
end
subgraph "Reliability Layer"
CB1[Circuit Breaker 1]
CB2[Circuit Breaker 2]
CB3[Circuit Breaker N]
RETRY[Retry Logic]
end
subgraph "Model Registry"
M1["GPT-4<br/>Primary"]
M2["Claude<br/>Fallback 1"]
M3["GPT-3.5<br/>Fallback 2"]
end
subgraph "External LLM Providers"
OPENAI[OpenAI API]
ANTHROPIC[Anthropic API]
AZURE[Azure OpenAI]
CUSTOM[Custom Models]
end
APP --> API
API --> MGR
MGR --> CACHE
CACHE -."miss".-> FALLBACK
FALLBACK --> CB1 & CB2 & CB3
CB1 --> RETRY
CB2 --> RETRY
CB3 --> RETRY
RETRY --> M1 & M2 & M3
M1 --> OPENAI
M2 --> ANTHROPIC
M3 --> AZURE
MGR --> METRICS
CB1 & CB2 & CB3 -."status".-> METRICS
style CACHE fill:#4caf50
style CB1 fill:#ff9800
style CB2 fill:#ff9800
style CB3 fill:#ff9800
style M1 fill:#2196f3
Key Classes¶
LLMManager¶
Enhanced LLM Manager with reliability and monitoring features.
Constructor: LLMManager(max_retries, enable_caching)
Methods:
- register_model(name, inference_fn, metadata=None) - Register an LLM model
- set_active_model(name) - Set the active model
- set_fallback_chain(model_names) - Configure fallback chain for model failures
- generate(prompt, use_cache=True, **kwargs) - Generate response with retry and fallback
- clear_cache() - Clear response cache
- get_model_info(model_name) - Get model information and statistics
- get_metrics() - Get overall performance metrics
- reset_circuit_breaker(model_name) - Manually reset circuit breaker
- list_models() - List all registered models
CircuitBreaker¶
Circuit breaker pattern to prevent cascading failures.
Constructor: CircuitBreaker(failure_threshold, recovery_timeout)
States:
- closed - Normal operation
- open - Circuit tripped, blocking calls
- half-open - Testing if service recovered
Methods:
- call(func, *args, **kwargs) - Execute function with circuit breaker protection
Circuit Breaker State Machine¶
stateDiagram-v2
[*] --> Closed: Initialize
Closed --> Open: Failure Threshold Reached<br/>(5 consecutive failures)
Closed --> Closed: Success (Reset failure count)
Closed --> Closed: Failure (Increment count)
Open --> HalfOpen: Recovery Timeout Expired<br/>(60 seconds)
Open --> Open: Request Blocked<br/>(Fail Fast)
HalfOpen --> Closed: Success<br/>(Service Recovered)
HalfOpen --> Open: Failure<br/>(Still Broken)
HalfOpen --> HalfOpen: Testing...
note right of Closed
Normal operation
All requests pass through
Failure count tracked
end note
note right of Open
Circuit tripped
Requests fail immediately
Prevents cascading failures
end note
note right of HalfOpen
Testing recovery
Limited requests allowed
Determines if service is healthy
end note
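The state machine above can be expressed in a few dozen lines. The following is a minimal, self-contained sketch (not the framework's implementation); the defaults mirror the diagram's thresholds of 5 consecutive failures and a 60-second recovery timeout:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "closed"
        self.failure_count = 0
        self.last_failure_time = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # After the recovery timeout, allow a probe in half-open state.
            if time.monotonic() - self.last_failure_time >= self.recovery_timeout:
                self.state = "half-open"
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed probe, or too many consecutive failures, trips the circuit.
            if self.state == "half-open" or self.failure_count >= self.failure_threshold:
                self.state = "open"
            raise
        else:
            # Any success closes the circuit and resets the failure count.
            self.state = "closed"
            self.failure_count = 0
            return result

    def reset(self):
        self.state = "closed"
        self.failure_count = 0
```

Once the threshold is reached, further calls raise immediately instead of hitting the failing provider, which is what makes the fallback chain cheap to traverse.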
Low-Level Design (LLD)¶
LLM Manager Class Diagram¶
classDiagram
class LLMManager {
-Dict~str,ModelInfo~ models
-str active_model
-List~str~ fallback_chain
-int max_retries
-bool enable_caching
-ResponseCache cache
-MetricsCollector metrics
+__init__(max_retries, enable_caching)
+register_model(name, inference_fn, metadata) void
+set_active_model(name) void
+set_fallback_chain(model_names) void
+generate(prompt, use_cache, kwargs) str
+clear_cache() void
+get_model_info(model_name) Dict
+get_metrics() Dict
+reset_circuit_breaker(model_name) void
+list_models() List~str~
-_generate_with_fallback(prompt, kwargs) str
-_generate_with_retry(model, prompt, kwargs) str
}
class ModelInfo {
+str name
+Callable inference_fn
+Dict metadata
+CircuitBreaker circuit_breaker
+int success_count
+int failure_count
+float avg_latency
+datetime last_used
}
class CircuitBreaker {
-int failure_threshold
-int recovery_timeout
-str state
-int failure_count
-datetime last_failure_time
+call(func, args, kwargs) Any
+reset() void
+get_state() str
-_should_attempt() bool
-_record_success() void
-_record_failure() void
-_transition_state(new_state) void
}
class ResponseCache {
-Dict~str,CacheEntry~ cache
-int max_size
-str eviction_policy
+get(key) Any
+set(key, value, ttl) void
+clear() void
+size() int
-_generate_cache_key(prompt, kwargs) str
-_evict_if_needed() void
}
class MetricsCollector {
-Dict metrics
-List~Event~ events
+record_request(model, latency) void
+record_cache_hit() void
+record_cache_miss() void
+record_fallback(from_model, to_model) void
+get_metrics() Dict
}
LLMManager "1" *-- "*" ModelInfo: manages
ModelInfo "1" *-- "1" CircuitBreaker: has
LLMManager "1" *-- "1" ResponseCache: uses
LLMManager "1" *-- "1" MetricsCollector: uses
Request Flow with Fallback¶
sequenceDiagram
participant App
participant LLMMgr as LLM Manager
participant Cache
participant CB1 as Circuit Breaker<br/>(GPT-4)
participant GPT4 as GPT-4 API
participant CB2 as Circuit Breaker<br/>(Claude)
participant Claude as Claude API
participant Metrics
App->>LLMMgr: generate(prompt)
activate LLMMgr
LLMMgr->>Cache: check(prompt)
alt Cache Hit
Cache-->>LLMMgr: cached_response
LLMMgr->>Metrics: record_cache_hit()
LLMMgr-->>App: return response
else Cache Miss
Cache-->>LLMMgr: None
LLMMgr->>Metrics: record_cache_miss()
LLMMgr->>CB1: call(gpt4_fn, prompt)
activate CB1
alt Circuit Closed
CB1->>GPT4: generate(prompt)
alt Success
GPT4-->>CB1: response
CB1->>CB1: record_success()
CB1-->>LLMMgr: response
LLMMgr->>Cache: store(prompt, response)
LLMMgr->>Metrics: record_success("gpt-4")
LLMMgr-->>App: return response
else API Error
GPT4-->>CB1: Error
CB1->>CB1: record_failure()
CB1-->>LLMMgr: CircuitBreakerError
deactivate CB1
Note over LLMMgr: Fallback to Claude
LLMMgr->>Metrics: record_fallback("gpt-4", "claude")
LLMMgr->>CB2: call(claude_fn, prompt)
activate CB2
CB2->>Claude: generate(prompt)
Claude-->>CB2: response
CB2->>CB2: record_success()
CB2-->>LLMMgr: response
deactivate CB2
LLMMgr->>Cache: store(prompt, response)
LLMMgr->>Metrics: record_success("claude")
LLMMgr-->>App: return response
end
else Circuit Open
CB1-->>LLMMgr: CircuitOpenError
deactivate CB1
Note over LLMMgr: Immediate fallback
LLMMgr->>CB2: call(claude_fn, prompt)
CB2->>Claude: generate(prompt)
Claude-->>CB2: response
CB2-->>LLMMgr: response
LLMMgr-->>App: return response
end
end
deactivate LLMMgr
Basic Usage¶
Registering and Using Models¶
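The original code sample was not preserved here. Below is a minimal stand-in that mirrors the documented register_model / set_active_model / generate API; the import path (e.g. `from llms import LLMManager`) and the mock inference function are assumptions for illustration:

```python
# Minimal stand-in mirroring the documented LLMManager API. In a real
# deployment you would import the class instead of defining it inline.
class LLMManager:
    def __init__(self, max_retries=3, enable_caching=True):
        self.max_retries = max_retries
        self.enable_caching = enable_caching
        self.models = {}            # name -> {"fn": callable, "metadata": dict}
        self.active_model = None

    def register_model(self, name, inference_fn, metadata=None):
        self.models[name] = {"fn": inference_fn, "metadata": metadata or {}}

    def set_active_model(self, name):
        if name not in self.models:
            raise KeyError(f"unknown model: {name}")
        self.active_model = name

    def generate(self, prompt, use_cache=True, **kwargs):
        # The real manager adds caching, retries, and fallback around this call.
        return self.models[self.active_model]["fn"](prompt, **kwargs)

# Inference functions wrap provider SDK calls; mocked here for illustration.
def mock_gpt4(prompt, **kwargs):
    return f"gpt-4 answer to: {prompt}"

manager = LLMManager(max_retries=3, enable_caching=True)
manager.register_model("gpt-4", mock_gpt4, metadata={"provider": "openai"})
manager.set_active_model("gpt-4")
reply = manager.generate("Summarize circuit breakers.")
```

The inference function is the integration point: anything callable with a prompt can be registered, which is how custom and self-hosted models plug in alongside hosted providers.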
Configuring Fallback Chain¶
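The configuration snippet was lost in extraction. As a sketch of the traversal that set_fallback_chain() configures (function and model names here are illustrative), the manager tries the primary first, then each fallback in order:

```python
def generate_with_fallback(prompt, chain, models):
    """Try each model in chain order until one succeeds."""
    last_error = None
    for name in chain:
        try:
            return name, models[name](prompt)
        except Exception as exc:       # provider error or open circuit
            last_error = exc           # remember it and try the next model
    raise RuntimeError("all models in the fallback chain failed") from last_error

def failing_gpt4(prompt):
    raise TimeoutError("primary provider is down")   # simulated outage

models = {
    "gpt-4": failing_gpt4,
    "claude": lambda p: f"claude: {p}",
    "gpt-3.5": lambda p: f"gpt-3.5: {p}",
}
used, reply = generate_with_fallback("hello", ["gpt-4", "claude", "gpt-3.5"], models)
# gpt-4 fails, so the request is served by claude
```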
Response Caching¶
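The caching example was not preserved. Below is a self-contained sketch of the ResponseCache behavior shown in the class diagram, where _generate_cache_key derives the key from the prompt plus generation parameters; the hashing scheme and eviction policy here are assumptions:

```python
import hashlib
import json

class ResponseCache:
    """Minimal sketch of the documented ResponseCache."""

    def __init__(self, max_size=1024):
        self.cache = {}
        self.max_size = max_size

    def _generate_cache_key(self, prompt, kwargs):
        # Include generation parameters so that, e.g., different
        # temperatures do not share a cache entry.
        payload = json.dumps({"prompt": prompt, "kwargs": kwargs}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, prompt, **kwargs):
        return self.cache.get(self._generate_cache_key(prompt, kwargs))

    def set(self, prompt, value, **kwargs):
        if len(self.cache) >= self.max_size:
            # Simple eviction sketch: drop the oldest insertion.
            self.cache.pop(next(iter(self.cache)))
        self.cache[self._generate_cache_key(prompt, kwargs)] = value

cache = ResponseCache()
cache.set("What is a circuit breaker?", "A fail-fast wrapper.", temperature=0.0)
hit = cache.get("What is a circuit breaker?", temperature=0.0)   # same params: hit
miss = cache.get("What is a circuit breaker?", temperature=0.7)  # different params: miss
```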
Retry Logic with Exponential Backoff¶
flowchart TD
START([Request]) --> ATTEMPT[Attempt = 0]
ATTEMPT --> CALL[Call LLM API]
CALL --> SUCCESS{Success?}
SUCCESS -->|Yes| RETURN[Return Response]
SUCCESS -->|No| CHECK{Attempt <<br/>Max Retries?}
CHECK -->|No| INC[Attempt++]
INC --> BACKOFF[Calculate Backoff<br/>2^attempt * 1000ms]
BACKOFF --> WAIT[Wait]
WAIT --> TRANSIENT{Transient<br/>Error?}
TRANSIENT -->|Yes| CALL
TRANSIENT -->|No| FAIL
CHECK -->|Yes| FALLBACK{Fallback<br/>Available?}
FALLBACK -->|Yes| NEXT[Try Next Model]
FALLBACK -->|No| FAIL[Raise Error]
NEXT --> ATTEMPT
RETURN --> END([Complete])
FAIL --> END
style START fill:#4caf50
style RETURN fill:#4caf50
style FAIL fill:#f44336
style CALL fill:#2196f3
style WAIT fill:#ff9800
Advanced Features¶
Circuit Breaker Pattern¶
The circuit breaker prevents cascading failures by temporarily blocking calls to failing models.
Performance Monitoring¶
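A self-contained sketch of the MetricsCollector from the class diagram follows; the method names match the diagram, while the field names in the get_metrics() result are assumptions:

```python
class MetricsCollector:
    """Minimal sketch of the documented MetricsCollector API."""

    def __init__(self):
        self.requests = []        # (model, latency in seconds)
        self.cache_hits = 0
        self.cache_misses = 0
        self.fallbacks = []       # (from_model, to_model)

    def record_request(self, model, latency):
        self.requests.append((model, latency))

    def record_cache_hit(self):
        self.cache_hits += 1

    def record_cache_miss(self):
        self.cache_misses += 1

    def record_fallback(self, from_model, to_model):
        self.fallbacks.append((from_model, to_model))

    def get_metrics(self):
        total = len(self.requests)
        lookups = self.cache_hits + self.cache_misses
        return {
            "total_requests": total,
            "avg_latency": sum(l for _, l in self.requests) / total if total else 0.0,
            "cache_hit_rate": self.cache_hits / lookups if lookups else 0.0,
            "fallback_count": len(self.fallbacks),
        }

metrics = MetricsCollector()
metrics.record_cache_miss()
metrics.record_request("gpt-4", 1.2)
metrics.record_fallback("gpt-4", "claude")
metrics.record_request("claude", 0.8)
metrics.record_cache_hit()
summary = metrics.get_metrics()
```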
Retry with Exponential Backoff¶
Automatic retry with exponential backoff is built-in:
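The original snippet is not recoverable here; this self-contained sketch implements the loop from the flowchart above (delay = 2^attempt × base delay, and only transient errors are retried; the error classification is an assumption):

```python
import time

def retry_with_backoff(func, max_retries=3, base_delay=1.0,
                       transient=(TimeoutError, ConnectionError)):
    """Retry with exponential backoff: the delay doubles on each attempt."""
    attempt = 0
    while True:
        try:
            return func()
        except transient:
            if attempt >= max_retries:
                raise                      # out of retries: caller falls back
            time.sleep((2 ** attempt) * base_delay)
            attempt += 1

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")    # fails twice, then recovers
    return "ok"

result = retry_with_backoff(flaky, max_retries=3, base_delay=0.001)
```

Non-transient errors (e.g. an invalid request) propagate immediately, matching the flowchart's "Transient Error?" branch.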
Integration with Agents¶
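The agent integration example was lost. The key idea is that an agent only needs a callable mapping a prompt to a completion, so the manager's generate() can be injected directly; SimpleAgent and its ask() method below are purely illustrative, not part of the framework:

```python
class SimpleAgent:
    """Illustrative agent that keeps a conversation history."""

    def __init__(self, llm, system_prompt="You are a helpful assistant."):
        self.llm = llm                    # e.g. manager.generate in a real system
        self.system_prompt = system_prompt
        self.history = []

    def ask(self, question):
        prompt = (self.system_prompt + "\n"
                  + "\n".join(self.history)
                  + f"\nUser: {question}")
        answer = self.llm(prompt)
        self.history.append(f"User: {question}")
        self.history.append(f"Assistant: {answer}")
        return answer

# The LLM callable is mocked here; pass manager.generate for real use.
agent = SimpleAgent(llm=lambda p: "Paris")
answer = agent.ask("What is the capital of France?")
```

Because the agent sees only a callable, it transparently benefits from the manager's caching, retries, and fallback.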
Real-World Example¶
Multi-Model System with Fallback¶
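The worked example was not preserved. The sketch below wires the pieces end to end: three providers registered in priority order, per-model success/failure counts, and automatic fallback when the primary fails. MultiModelSystem and the mock providers are illustrative stand-ins, not the framework's classes:

```python
class MultiModelSystem:
    """Illustrative wiring of registry + fallback chain + per-model stats."""

    def __init__(self):
        self.models = {}      # name -> inference callable
        self.chain = []       # fallback order (registration order here)
        self.stats = {}       # name -> {"success": int, "failure": int}

    def register(self, name, fn):
        self.models[name] = fn
        self.stats[name] = {"success": 0, "failure": 0}
        self.chain.append(name)

    def generate(self, prompt):
        for name in self.chain:
            try:
                reply = self.models[name](prompt)
            except Exception:
                self.stats[name]["failure"] += 1
                continue                  # fall back to the next model
            self.stats[name]["success"] += 1
            return name, reply
        raise RuntimeError("all providers failed")

def gpt4(prompt):
    raise ConnectionError("rate limited")  # simulated primary outage

system = MultiModelSystem()
system.register("gpt-4", gpt4)
system.register("claude", lambda p: f"claude: {p}")
system.register("gpt-3.5", lambda p: f"gpt-3.5: {p}")
model, reply = system.generate("Draft a release note.")
```

Ordering the chain by capability and cost (strongest model first, cheapest last) keeps quality high while guaranteeing a response whenever any provider is up.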
Best Practices¶
1. Configure Fallback Chains¶
Always configure fallback models for production systems.
2. Enable Caching for Repeated Queries¶
Caching avoids re-billing identical prompts: construct the manager with enable_caching=True and leave use_cache=True on generate() so repeated queries are served from the cache.
3. Monitor Performance Metrics¶
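The monitoring snippet was lost. One practical pattern is a periodic health check over the get_metrics() output; the metric field names and thresholds below are assumptions for illustration:

```python
def health_alerts(metrics, max_fallback_rate=0.1, min_cache_hit_rate=0.2):
    """Flag degraded conditions from a get_metrics()-style summary dict."""
    alerts = []
    total = metrics["total_requests"]
    if total and metrics["fallback_count"] / total > max_fallback_rate:
        alerts.append("fallback rate high: primary model may be degraded")
    if metrics["cache_hit_rate"] < min_cache_hit_rate:
        alerts.append("cache hit rate low: consider longer TTLs")
    return alerts

alerts = health_alerts({"total_requests": 100,
                        "fallback_count": 25,
                        "cache_hit_rate": 0.45})
# 25% of requests fell back, so only the fallback alert fires
```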
4. Handle Circuit Breaker States¶
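The original example was lost. A common pattern is to route around models whose breaker is open, and only reset_circuit_breaker() manually once the provider is confirmed healthy; the helper and the "circuit_state" field name below are assumptions modeled on get_model_info():

```python
def pick_healthy_model(chain, model_info):
    """Return the first model in the chain whose circuit is not open."""
    for name in chain:
        if model_info[name]["circuit_state"] != "open":
            return name
    return None   # nothing healthy: surface an error or reset manually

choice = pick_healthy_model(
    ["gpt-4", "claude"],
    {"gpt-4": {"circuit_state": "open"},
     "claude": {"circuit_state": "closed"}},
)
# gpt-4's circuit is open, so claude is selected
```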
5. Secure API Keys¶
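The snippet was lost; the practice is to load credentials from the environment (or a secret manager) rather than hard-coding them. The load_api_key helper is a hypothetical sketch; the variable names follow the providers' standard conventions:

```python
import os

def load_api_key(var_name):
    """Fail fast with a clear error when a credential is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting")
    return key

# Demo only: in production, set the variable in your shell or secret store.
os.environ["OPENAI_API_KEY"] = "sk-demo-only"
openai_key = load_api_key("OPENAI_API_KEY")
```

Keys loaded this way never appear in source control, and rotating them requires no code change.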
Use Cases¶
- Conversational AI agents - Multi-turn conversations with context
- Content generation - Articles, summaries, creative writing
- Code generation - Automated code creation and completion
- Data analysis - Natural language queries on structured data
- Document processing - Extraction, summarization, translation
- Customer support - Automated response generation
Related Documentation¶
- Agents Module - Agent system integration
- Prompts Module - Prompt management and rendering
- Security Module - Securing LLM interactions
- Monitoring Module - Performance tracking
- Best Practices - Production recommendations