LiteLLM is an open-source Python library and proxy server that acts as a universal API gateway for over 100 LLM providers, standardizing their diverse APIs into a single OpenAI-compatible interface.1) It enables developers and teams to switch models or providers without rewriting application code, while the proxy server adds enterprise-grade features for production deployment.
LiteLLM serves as a lightweight abstraction layer that decouples application logic from underlying model provider implementations. Rather than binding applications to specific API formats or endpoints, LiteLLM normalizes requests and responses across heterogeneous model providers. This architectural approach enables developers to experiment with different models, switch providers, or maintain multi-model deployments without requiring significant code refactoring. By providing consistent parameter handling, error management, and response formatting, LiteLLM reduces the complexity of managing multiple provider integrations simultaneously.
The library supports both completion-based and chat-based model interfaces, accommodating different model architectures and interaction patterns.
Python SDK: Simplifies direct API calls with a provider-agnostic interface:2)
import litellm

response = litellm.completion(
    model="gpt-4o",  # or claude-3-5-sonnet, bedrock, gemini...
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
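With stream=True, completion() returns an iterator of OpenAI-style chunks. A minimal way to consume it (attribute names follow the OpenAI streaming delta format that LiteLLM normalizes to):

for chunk in response:
    # Each chunk mirrors OpenAI's streaming format; content may be None
    # on the final chunk, hence the `or ""` guard.
    print(chunk.choices[0].delta.content or "", end="")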
Proxy Server (Gateway): A deployable FastAPI server providing centralized routing, API key management, logging, and rate limiting. Configured via YAML with Redis-backed rate limiting and budget controls.
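Because the proxy exposes an OpenAI-compatible endpoint, any standard OpenAI client can talk to it. A minimal sketch, assuming the proxy runs locally on its default port 4000 and that the API key is a virtual key issued by the proxy (both illustrative):

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy.
client = OpenAI(
    base_url="http://localhost:4000",  # default LiteLLM proxy address
    api_key="sk-my-virtual-key",       # illustrative proxy-issued key
)

response = client.chat.completions.create(
    model="gpt-4o",  # resolved against the proxy's YAML model list
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)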
LiteLLM connects to 100+ LLM services, including OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Vertex AI (Gemini), Mistral, Cohere, Hugging Face, and local runtimes such as Ollama.
It normalizes schemas across providers, translating Bedrock/Anthropic formats to OpenAI-style messages and token counts.
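A sketch of what that normalization looks like in practice; the Bedrock model identifier is illustrative:

import litellm

# The same call shape works for an Anthropic model hosted on Bedrock;
# LiteLLM translates the request/response to and from Bedrock's format.
response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello"}],
)

# Uniform OpenAI-style accessors regardless of provider:
print(response.choices[0].message.content)
print(response.usage.total_tokens)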
Load Balancing and Fallbacks: Configurable model lists auto-reroute on failures (e.g., 429 rate limits) to backup models or regions, eliminating custom retry logic.
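A minimal sketch using LiteLLM's Router; the deployment names and fallback mapping are illustrative:

from litellm import Router

# Two deployments behind logical names; if "primary" keeps failing
# (e.g., 429 rate limits), the router retries and then falls back to
# "backup" without any custom retry logic in the application.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "backup", "litellm_params": {"model": "claude-3-5-sonnet-20240620"}},
    ],
    fallbacks=[{"primary": ["backup"]}],
    num_retries=2,
)

response = router.completion(
    model="primary",
    messages=[{"role": "user", "content": "Hello"}],
)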
Cost Tracking and Optimization: Unified dashboard for spend across all providers with real-time tracking, exports by organization/team/key/tags, and token counting. Organizations benefit from reduced vendor lock-in through dynamic model selection, enabling applications to route requests to the most cost-effective provider for specific tasks.
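In the SDK, per-call spend can be estimated from LiteLLM's built-in model price map, for example:

import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Estimate the cost of this call (USD) from LiteLLM's price map.
cost = litellm.completion_cost(completion_response=response)
print(f"cost: ${cost:.6f}")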
Observability: Integrates with Langfuse for tracing; supports multi-modality, trace IDs, partial error logging, and guardrails.
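Enabling the Langfuse integration in the SDK is a callback registration; the credentials shown are placeholders:

import os
import litellm

# Langfuse credentials are read from the environment (placeholder values).
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

# Send successful and failed calls to Langfuse for tracing.
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)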
MCP Gateway: Tool access control by team and API key, enabling controlled MCP server access.
Admin UI: Usage views supporting 1M+ log entries, key editing/testing, cache health checks.
Security: Self-hostable for air-gapped deployments, no default telemetry, OAuth 2.0 support.
With the DSPy 3.2 release, LiteLLM was decoupled from DSPy so that model serving can evolve independently of any specific framework implementation. This separation lets the library function as a standalone model abstraction layer while remaining compatible with various orchestration systems and deployment patterns. Applications built with DSPy, LangChain, and other frameworks can therefore leverage LiteLLM's provider abstraction without importing unnecessary framework components, a modularity that suits microservice architectures where different components manage model interactions independently.
Teams can run A/B tests across models, migrate gradually between providers, and maintain fallback paths for when a primary provider has availability issues. By standardizing these interactions, LiteLLM reduces vendor lock-in and supports flexible, multi-model inference pipelines.
Recent releases demonstrate continuous improvement.3)