LiteLLM is an open-source Python library and proxy server that acts as a universal API gateway for over 100 LLM providers, standardizing their diverse APIs into a single OpenAI-compatible interface.1) It enables developers and teams to switch models or providers without rewriting application code, while the proxy server adds enterprise-grade features for production deployment.
LiteLLM serves as a lightweight abstraction layer that decouples application logic from underlying model provider implementations. Rather than binding applications to specific API formats or endpoints, LiteLLM normalizes requests and responses across heterogeneous model providers. This architectural approach enables developers to experiment with different models, switch providers, or maintain multi-model deployments without requiring significant code refactoring. By providing consistent parameter handling, error management, and response formatting, LiteLLM reduces the complexity of managing multiple provider integrations simultaneously.
The library supports both completion-based and chat-based model interfaces, accommodating different model architectures and interaction patterns.
Python SDK: Simplifies direct API calls with a provider-agnostic interface:2)
import litellm

response = litellm.completion(
    model="gpt-4o",  # or claude-3-5-sonnet, bedrock, gemini...
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
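With stream=True, completion() returns an iterator of OpenAI-style chunks. A minimal way to consume it (attribute names follow the OpenAI streaming delta format that LiteLLM normalizes to):

for chunk in response:
    # Each chunk mirrors OpenAI's streaming format; content may be None
    # on the final chunk, hence the `or ""` guard.
    print(chunk.choices[0].delta.content or "", end="")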
Proxy Server (Gateway): A deployable FastAPI server providing centralized routing, API key management, logging, and rate limiting. Configured via YAML with Redis-backed rate limiting and budget controls.
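Because the proxy exposes an OpenAI-compatible endpoint, any standard OpenAI client can talk to it. A minimal sketch, assuming the proxy runs locally on its default port 4000 and that the API key is a virtual key issued by the proxy (both illustrative):

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy.
client = OpenAI(
    base_url="http://localhost:4000",  # default LiteLLM proxy address
    api_key="sk-my-virtual-key",       # illustrative proxy-issued key
)

response = client.chat.completions.create(
    model="gpt-4o",  # resolved against the proxy's YAML model list
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)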
LiteLLM connects to 100+ LLM services, including OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Vertex AI (Gemini), Mistral, Cohere, Hugging Face, and local runtimes such as Ollama.
It normalizes schemas across providers, translating Bedrock/Anthropic formats to OpenAI-style messages and token counts.
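A sketch of what that normalization looks like in practice; the Bedrock model identifier is illustrative:

import litellm

# The same call shape works for an Anthropic model hosted on Bedrock;
# LiteLLM translates the request/response to and from Bedrock's format.
response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello"}],
)

# Uniform OpenAI-style accessors regardless of provider:
print(response.choices[0].message.content)
print(response.usage.total_tokens)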
Load Balancing and Fallbacks: Configurable model lists auto-reroute on failures (e.g., 429 rate limits) to backup models or regions, eliminating custom retry logic.
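A minimal sketch using LiteLLM's Router; the deployment names and fallback mapping are illustrative:

from litellm import Router

# Two deployments behind logical names; if "primary" keeps failing
# (e.g., 429 rate limits), the router retries and then falls back to
# "backup" without any custom retry logic in the application.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "backup", "litellm_params": {"model": "claude-3-5-sonnet-20240620"}},
    ],
    fallbacks=[{"primary": ["backup"]}],
    num_retries=2,
)

response = router.completion(
    model="primary",
    messages=[{"role": "user", "content": "Hello"}],
)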
Cost Tracking and Optimization: Unified dashboard for spend across all providers with real-time tracking, exports by organization/team/key/tags, and token counting. Organizations benefit from reduced vendor lock-in through dynamic model selection, enabling applications to route requests to the most cost-effective provider for specific tasks.
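In the SDK, per-call spend can be estimated from LiteLLM's built-in model price map, for example:

import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Estimate the cost of this call (USD) from LiteLLM's price map.
cost = litellm.completion_cost(completion_response=response)
print(f"cost: ${cost:.6f}")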
Observability: Integrates with Langfuse for tracing; supports multi-modality, trace IDs, partial error logging, and guardrails.
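Enabling the Langfuse integration in the SDK is a callback registration; the credentials shown are placeholders:

import os
import litellm

# Langfuse credentials are read from the environment (placeholder values).
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

# Send successful and failed calls to Langfuse for tracing.
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)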
MCP Gateway: Tool access control by team and API key, enabling controlled MCP server access.
Admin UI: Usage views supporting 1M+ log entries, key editing/testing, cache health checks.
Security: Self-hostable for air-gapped deployments, no default telemetry, OAuth 2.0 support.
With the DSPy 3.2 release, LiteLLM was decoupled from DSPy so that model serving can evolve independently of any specific framework implementation. This separation lets the library function as a standalone model abstraction layer while remaining compatible with various orchestration systems and deployment patterns. Applications built with DSPy, LangChain, and other frameworks can therefore leverage LiteLLM's provider abstraction without importing unnecessary framework components, a modularity that suits microservice architectures where different components manage model interactions independently.
Teams can run A/B tests across models, migrate gradually between providers, and maintain fallback paths for when a primary provider has availability issues. By standardizing these interactions, LiteLLM reduces vendor lock-in and supports flexible, multi-model inference pipelines.
Recent releases demonstrate continuous improvement.3)