Langfuse
Langfuse is an open-source LLM observability platform that provides tracing, evaluation, prompt management, and cost tracking for production LLM applications.1) With over 24,000 GitHub stars and an MIT license, it has become a leading open-source choice for monitoring and debugging AI applications in production.
| Repository | github.com/langfuse/langfuse |
| License | MIT |
| Language | TypeScript, Python |
| Stars | 24K+ |
| Category | LLM Observability |
Key Features
| Application Tracing | Captures the full request lifecycle, including LLM calls, retrieval, embeddings, tool calls, and API operations |
| LLM-as-a-Judge | Native support for automated evaluation scoring on traces and observations |
| Prompt Management | Versioned prompt storage with a UI for management and playground testing (see the sketch after this table) |
| Cost Tracking | Automatic per-trace and per-span tracking of token usage and model costs |
| Self-Hostable | Full self-hosting via Docker Compose or Kubernetes with no vendor lock-in2) |
| Zero-Latency Instrumentation | Asynchronous background flushing adds no latency to the instrumented application |
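For prompt management, prompts versioned in the UI are fetched and filled in at runtime. A minimal sketch using the Python SDK, assuming a managed prompt named "qa-prompt" with a {{question}} variable already exists (both names are illustrative):

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* environment variables

# Fetch the latest production version of the managed prompt
prompt = langfuse.get_prompt("qa-prompt")

# Substitute template variables ({{question}} in the stored prompt)
compiled = prompt.compile(question="How does RAG work?")
print(compiled)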
Architecture
Langfuse V4 (March 2026) employs an observations-first, immutable data model aligned with OpenTelemetry spans:3)
| PostgreSQL | Transactional data: users, organizations, projects, API keys, prompts, datasets, and evaluation settings |
| ClickHouse | Immutable tracing data: observations and scores, with traces as correlation IDs |
| Redis + BullMQ | Event queues for asynchronous processing |
| Ingestion Pipeline | Native Python/JS SDKs, 50+ integrations, and OpenTelemetry endpoints; asynchronous batching adds no latency |
The V4 architecture shifted to an observations-first model in which a trace is a correlation ID (similar to a session_id) rather than a top-level entity, and immutable spans are ingested via OTel protocols (see the sketch after the diagram).
graph TB
subgraph Apps["Instrumented Applications"]
App1[Python App + SDK]
App2[JS/TS App + SDK]
App3[OpenTelemetry]
App4[LiteLLM Gateway]
end
subgraph Ingestion["Ingestion Layer"]
Queue[Redis + BullMQ]
Batch[Micro-Batch Processor]
end
subgraph Storage["Storage Layer"]
PG[(PostgreSQL - Transactional)]
CH[(ClickHouse - Traces/Spans)]
end
subgraph Features["Feature Layer"]
Trace[Trace Explorer]
Eval[Evaluation Engine]
Prompt[Prompt Manager]
Cost[Cost Dashboard]
Metrics[Metrics and Analytics]
end
subgraph UI["Web Dashboard"]
Dashboard[Dashboard Views]
Filters[Saved Filters]
Graphs[Agent Graphs]
end
Apps --> Ingestion
Queue --> Batch
Batch --> Storage
Storage --> Features
Features --> UI
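Because ingestion is OpenTelemetry-native, any OTel-instrumented application can export spans straight to Langfuse without a Langfuse SDK. A minimal sketch using the standard OTLP/HTTP exporter; the endpoint path, the Basic-auth scheme over the API key pair, and the names ("rag-service", "session.id") are assumptions to verify against your deployment's docs:

import base64

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Assumed auth scheme: HTTP Basic over the public/secret API key pair
auth = base64.b64encode(b"pk-lf-...:sk-lf-...").decode()

exporter = OTLPSpanExporter(
    # Assumed endpoint; self-hosted deployments use their own base URL
    endpoint="https://cloud.langfuse.com/api/public/otel/v1/traces",
    headers={"Authorization": f"Basic {auth}"},
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))  # async batching
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("rag-service")

# Each span becomes an immutable observation; the trace ID is the correlation key
with tracer.start_as_current_span("rag-pipeline") as span:
    span.set_attribute("session.id", "session-abc123")
    ...  # child spans for retrieval, generation, tool calls, etc.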
Tracing Capabilities
Langfuse captures the full request lifecycle with rich detail:4)
| LLM Operations | Inputs, outputs, latency, token usage, and model parameters |
| Non-LLM Operations | Retrieval steps, embedding generation, tool calls, and API requests |
| Session Tracking | Multi-turn conversations with user identification (see the sketch below) |
| Agent Graphs | Visual representation of agent decision flows |
| Environment Tagging | Separate traces by development, staging, and production |
| Custom Attributes | Arbitrary metadata for filtering and analysis |
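Session tracking, environment tagging, and custom attributes are all set on the active trace from inside instrumented code. A minimal sketch using the decorator-based Python SDK; handle_turn and all attribute values are illustrative:

from langfuse.decorators import observe, langfuse_context

@observe()
def handle_turn(user_message: str) -> str:
    # Attach session, user, environment tag, and custom metadata to this trace
    langfuse_context.update_current_trace(
        session_id="session-abc123",        # groups multi-turn conversations
        user_id="user-42",                  # user identification
        tags=["production"],                # environment tagging
        metadata={"ab_test": "ranker-v2"},  # arbitrary custom attributes
    )
    return "..."  # actual turn handling goes here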
Evaluation Features
Langfuse supports multiple evaluation approaches:
| LLM-as-a-Judge | Automated scoring using LLMs to evaluate trace quality |
| Dataset Experiments | Run evaluations against curated datasets (see the sketch below) |
| Score Storage | All scores stored in ClickHouse alongside traces for analysis |
| Custom Evaluators | Define custom scoring functions for domain-specific quality metrics |
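Dataset experiments and custom evaluators combine through the low-level client: each dataset item runs through the application, and a score is attached to the resulting trace. A minimal sketch, assuming a dataset named "qa-eval-set" already exists and reusing generate_answer from the code example below:

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* environment variables

dataset = langfuse.get_dataset("qa-eval-set")

for item in dataset.items:
    # Link the resulting trace to this dataset item and experiment run
    with item.observe(run_name="baseline-gpt-4o") as trace_id:
        output = generate_answer(item.input)
        # Custom evaluator: exact match against the curated expected output
        langfuse.score(
            trace_id=trace_id,
            name="exact_match",
            value=1.0 if output == item.expected_output else 0.0,
        )

langfuse.flush()  # send remaining scores before exit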
Integrations
Langfuse provides native integrations with the major LLM frameworks and SDKs, including the [[openai|OpenAI]] SDK, LangChain, LlamaIndex, and [[lite_llm|LiteLLM]].
Code Example
The example below instruments a simple RAG pipeline with the Python SDK's @observe decorator; nested calls are captured as spans of a single trace.

from langfuse.decorators import observe, langfuse_context
# Langfuse's drop-in OpenAI wrapper traces each LLM call as a generation
from langfuse.openai import OpenAI

# Configure the decorator client once at startup (alternatively, set the
# LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST env vars)
langfuse_context.configure(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",  # or self-hosted URL
)

client = OpenAI()

@observe()
def retrieve_context(query: str) -> str:
    """Retrieve relevant context for the query."""
    # Your retrieval logic here
    langfuse_context.update_current_observation(
        metadata={"retriever": "hybrid", "top_k": 5}
    )
    return "Retrieved context about the topic..."

@observe()
def generate_answer(query: str) -> str:
    """Full RAG pipeline with automatic tracing."""
    context = retrieve_context(query)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query},
        ],
    )
    # Attach a quality score to the current trace
    langfuse_context.score_current_trace(
        name="relevance", value=0.9,
        comment="High relevance to query",
    )
    return response.choices[0].message.content

answer = generate_answer("How does RAG work?")
print(answer)

langfuse_context.flush()  # Ensure all buffered events are sent before exit
See Also
References