====== Langfuse ======
**Langfuse** is an open-source LLM observability platform that provides tracing, evaluation, prompt management, and cost tracking for production LLM applications. With over **24,000 GitHub stars** and an MIT license, it has become a leading open-source alternative to proprietary monitoring and debugging tools for AI applications.
| **Repository** | [[https://github.com/langfuse/langfuse|github.com/langfuse/langfuse]] |
| **License** | MIT |
| **Language** | TypeScript, Python |
| **Stars** | 24K+ |
| **Category** | LLM Observability |
===== Key Features =====
* **Application Tracing** -- Captures full request lifecycle including LLM calls, retrieval, embeddings, tools, and API operations
* **LLM-as-a-Judge** -- Native support for automated evaluation scoring on traces and observations
* **Prompt Management** -- Versioned prompt storage with UI for management and playground testing
* **Cost Tracking** -- Automatic per-trace/span tracking of token usage and model costs
* **50+ Integrations** -- Native support for LangChain, LlamaIndex, OpenAI, and OpenTelemetry
* **Self-Hostable** -- Full self-hosting via Docker Compose or Kubernetes with no vendor lock-in
* **Zero-Latency Instrumentation** -- Async background flushing ensures no added latency to applications
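The cost-tracking feature above can be illustrated with a minimal sketch: per-span cost is derived from token usage and per-token model prices, then aggregated per trace. The price table below is a hypothetical placeholder, not Langfuse's actual pricing data:

```python
# Minimal sketch of per-trace cost tracking. Prices are hypothetical
# placeholders (USD per 1K tokens), not real model pricing.
PRICES_PER_1K = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
}

def span_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single LLM span in USD."""
    p = PRICES_PER_1K[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1000

def trace_cost(spans: list[dict]) -> float:
    """Aggregate cost across all LLM spans belonging to one trace."""
    return sum(
        span_cost(s["model"], s["input_tokens"], s["output_tokens"])
        for s in spans
    )
```

In Langfuse itself this accounting happens automatically once token usage is reported on each span.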
===== Architecture =====
Langfuse V4 (March 2026) employs an observations-first, immutable data model aligned with OpenTelemetry spans:
* **PostgreSQL** -- Handles transactional data: users, organizations, projects, API keys, prompts, datasets, evaluation settings
* **ClickHouse** -- Stores immutable tracing data: observations and scores, with traces existing only as correlation IDs
* **Redis + BullMQ** -- Manages event queues for async processing
* **Ingestion Pipeline** -- Native Python/JS SDKs, 50+ integrations, OpenTelemetry endpoints; asynchronous batching ensures zero added latency
The V4 architecture shifted to an observations-first model where traces are correlation IDs (like session_id) rather than top-level entities, with immutable spans ingested via OTel protocols.
<code>
graph TB
    subgraph Apps["Instrumented Applications"]
        App1[Python App + SDK]
        App2[JS/TS App + SDK]
        App3[OpenTelemetry]
        App4[LiteLLM Gateway]
    end
    subgraph Ingestion["Ingestion Layer"]
        Queue[Redis + BullMQ]
        Batch[Micro-Batch Processor]
    end
    subgraph Storage["Storage Layer"]
        PG[(PostgreSQL - Transactional)]
        CH[(ClickHouse - Traces/Spans)]
    end
    subgraph Features["Feature Layer"]
        Trace[Trace Explorer]
        Eval[Evaluation Engine]
        Prompt[Prompt Manager]
        Cost[Cost Dashboard]
        Metrics[Metrics and Analytics]
    end
    subgraph UI["Web Dashboard"]
        Dashboard[Dashboard Views]
        Filters[Saved Filters]
        Graphs[Agent Graphs]
    end
    Apps --> Ingestion
    Queue --> Batch
    Batch --> Storage
    Storage --> Features
    Features --> UI
</code>
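The asynchronous ingestion path described above can be sketched in plain Python: events are enqueued without blocking the caller, and a background worker drains the queue in micro-batches. This is a simplified stand-in for the SDK-to-Redis/BullMQ pipeline, not Langfuse's actual implementation:

```python
import queue
import threading

class BackgroundBatcher:
    """Sketch of async micro-batch ingestion: enqueue() returns
    immediately; a worker thread flushes events in batches."""

    def __init__(self, batch_size: int = 10):
        self.q: queue.Queue = queue.Queue()
        self.batch_size = batch_size
        self.flushed: list[list[dict]] = []   # stand-in for the backend
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, event: dict) -> None:
        self.q.put(event)  # non-blocking: no latency added to the request path

    def _run(self) -> None:
        batch: list[dict] = []
        while True:
            event = self.q.get()
            if event is None:          # sentinel: flush remainder and stop
                if batch:
                    self.flushed.append(batch)
                break
            batch.append(event)
            if len(batch) >= self.batch_size:
                self.flushed.append(batch)
                batch = []

    def flush(self) -> None:
        """Drain the queue and wait for the worker to finish."""
        self.q.put(None)
        self._worker.join()
```

The application-facing `enqueue()` only touches an in-process queue, which is why instrumentation adds no request latency.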
===== Tracing Capabilities =====
Langfuse captures the full request lifecycle with rich detail:
* **LLM Operations** -- Inputs, outputs, latency, token usage, model parameters
* **Non-LLM Operations** -- Retrieval steps, embedding generation, tool calls, API requests
* **Session Tracking** -- Multi-turn conversations with user identification
* **Agent Graphs** -- Visual representation of agent decision flows
* **Environment Tagging** -- Separate traces by development, staging, and production
* **Custom Attributes** -- Arbitrary metadata for filtering and analysis
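Environment tagging and custom attributes combine naturally for filtering. A minimal sketch of that idea, with illustrative field names rather than the actual ClickHouse schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # spans are immutable once ingested
class Observation:
    trace_id: str                  # correlation ID linking related spans
    name: str
    environment: str = "production"
    metadata: dict = field(default_factory=dict)

def filter_observations(observations, environment=None, **metadata):
    """Filter spans by environment tag and arbitrary metadata attributes."""
    matches = []
    for obs in observations:
        if environment and obs.environment != environment:
            continue
        if any(obs.metadata.get(k) != v for k, v in metadata.items()):
            continue
        matches.append(obs)
    return matches
```

In the Langfuse UI, the same kind of query powers saved filters over the trace explorer.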
===== Evaluation Features =====
Langfuse supports multiple evaluation approaches:
* **LLM-as-a-Judge** -- Automated scoring using LLMs to evaluate trace quality
* **Dataset Experiments** -- Run evaluations against curated datasets
* **Score Storage** -- All scores stored in ClickHouse alongside traces for analysis
* **Custom Evaluators** -- Define custom scoring functions for domain-specific quality metrics
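A custom evaluator of the kind described above is, at its core, a scoring function run over dataset items. The keyword-overlap metric below is a hypothetical stand-in for a domain-specific check or an LLM-as-a-judge call:

```python
def keyword_overlap_score(expected: str, actual: str) -> float:
    """Toy metric: fraction of expected keywords present in the output."""
    expected_words = set(expected.lower().split())
    actual_words = set(actual.lower().split())
    if not expected_words:
        return 0.0
    return len(expected_words & actual_words) / len(expected_words)

def run_experiment(dataset: list[dict]) -> list[dict]:
    """Score each dataset item; in Langfuse, these scores would be
    attached to the corresponding traces for later analysis."""
    return [
        {"input": item["input"],
         "score": keyword_overlap_score(item["expected"], item["output"])}
        for item in dataset
    ]
```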
===== Integrations =====
Langfuse provides native integrations with the major LLM frameworks:
* **LangChain / LangGraph** -- Automatic tracing via callback handlers
* **LlamaIndex** -- Native callback integration
* **OpenAI SDK** -- Direct capture of prompts, completions, and token usage
* **OpenTelemetry** -- Standard OTel protocol support for vendor-neutral instrumentation
* **LiteLLM** -- Gateway-level tracing for multi-provider setups
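The callback-based integrations above share a common shape: the framework invokes hooks around each LLM call, and the handler records a span. A minimal sketch of that pattern (not the actual Langfuse handler API):

```python
import time

class TracingCallbackHandler:
    """Sketch of a callback-style tracing handler: frameworks such as
    LangChain invoke hooks like these around each LLM call."""

    def __init__(self):
        self.spans: list[dict] = []
        self._prompt = ""
        self._start = 0.0

    def on_llm_start(self, prompt: str) -> None:
        self._prompt = prompt
        self._start = time.monotonic()

    def on_llm_end(self, completion: str) -> None:
        # Record input, output, and latency as one span.
        self.spans.append({
            "input": self._prompt,
            "output": completion,
            "latency_s": time.monotonic() - self._start,
        })
```

The real integrations additionally capture token usage, model parameters, and nested spans, and ship the recorded spans through the async ingestion pipeline.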
===== Code Example =====
<code python>
import os

from langfuse.decorators import observe, langfuse_context
from openai import OpenAI

# The @observe() decorator reads credentials from the environment:
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # or self-hosted URL

client = OpenAI()

@observe()
def retrieve_context(query: str) -> str:
    """Retrieve relevant context for the query."""
    # Your retrieval logic here
    langfuse_context.update_current_observation(
        metadata={"retriever": "hybrid", "top_k": 5}
    )
    return "Retrieved context about the topic..."

@observe()
def generate_answer(query: str) -> str:
    """Full RAG pipeline with automatic tracing."""
    context = retrieve_context(query)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query},
        ],
    )
    # Score the trace
    langfuse_context.score_current_trace(
        name="relevance", value=0.9,
        comment="High relevance to query"
    )
    return response.choices[0].message.content

answer = generate_answer("How does RAG work?")
print(answer)
langfuse_context.flush()  # Ensure all queued events are sent
</code>
===== References =====
* [[https://github.com/langfuse/langfuse|Langfuse GitHub Repository]]
* [[https://langfuse.com|Langfuse Official Website]]
* [[https://langfuse.com/docs|Langfuse Documentation]]
* [[https://langfuse.com/blog/2026-03-10-simplify-langfuse-for-scale|Langfuse V4 Architecture Blog]]
===== See Also =====
* [[dify|Dify]] -- Agentic workflow platform
* [[opencode|OpenCode]] -- AI coding agent
* [[mem0|Mem0]] -- Memory layer for AI agents
* [[mcp_servers|MCP Servers]] -- Model Context Protocol implementations