Langfuse

Langfuse is an open-source LLM observability platform that provides tracing, evaluation, prompt management, and cost tracking for production LLM applications. With over 24,000 GitHub stars and MIT licensing, it has become the leading open-source alternative for monitoring and debugging AI applications in production.

Repository	github.com/langfuse/langfuse
License	MIT
Language	TypeScript, Python
Stars	24K+
Category	LLM Observability

Key Features

Application Tracing – Captures full request lifecycle including LLM calls, retrieval, embeddings, tools, and API operations
LLM-as-a-Judge – Native support for automated evaluation scoring on traces and observations
Prompt Management – Versioned prompt storage with UI for management and playground testing
Cost Tracking – Automatic per-trace/span tracking of token usage and model costs
50+ Integrations – Native support for LangChain, LlamaIndex, OpenAI, and OpenTelemetry
Self-Hostable – Full self-hosting via Docker Compose or Kubernetes with no vendor lock-in
Zero-Latency Instrumentation – Async background flushing ensures no added latency to applications

Architecture

Langfuse V4 (March 2026) employs an observations-first, immutable data model aligned with OpenTelemetry spans:

PostgreSQL – Handles transactional data: users, organizations, projects, API keys, prompts, datasets, evaluation settings
ClickHouse – Stores immutable tracing data: observations, scores, traces as correlation IDs
Redis + BullMQ – Manages event queues for async processing
Ingestion Pipeline – Native Python/JS SDKs, 50+ integrations, OpenTelemetry endpoints; asynchronous batching ensures zero added latency

The V4 architecture shifted to an observations-first model where traces are correlation IDs (like session_id) rather than top-level entities, with immutable spans ingested via OTel protocols.

graph TB subgraph Apps["Instrumented Applications"] App1[Python App + SDK] App2[JS/TS App + SDK] App3[OpenTelemetry] App4[LiteLLM Gateway] end subgraph Ingestion["Ingestion Layer"] Queue[Redis + BullMQ] Batch[Micro-Batch Processor] end subgraph Storage["Storage Layer"] PG[(PostgreSQL - Transactional)] CH[(ClickHouse - Traces/Spans)] end subgraph Features["Feature Layer"] Trace[Trace Explorer] Eval[Evaluation Engine] Prompt[Prompt Manager] Cost[Cost Dashboard] Metrics[Metrics and Analytics] end subgraph UI["Web Dashboard"] Dashboard[Dashboard Views] Filters[Saved Filters] Graphs[Agent Graphs] end Apps --> Ingestion Queue --> Batch Batch --> Storage Storage --> Features Features --> UI

Tracing Capabilities

Langfuse captures the full request lifecycle with rich detail:

LLM Operations – Inputs, outputs, latency, token usage, model parameters
Non-LLM Operations – Retrieval steps, embedding generation, tool calls, API requests
Session Tracking – Multi-turn conversations with user identification
Agent Graphs – Visual representation of agent decision flows
Environment Tagging – Separate traces by development, staging, and production
Custom Attributes – Arbitrary metadata for filtering and analysis

Evaluation Features

Langfuse supports multiple evaluation approaches:

LLM-as-a-Judge – Automated scoring using LLMs to evaluate trace quality
Dataset Experiments – Run evaluations against curated datasets
Score Storage – All scores stored in ClickHouse alongside traces for analysis
Custom Evaluators – Define custom scoring functions for domain-specific quality metrics

Integrations

Langfuse provides native integrations with the major LLM frameworks:

LangChain / LangGraph – Automatic tracing via callback handlers
LlamaIndex – Native callback integration
OpenAI SDK – Direct capture of prompts, completions, and token usage
OpenTelemetry – Standard OTel protocol support (60% of cloud traffic)
LiteLLM – Gateway-level tracing for multi-provider setups

Code Example

from langfuse import Langfuse
from langfuse.decorators import observe, langfuse_context
from openai import OpenAI
 
langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com"  # or self-hosted URL
)
client = OpenAI()
 
@observe()
def retrieve_context(query: str) -> str:
    """Retrieve relevant context for the query."""
    # Your retrieval logic here
    langfuse_context.update_current_observation(
        metadata={"retriever": "hybrid", "top_k": 5}
    )
    return "Retrieved context about the topic..."
 
@observe()
def generate_answer(query: str) -> str:
    """Full RAG pipeline with automatic tracing."""
    context = retrieve_context(query)
 
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query}
        ]
    )
 
    # Score the trace
    langfuse_context.score_current_trace(
        name="relevance", value=0.9,
        comment="High relevance to query"
    )
    return response.choices[0].message.content
 
answer = generate_answer("How does RAG work?")
print(answer)
langfuse.flush()  # Ensure all events are sent

AI Agent Knowledge Base

Sidebar

Table of Contents

Langfuse

Key Features

Architecture

Tracing Capabilities

Evaluation Features

Integrations

Code Example

References

See Also

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Langfuse

Key Features

Architecture

Tracing Capabilities

Evaluation Features

Integrations

Code Example

References

See Also

Page Tools