====== Langfuse ======

**Langfuse** is an open-source LLM observability platform that provides tracing, evaluation, prompt management, and cost tracking for production LLM applications. With over **24,000 GitHub stars** and an MIT license, it has become the leading open-source option for monitoring and debugging AI applications in production.

| **Repository** | [[https://github.com/langfuse/langfuse|github.com/langfuse/langfuse]] |
| **License** | MIT |
| **Language** | TypeScript, Python |
| **Stars** | 24K+ |
| **Category** | LLM Observability |

===== Key Features =====

  * **Application Tracing** -- Captures the full request lifecycle, including LLM calls, retrieval, embeddings, tool calls, and API operations
  * **LLM-as-a-Judge** -- Native support for automated evaluation scoring on traces and observations
  * **Prompt Management** -- Versioned prompt storage with a UI for management and playground testing
  * **Cost Tracking** -- Automatic per-trace and per-span tracking of token usage and model costs
  * **50+ Integrations** -- Native support for LangChain, LlamaIndex, OpenAI, and OpenTelemetry
  * **Self-Hostable** -- Full self-hosting via Docker Compose or Kubernetes, with no vendor lock-in
  * **Zero-Latency Instrumentation** -- Asynchronous background flushing adds no blocking latency to the application's request path

===== Architecture =====

Langfuse V4 (March 2026) employs an observations-first, immutable data model aligned with OpenTelemetry spans:

  * **PostgreSQL** -- Handles transactional data: users, organizations, projects, API keys, prompts, datasets, and evaluation settings
  * **ClickHouse** -- Stores immutable tracing data: observations, scores, and traces as correlation IDs
  * **Redis + BullMQ** -- Manages event queues for asynchronous processing
  * **Ingestion Pipeline** -- Native Python/JS SDKs, 50+ integrations, and OpenTelemetry endpoints; asynchronous batching ensures zero added latency

The V4 architecture shifted to an observations-first model in which traces are correlation IDs (like ''session_id'') rather than top-level
entities, with immutable spans ingested via OTel protocols.

```mermaid
graph TB
    subgraph Apps["Instrumented Applications"]
        App1[Python App + SDK]
        App2[JS/TS App + SDK]
        App3[OpenTelemetry]
        App4[LiteLLM Gateway]
    end
    subgraph Ingestion["Ingestion Layer"]
        Queue[Redis + BullMQ]
        Batch[Micro-Batch Processor]
    end
    subgraph Storage["Storage Layer"]
        PG[(PostgreSQL - Transactional)]
        CH[(ClickHouse - Traces/Spans)]
    end
    subgraph Features["Feature Layer"]
        Trace[Trace Explorer]
        Eval[Evaluation Engine]
        Prompt[Prompt Manager]
        Cost[Cost Dashboard]
        Metrics[Metrics and Analytics]
    end
    subgraph UI["Web Dashboard"]
        Dashboard[Dashboard Views]
        Filters[Saved Filters]
        Graphs[Agent Graphs]
    end
    Apps --> Ingestion
    Queue --> Batch
    Batch --> Storage
    Storage --> Features
    Features --> UI
```

===== Tracing Capabilities =====

Langfuse captures the full request lifecycle with rich detail:

  * **LLM Operations** -- Inputs, outputs, latency, token usage, and model parameters
  * **Non-LLM Operations** -- Retrieval steps, embedding generation, tool calls, and API requests
  * **Session Tracking** -- Multi-turn conversations with user identification
  * **Agent Graphs** -- Visual representation of agent decision flows
  * **Environment Tagging** -- Separate traces by development, staging, and production
  * **Custom Attributes** -- Arbitrary metadata for filtering and analysis

===== Evaluation Features =====

Langfuse supports multiple evaluation approaches:

  * **LLM-as-a-Judge** -- Automated scoring using LLMs to evaluate trace quality
  * **Dataset Experiments** -- Run evaluations against curated datasets
  * **Score Storage** -- All scores are stored in ClickHouse alongside traces for analysis
  * **Custom Evaluators** -- Define custom scoring functions for domain-specific quality metrics

===== Integrations =====

Langfuse provides native integrations with the major LLM frameworks:

  * **LangChain / LangGraph** -- Automatic tracing via callback handlers
  * **LlamaIndex** -- Native callback integration
  * **OpenAI SDK** -- Direct capture of prompts,
completions, and token usage
  * **OpenTelemetry** -- Standard OTel protocol support
  * **LiteLLM** -- Gateway-level tracing for multi-provider setups

===== Code Example =====

```python
from langfuse.decorators import observe, langfuse_context
from openai import OpenAI

# The decorator-based SDK reads credentials from the environment:
#   LANGFUSE_PUBLIC_KEY  (pk-lf-...)
#   LANGFUSE_SECRET_KEY  (sk-lf-...)
#   LANGFUSE_HOST        (https://cloud.langfuse.com, or a self-hosted URL)

client = OpenAI()

@observe()
def retrieve_context(query: str) -> str:
    """Retrieve relevant context for the query."""
    # Your retrieval logic here
    langfuse_context.update_current_observation(
        metadata={"retriever": "hybrid", "top_k": 5}
    )
    return "Retrieved context about the topic..."

@observe()
def generate_answer(query: str) -> str:
    """Full RAG pipeline with automatic tracing."""
    context = retrieve_context(query)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query}
        ]
    )
    # Score the trace
    langfuse_context.score_current_trace(
        name="relevance",
        value=0.9,
        comment="High relevance to query"
    )
    return response.choices[0].message.content

answer = generate_answer("How does RAG work?")
print(answer)

langfuse_context.flush()  # Ensure all buffered events are sent before exit
```

===== References =====

  * [[https://github.com/langfuse/langfuse|Langfuse GitHub Repository]]
  * [[https://langfuse.com|Langfuse Official Website]]
  * [[https://langfuse.com/docs|Langfuse Documentation]]
  * [[https://langfuse.com/blog/2026-03-10-simplify-langfuse-for-scale|Langfuse V4 Architecture Blog]]

===== See Also =====

  * [[dify|Dify]] -- Agentic workflow platform
  * [[opencode|OpenCode]] -- AI coding agent
  * [[mem0|Mem0]] -- Memory layer for AI agents
  * [[mcp_servers|MCP Servers]] -- Model Context Protocol implementations
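===== Appendix: Async Ingestion Sketch =====

The zero-latency instrumentation described above works by enqueueing events instantly on the caller's thread and flushing them in micro-batches from a background worker. The stdlib-only sketch below illustrates that pattern; it is a simplified model, and the ''BackgroundBatcher'' name and its parameters are invented for illustration, not part of the Langfuse SDK.

```python
import queue
import threading

class BackgroundBatcher:
    """Illustrative model of async event ingestion: callers never block
    on network I/O; a daemon thread flushes events in micro-batches."""

    def __init__(self, flush_fn, batch_size: int = 10):
        self._events: queue.Queue = queue.Queue()
        self._flush_fn = flush_fn      # e.g. an HTTP POST to the ingestion API
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def enqueue(self, event: dict) -> None:
        # The instrumented request path only pays for a queue put.
        self._events.put(event)

    def shutdown(self) -> None:
        # Sentinel tells the worker to flush the remainder and exit.
        self._events.put(None)
        self._worker.join()

    def _drain(self) -> None:
        batch = []
        while True:
            event = self._events.get()
            if event is None:
                break
            batch.append(event)
            if len(batch) >= self._batch_size:
                self._flush_fn(batch)
                batch = []
        if batch:
            self._flush_fn(batch)

# Demo: collect flushed batches in a list instead of sending them anywhere.
sent = []
batcher = BackgroundBatcher(sent.append, batch_size=2)
for i in range(5):
    batcher.enqueue({"span": i})
batcher.shutdown()
print(sent)  # three batches: spans [0, 1], [2, 3], then the remainder [4]
```

A production implementation would also bound the queue, retry failed flushes, and flush on a timer, but the core property is the same: the application thread only ever pays for an in-memory enqueue.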
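===== Appendix: Cost Calculation Sketch =====

The cost tracking feature described above boils down to multiplying per-observation token counts by per-model prices and summing over the trace. A minimal sketch, assuming placeholder prices (Langfuse resolves actual prices from its model registry, not from a hard-coded table like this):

```python
# Placeholder USD prices per 1M tokens -- for illustration only; real values
# come from the platform's model price registry.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def observation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM observation, derived from its token usage."""
    price = PRICES[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

def trace_cost(observations: list[dict]) -> float:
    """A trace's cost is the sum over its LLM observations."""
    return sum(
        observation_cost(o["model"], o["input_tokens"], o["output_tokens"])
        for o in observations
    )

print(observation_cost("gpt-4o", 1000, 500))  # 0.0075
```

Aggregating these per-observation values by trace, session, or user is what the cost dashboard surfaces.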