====== Langfuse ======

**Langfuse** is an open-source LLM observability platform that provides tracing, evaluation, prompt management, and cost tracking for production LLM applications.(([[https://github.com/langfuse/langfuse|Langfuse GitHub Repository]])) With over **24,000 GitHub stars** and MIT licensing, it has become the leading open-source alternative for monitoring and debugging AI applications in production.

| **Repository** | [[https://github.com/langfuse/langfuse|github.com/langfuse/langfuse]] |
| **License** | MIT |
| **Language** | TypeScript, Python |
| **Stars** | 24K+ |
| **Category** | LLM Observability |

===== Key Features =====

  * **Application Tracing**: Captures the full request lifecycle, including LLM calls, retrieval, [[embeddings|embeddings]], tool calls, and API operations
  * **[[llm_as_judge|LLM-as-a-Judge]]**: Native support for automated evaluation scoring on traces and observations
  * **Prompt Management**: Versioned prompt storage with a UI for management and playground testing
  * **Cost Tracking**: Automatic per-trace and per-span tracking of token usage and model costs
  * **50+ Integrations**: Native support for [[langchain|LangChain]], [[llamaindex|LlamaIndex]], [[openai|OpenAI]], and OpenTelemetry
  * **Self-Hostable**: Full self-hosting via Docker Compose or Kubernetes with no vendor lock-in(([[https://langfuse.com/docs|Langfuse Documentation]]))
  * **Zero-Latency Instrumentation**: Asynchronous background flushing adds no latency to instrumented applications

===== Architecture =====

Langfuse V4 (March 2026) employs an observations-first, immutable data model aligned with OpenTelemetry spans:(([[https://langfuse.com/blog/2026-03-10-simplify-langfuse-for-scale|Langfuse V4 Architecture Blog]]))

  * **PostgreSQL**: Handles transactional data: users, organizations, projects, API keys, prompts, datasets, and evaluation settings
  * **ClickHouse**: Stores immutable tracing data: observations, scores, and traces as correlation IDs
  * **Redis + BullMQ**: Manages event queues for asynchronous processing
  * **Ingestion Pipeline**: Native Python/JS SDKs, 50+ integrations, and OpenTelemetry endpoints; asynchronous batching ensures zero added latency

The V4 architecture shifted to an observations-first model in which traces are correlation IDs (like ''session_id'') rather than top-level entities, with immutable spans ingested via OTel protocols.

<code>
graph TB
    subgraph Apps["Instrumented Applications"]
        App1[Python App + SDK]
        App2[JS/TS App + SDK]
        App3[OpenTelemetry]
        App4[LiteLLM Gateway]
    end
    subgraph Ingestion["Ingestion Layer"]
        Queue[Redis + BullMQ]
        Batch[Micro-Batch Processor]
    end
    subgraph Storage["Storage Layer"]
        PG[(PostgreSQL - Transactional)]
        CH[(ClickHouse - Traces/Spans)]
    end
    subgraph Features["Feature Layer"]
        Trace[Trace Explorer]
        Eval[Evaluation Engine]
        Prompt[Prompt Manager]
        Cost[Cost Dashboard]
        Metrics[Metrics and Analytics]
    end
    subgraph UI["Web Dashboard"]
        Dashboard[Dashboard Views]
        Filters[Saved Filters]
        Graphs[Agent Graphs]
    end
    Apps --> Ingestion
    Queue --> Batch
    Batch --> Storage
    Storage --> Features
    Features --> UI
</code>

===== Tracing Capabilities =====

Langfuse captures the full request lifecycle with rich detail:(([[https://langfuse.com/docs|Langfuse Documentation]]))

  * **LLM Operations**: Inputs, outputs, latency, token usage, and model parameters
  * **Non-LLM Operations**: Retrieval steps, embedding generation, tool calls, and API requests
  * **Session Tracking**: Multi-turn conversations with user identification
  * **Agent Graphs**: Visual representation of agent decision flows
  * **Environment Tagging**: Separate traces by development, staging, and production
  * **Custom Attributes**: Arbitrary metadata for filtering and analysis

===== Evaluation Features =====

Langfuse supports multiple evaluation approaches:

  * **[[llm_as_judge|LLM-as-a-Judge]]**: Automated scoring using LLMs to evaluate trace quality
  * **Dataset Experiments**: Run evaluations against curated datasets
  * **Score Storage**: All scores stored in ClickHouse alongside traces for analysis
  * **Custom Evaluators**: Define custom scoring functions for domain-specific quality metrics

===== Integrations =====

Langfuse provides native integrations with the major LLM frameworks:

  * **[[langchain|LangChain]] / [[langgraph|LangGraph]]**: Automatic tracing via callback handlers
  * **[[llamaindex|LlamaIndex]]**: Native callback integration
  * **[[openai|OpenAI]] SDK**: Direct capture of prompts, completions, and token usage
  * **OpenTelemetry**: Standard OTel protocol support (60% of cloud traffic)
  * **[[lite_llm|LiteLLM]]**: Gateway-level tracing for multi-provider setups

===== Code Example =====

<code python>
from langfuse import Langfuse
from langfuse.decorators import observe, langfuse_context
from openai import OpenAI

langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com"  # or self-hosted URL
)

client = OpenAI()


@observe()
def retrieve_context(query: str) -> str:
    """Retrieve relevant context for the query."""
    # Your retrieval logic here
    langfuse_context.update_current_observation(
        metadata={"retriever": "hybrid", "top_k": 5}
    )
    return "Retrieved context about the topic..."


@observe()
def generate_answer(query: str) -> str:
    """Full RAG pipeline with automatic tracing."""
    context = retrieve_context(query)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query}
        ]
    )
    # Score the trace
    langfuse_context.score_current_trace(
        name="relevance",
        value=0.9,
        comment="High relevance to query"
    )
    return response.choices[0].message.content


answer = generate_answer("How does RAG work?")
print(answer)

langfuse.flush()  # Ensure all events are sent
</code>

===== See Also =====

  * [[langsmith|LangSmith]]
  * [[arize_phoenix|Arize Phoenix]]
  * [[langchain|LangChain]]
  * [[deepeval|DeepEval]]
  * [[promptfoo|Promptfoo]]

===== References =====
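===== Self-Hosting Sketch =====

The self-hosting option and the storage layers described above (PostgreSQL for transactional data, ClickHouse for traces, Redis for queues) can be sketched as a minimal Docker Compose file. This is an illustrative assumption only, not the official compose file: service names, environment variable names, and image tags here are guesses, so consult the ''docker-compose.yml'' shipped in the Langfuse repository for the supported configuration.

<code yaml>
# Illustrative sketch only -- see the official docker-compose.yml
# in the langfuse repository for the supported setup.
services:
  langfuse:                      # web UI + ingestion API
    image: langfuse/langfuse:latest
    depends_on: [postgres, clickhouse, redis]
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://langfuse:langfuse@postgres:5432/langfuse
      CLICKHOUSE_URL: http://clickhouse:8123        # assumed variable name
      REDIS_CONNECTION_STRING: redis://redis:6379   # assumed variable name
      NEXTAUTH_SECRET: change-me
      SALT: change-me

  postgres:                      # users, projects, prompts, datasets
    image: postgres:16
    environment:
      POSTGRES_USER: langfuse
      POSTGRES_PASSWORD: langfuse
      POSTGRES_DB: langfuse

  clickhouse:                    # immutable observations, scores, traces
    image: clickhouse/clickhouse-server:latest

  redis:                         # BullMQ event queues
    image: redis:7
</code>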