Arize Phoenix is an open-source AI observability platform for tracing, evaluating, and troubleshooting LLM applications. With over 9,000 stars on GitHub, it provides end-to-end visibility into AI system behavior through OpenTelemetry-based instrumentation, capturing traces of LLM flows across frameworks such as LangChain, LlamaIndex, Haystack, and DSPy, and providers such as OpenAI and Bedrock.
Phoenix combines tracing, evaluation, and dataset management in one tool, purpose-built for LLM-specific issues such as prompt drift, hallucinations, tool flakiness, and cost analysis. It runs locally in Jupyter notebooks, self-hosted, or in the cloud, with no vendor lock-in thanks to OpenTelemetry.
Phoenix instruments your LLM application using OpenTelemetry (OTEL) and the OpenInference specification for AI-specific telemetry. Every LLM call, retrieval step, tool invocation, and reasoning chain is captured as a span within a trace. These traces are visualized in a web UI that shows the full execution path, latencies, token counts, and costs.
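Conceptually, a trace is a tree of spans, each carrying timing and token metadata that roll up to the parent. As a rough illustration, here is a plain-Python sketch of that data model (our own toy classes, not the Phoenix or OpenTelemetry API):

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """Toy model of one traced step (LLM call, retrieval step, tool invocation)."""
    name: str
    kind: str              # e.g. "llm", "retriever", "tool"
    latency_ms: float
    tokens: int = 0
    children: list["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Token usage rolls up from child spans to the root of the trace
        return self.tokens + sum(c.total_tokens() for c in self.children)

# One trace: a chain that retrieves context, then calls an LLM
trace = Span("handle_query", "chain", 1450.0, children=[
    Span("vector_search", "retriever", 120.0),
    Span("chat_completion", "llm", 1300.0, tokens=850),
])

print(trace.total_tokens())  # 850
print(len(trace.children))   # 2
```

Phoenix records the real equivalents of these fields (span kind, latency, token counts) as OpenInference span attributes, which is what the web UI aggregates into latency and cost views.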
The platform then enables LLM-powered evaluations — running benchmarks for response quality, retrieval relevance, faithfulness, and toxicity against your traced data. Combined with versioned datasets and a prompt playground, Phoenix creates a complete experimentation loop.
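The "rails" idea behind such evaluators can be sketched in plain Python: the judge model's free-text verdict is snapped onto a fixed label set so results stay aggregatable, with anything unmatched flagged rather than guessed. This is a hypothetical parser of ours, not Phoenix's actual implementation:

```python
NOT_PARSABLE = "NOT_PARSABLE"

def snap_to_rails(raw_verdict: str, rails: list[str]) -> str:
    """Map a judge model's free-text output onto an allowed label set."""
    text = raw_verdict.strip().lower()
    # Check longer rails first so "irrelevant" is not swallowed by "relevant"
    for rail in sorted(rails, key=len, reverse=True):
        if rail.lower() in text:
            return rail
    return NOT_PARSABLE  # verdict matched no rail; surface it, don't guess

rails = ["relevant", "irrelevant"]
print(snap_to_rails("The answer is relevant.", rails))     # relevant
print(snap_to_rails("Clearly irrelevant here.", rails))    # irrelevant
print(snap_to_rails("Unsure, cannot decide", rails))       # NOT_PARSABLE
```

Constraining evaluator output this way is what makes LLM-as-judge scores usable as metrics: every traced row ends up in a countable bucket.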
```python
# Install Phoenix:
#   pip install arize-phoenix openinference-instrumentation-openai

import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Launch the Phoenix app (serves the web UI and trace collector)
session = px.launch_app()

# Register an OpenTelemetry tracer provider pointed at Phoenix
tracer_provider = register()

# Auto-instrument OpenAI
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Now all OpenAI calls are automatically traced
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
# Trace is automatically captured and visible in the Phoenix UI

# Run evaluations on traced data
from phoenix.evals import OpenAIModel, llm_classify

eval_results = llm_classify(
    dataframe=px.Client().get_spans_dataframe(),
    model=OpenAIModel(model="gpt-4o"),
    template="Is this response relevant to the query? {input} -> {output}",
    rails=["relevant", "irrelevant"],
)
```
```mermaid
%%{init: {'theme': 'dark'}}%%
graph TB
    App([Your LLM App]) -->|Auto-Instrumented| OTEL[OpenTelemetry SDK]
    OTEL -->|Spans + Traces| Collector[Phoenix Collector]
    Collector -->|Store| DB[Trace Database]
    DB -->|Query| UI[Phoenix Web UI]
    UI -->|Trace View| Traces[Trace Explorer]
    UI -->|Metrics| Dashboard[Latency / Cost / Tokens]
    UI -->|Clusters| Embed[Embedding Clusters]
    UI -->|Testing| Playground[Prompt Playground]
    DB -->|Evaluation| Evals[LLM Evaluators]
    Evals -->|Relevance / Faithfulness| Scores[Quality Scores]
    Evals -->|Toxicity / Hallucination| Safety[Safety Checks]
    DB -->|Export| Datasets[Versioned Datasets]
    Datasets -->|Fine-tuning| Training[Model Training]
    subgraph Frameworks[Instrumented Frameworks]
        LC[LangChain]
        LI[LlamaIndex]
        OAI[OpenAI]
        Bed[Bedrock]
        Hay[Haystack]
    end
    App --- Frameworks
```
| Mode | Description | Best For |
|---|---|---|
| Notebook | `px.launch_app()` in Jupyter | Rapid experimentation |
| Self-hosted | Docker container with persistent storage | Team collaboration |
| Cloud | Arize cloud platform | Production monitoring |
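In all three modes the instrumented app ultimately just needs a collector endpoint; Phoenix clients read it from the `PHOENIX_COLLECTOR_ENDPOINT` environment variable, falling back to the local default. A minimal sketch of that resolution logic (the helper function is ours; port 6006 is Phoenix's local default):

```python
import os

def resolve_collector_endpoint(default: str = "http://localhost:6006") -> str:
    """Pick the Phoenix collector endpoint: env var if set, local default otherwise."""
    return os.environ.get("PHOENIX_COLLECTOR_ENDPOINT", default)

# Notebook mode: nothing set, traces go to the locally launched app
print(resolve_collector_endpoint())  # http://localhost:6006

# Self-hosted / cloud: point the same instrumented app at a remote collector
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://phoenix.example.internal"
print(resolve_collector_endpoint())  # https://phoenix.example.internal
```

Because the switch lives in configuration rather than code, the same notebook prototype can be promoted to a self-hosted or cloud deployment without re-instrumenting anything.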