Arize Phoenix is an open-source AI observability platform for tracing, evaluating, and troubleshooting LLM applications. With over 9,000 stars on GitHub, it provides end-to-end visibility into AI system behavior through OpenTelemetry-based instrumentation, capturing traces of LLM flows across frameworks such as LangChain, LlamaIndex, Haystack, and DSPy, and providers such as OpenAI and Bedrock.
Phoenix combines tracing, evaluation, and dataset management in one tool, purpose-built for LLM-specific issues such as prompt drift, hallucinations, tool flakiness, and cost analysis. It runs locally in Jupyter notebooks, self-hosted, or in the cloud, with no vendor lock-in thanks to OpenTelemetry.
Phoenix instruments your LLM application using OpenTelemetry (OTEL) and the OpenInference specification for AI-specific telemetry. Every LLM call, retrieval step, tool invocation, and reasoning chain is captured as a span within a trace. These traces are visualized in a web UI that shows the full execution path, latencies, token counts, and costs.
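Conceptually, a trace is a tree of spans, each carrying timing and token metadata that roll up to the parent. As a rough illustration, here is a plain-Python sketch of that data model (our own toy classes, not the Phoenix or OpenTelemetry API):

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """Toy model of one traced step (LLM call, retrieval step, tool invocation)."""
    name: str
    kind: str              # e.g. "llm", "retriever", "tool"
    latency_ms: float
    tokens: int = 0
    children: list["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Token usage rolls up from child spans to the root of the trace
        return self.tokens + sum(c.total_tokens() for c in self.children)

# One trace: a chain that retrieves context, then calls an LLM
trace = Span("handle_query", "chain", 1450.0, children=[
    Span("vector_search", "retriever", 120.0),
    Span("chat_completion", "llm", 1300.0, tokens=850),
])

print(trace.total_tokens())  # 850
print(len(trace.children))   # 2
```

Phoenix records the real equivalents of these fields (span kind, latency, token counts) as OpenInference span attributes, which is what the web UI aggregates into latency and cost views.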
The platform then enables LLM-powered evaluations — running benchmarks for response quality, retrieval relevance, faithfulness, and toxicity against your traced data. Combined with versioned datasets and a prompt playground, Phoenix creates a complete experimentation loop.
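The "rails" idea behind such evaluators can be sketched in plain Python: the judge model's free-text verdict is snapped onto a fixed label set so results stay aggregatable, with anything unmatched flagged rather than guessed. This is a hypothetical parser of ours, not Phoenix's actual implementation:

```python
NOT_PARSABLE = "NOT_PARSABLE"

def snap_to_rails(raw_verdict: str, rails: list[str]) -> str:
    """Map a judge model's free-text output onto an allowed label set."""
    text = raw_verdict.strip().lower()
    # Check longer rails first so "irrelevant" is not swallowed by "relevant"
    for rail in sorted(rails, key=len, reverse=True):
        if rail.lower() in text:
            return rail
    return NOT_PARSABLE  # verdict matched no rail; surface it, don't guess

rails = ["relevant", "irrelevant"]
print(snap_to_rails("The answer is relevant.", rails))     # relevant
print(snap_to_rails("Clearly irrelevant here.", rails))    # irrelevant
print(snap_to_rails("Unsure, cannot decide", rails))       # NOT_PARSABLE
```

Constraining evaluator output this way is what makes LLM-as-judge scores usable as metrics: every traced row ends up in a countable bucket.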
```python
# Install Phoenix:
#   pip install arize-phoenix openinference-instrumentation-openai

import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Launch the Phoenix app (serves the web UI and trace collector)
session = px.launch_app()

# Register an OpenTelemetry tracer provider pointed at Phoenix
tracer_provider = register()

# Auto-instrument OpenAI
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Now all OpenAI calls are automatically traced
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
# Trace is automatically captured and visible in the Phoenix UI

# Run evaluations on traced data
from phoenix.evals import OpenAIModel, llm_classify

eval_results = llm_classify(
    dataframe=px.Client().get_spans_dataframe(),
    model=OpenAIModel(model="gpt-4o"),
    template="Is this response relevant to the query? {input} -> {output}",
    rails=["relevant", "irrelevant"],
)
```
```mermaid
%%{init: {'theme': 'dark'}}%%
graph TB
    App([Your LLM App]) -->|Auto-Instrumented| OTEL[OpenTelemetry SDK]
    OTEL -->|Spans + Traces| Collector[Phoenix Collector]
    Collector -->|Store| DB[Trace Database]
    DB -->|Query| UI[Phoenix Web UI]
    UI -->|Trace View| Traces[Trace Explorer]
    UI -->|Metrics| Dashboard[Latency / Cost / Tokens]
    UI -->|Clusters| Embed[Embedding Clusters]
    UI -->|Testing| Playground[Prompt Playground]
    DB -->|Evaluation| Evals[LLM Evaluators]
    Evals -->|Relevance / Faithfulness| Scores[Quality Scores]
    Evals -->|Toxicity / Hallucination| Safety[Safety Checks]
    DB -->|Export| Datasets[Versioned Datasets]
    Datasets -->|Fine-tuning| Training[Model Training]
    subgraph Frameworks[Instrumented Frameworks]
        LC[LangChain]
        LI[LlamaIndex]
        OAI[OpenAI]
        Bed[Bedrock]
        Hay[Haystack]
    end
    App --- Frameworks
```
| Mode | Description | Best For |
|---|---|---|
| Notebook | `px.launch_app()` in Jupyter | Rapid experimentation |
| Self-hosted | Docker container with persistent storage | Team collaboration |
| Cloud | Arize cloud platform | Production monitoring |
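In all three modes the instrumented app ultimately just needs a collector endpoint; Phoenix clients read it from the `PHOENIX_COLLECTOR_ENDPOINT` environment variable, falling back to the local default. A minimal sketch of that resolution logic (the helper function is ours; port 6006 is Phoenix's local default):

```python
import os

def resolve_collector_endpoint(default: str = "http://localhost:6006") -> str:
    """Pick the Phoenix collector endpoint: env var if set, local default otherwise."""
    return os.environ.get("PHOENIX_COLLECTOR_ENDPOINT", default)

# Notebook mode: nothing set, traces go to the locally launched app
print(resolve_collector_endpoint())  # http://localhost:6006

# Self-hosted / cloud: point the same instrumented app at a remote collector
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://phoenix.example.internal"
print(resolve_collector_endpoint())  # https://phoenix.example.internal
```

Because the switch lives in configuration rather than code, the same notebook prototype can be promoted to a self-hosted or cloud deployment without re-instrumenting anything.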