LangSmith

LangSmith is a framework-agnostic observability, evaluation, and deployment platform by LangChain for developing, debugging, and deploying AI agents and LLM applications. It provides end-to-end tracing, testing, prompt management, and production monitoring — whether you use LangChain, LlamaIndex, or a custom stack.

Overview

LangSmith addresses the core challenge of LLM application development: non-deterministic outputs are hard to debug and optimize. It captures detailed execution traces of every LLM call, chain, agent step, and tool invocation, giving developers full visibility into what their applications are actually doing in production.

The platform is HIPAA, SOC 2 Type 2, and GDPR compliant, making it suitable for regulated enterprise environments.

Key capabilities:

  - End-to-end tracing of every LLM call, chain, agent step, and tool invocation
  - Evaluation with datasets, scorers, and experiment comparisons
  - Prompt management and an interactive playground
  - Production monitoring dashboards for latency, token usage, and cost
  - Agent deployment

Architecture

graph TD
    A[Your Application: LangChain / LangGraph / LlamaIndex / Custom] -->|traces async| B
    subgraph B[LangSmith Platform]
        C[Tracing Engine]
        D[Evaluation Engine]
        E[Deployment: Agents]
        F[Datasets]
        G[Playground]
        H[Dashboards]
    end

Getting Started

LangSmith requires zero code changes for LangChain/LangGraph apps — just set environment variables:

import os
 
# Enable tracing (works with LangChain/LangGraph automatically)
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "ls-your-api-key"
os.environ["LANGSMITH_PROJECT"] = "my-project"
 
# For programmatic access to runs, datasets, and metrics
from langsmith import Client
 
client = Client()
 
# List successful root-level production runs with token/cost data
runs = client.list_runs(
    project_name="production-agents",
    is_root=True,
    error=False,
)
for run in runs:
    latency = (run.end_time - run.start_time).total_seconds()
    print(f"Run: {run.name}, Tokens: {run.total_tokens}")
    print(f"Latency: {latency:.2f}s, Cost: ${run.total_cost}")
 
# Create an evaluation dataset from production traces
dataset = client.create_dataset("eval-golden-set")
for run in client.list_runs(project_name="production-agents", limit=50):
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id,
    )

For non-LangChain frameworks, use the SDK's @traceable decorator or manual span creation for full instrumentation.

Framework-Agnostic Integration

While LangSmith integrates automatically with the LangChain ecosystem, it can instrument any LLM application through its Python and TypeScript SDKs, OpenTelemetry-based ingestion, or the REST API.

Evaluation Workflow

LangSmith's evaluation system supports systematic quality measurement:

  1. Create datasets from production traces or manual examples
  2. Define scorers (LLM-as-judge, heuristic, or human)
  3. Run evaluations across model/prompt variants
  4. Compare results in experiment views with aggregated metrics
  5. Set up annotation queues for human review
