====== LangSmith ======

**LangSmith** is a framework-agnostic observability, evaluation, and deployment platform by [[langchain|LangChain]] for developing, debugging, and deploying AI agents and LLM applications. It provides end-to-end tracing, testing, prompt management, and production monitoring — whether you use LangChain, LlamaIndex, or a custom stack.

===== Overview =====

LangSmith addresses a core challenge of LLM application development: non-deterministic outputs are hard to debug and optimize. It captures detailed execution traces of every LLM call, chain, agent step, and tool invocation, giving developers full visibility into what their applications are actually doing in production.

The platform is **HIPAA**, **SOC 2 Type 2**, and **GDPR** compliant, making it suitable for regulated enterprise environments.

Key capabilities:

  * **Tracing** — Records full execution graphs, including inputs, outputs, latency, token usage, costs, and nested tool/retriever calls, with minimal runtime overhead via asynchronous transmission
  * **Evaluation** — Dataset management, LLM-as-judge scoring, A/B testing across models and prompts, and annotation queues for human feedback
  * **Testing** — Organize traces into test cases, run regression tests, and compare experiment results in dedicated experiment views
  * **Deployment** — Scalable servers for stateful, long-running agents with streaming, error recovery, and load balancing
  * **Fleet Visual Builder** — Visual interface (via LangGraph Studio integration) for designing, testing, and refining agent workflows before writing code

===== Architecture =====

<code>
graph TD
  A[Your Application: LangChain / LangGraph / LlamaIndex / Custom] -->|traces async| B
  subgraph B[LangSmith Platform]
    C[Tracing Engine]
    D[Evaluation Engine]
    E[Deployment: Agents]
    F[Datasets]
    G[Playground]
    H[Dashboards]
  end
</code>

===== Getting Started =====

LangSmith requires zero code changes for LangChain/LangGraph apps — just set environment variables:

<code python>
import os

# Enable tracing (works with LangChain/LangGraph automatically)
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls-your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-project"

# For programmatic access to runs, datasets, and metrics
from langsmith import Client

client = Client()

# List successful top-level production runs with token/cost data
runs = client.list_runs(
    project_name="production-agents",
    execution_order=1,
    error=False,
)
for run in runs:
    print(f"Run: {run.name}, Tokens: {run.total_tokens}")
    print(f"Latency: {run.latency}s, Cost: ${run.total_cost}")

# Create an evaluation dataset from production traces
dataset = client.create_dataset("eval-golden-set")
for run in client.list_runs(project_name="production-agents", limit=50):
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id,
    )
</code>

For non-LangChain frameworks, use the SDK's ''@traceable'' decorator or manual span creation for full instrumentation.

===== Framework-Agnostic Integration =====

While LangSmith integrates automatically with the LangChain ecosystem, it supports any LLM application:

  * **LangChain/LangGraph** — Zero-config via environment variables
  * **LlamaIndex** — SDK integration with callback handlers
  * **Custom code** — ''@traceable'' decorator or ''RunTree'' API for manual instrumentation
  * **REST API** — Direct HTTP calls from any language or framework

===== Evaluation Workflow =====

LangSmith's evaluation system supports systematic quality measurement:

  - Create datasets from production traces or manual examples
  - Define scorers (LLM-as-judge, heuristic, or human)
  - Run evaluations across model/prompt variants
  - Compare results in experiment views with aggregated metrics
  - Set up annotation queues for human review

===== Key Strengths =====

  * Negligible runtime overhead via asynchronous trace transmission
  * Deep integration with LangGraph for stateful agent debugging
  * Time-travel debugging for reproducing agent failures
  * Production dashboards tracking latency, error rates, costs, and quality
  * Enterprise-grade compliance (HIPAA, SOC 2 Type 2, GDPR)

===== References =====

  * [[https://docs.smith.langchain.com/|LangSmith Documentation]]
  * [[https://smith.langchain.com/|LangSmith Platform]]
  * [[https://blog.langchain.dev/|LangChain Blog]]
  * [[https://github.com/langchain-ai/langsmith-sdk|LangSmith Python SDK on GitHub]]

===== See Also =====

  * [[langchain|LangChain]] — LLM application framework
  * [[langgraph|LangGraph]] — Stateful agent orchestration
  * [[wandb_weave|W&B Weave]] — Alternative LLM observability platform
  * [[langfuse|Langfuse]] — Open-source LLM observability
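Conceptually, the ''@traceable'' decorator described above wraps a function and records its inputs, outputs, and latency as a run, which the SDK then ships to the platform in the background. A stdlib-only sketch of that idea, for illustration — ''traceable_sketch'' and the in-memory ''TRACES'' list are hypothetical stand-ins, not part of the LangSmith SDK:

```python
import functools
import time

TRACES = []  # illustrative in-memory sink; the real SDK sends runs to LangSmith


def traceable_sketch(func):
    """Toy stand-in for LangSmith's @traceable: record each call's
    inputs, outputs, and latency as a run dict."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outputs = func(*args, **kwargs)
        TRACES.append({
            "name": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "outputs": outputs,
            "latency": time.perf_counter() - start,
        })
        return outputs
    return wrapper


@traceable_sketch
def answer(question: str) -> str:
    # stand-in for an LLM call
    return f"echo: {question}"


print(answer("hello"))    # the call behaves normally: prints "echo: hello"
print(TRACES[0]["name"])  # ...while a run was recorded: prints "answer"
```

The real decorator additionally nests child runs into an execution tree and transmits them asynchronously, so instrumented code does not block on network I/O.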