A systematic catalog of how LLM-based agents fail in production. For each failure mode: symptoms, root causes, and actionable fixes. Based on real incident reports from 2024-2026, including the Kiro AWS outage, Claude Code infinite loop bugs, and enterprise deployment statistics.
Agent failures are fundamentally different from traditional software bugs. Traditional software fails predictably (null pointers, timeouts). Agents fail probabilistically — the same input can succeed 9 times and fail catastrophically on the 10th.
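Because failures are probabilistic, a single passing run tells you little; you have to measure an empirical success rate over repeated trials. A minimal sketch of such a harness is below. `run_agent` and `flaky_agent` are hypothetical stand-ins for a real agent invocation, not part of any framework:

```python
import random

def measure_success_rate(run_agent, task, trials: int = 20) -> float:
    """Run the same task repeatedly and report the fraction of successes.

    `run_agent` is any callable (task -> bool) returning True on success,
    a placeholder for your real agent invocation.
    """
    successes = sum(1 for _ in range(trials) if run_agent(task))
    return successes / trials

# Toy stand-in agent that succeeds ~90% of the time, illustrating how
# a "reliable" agent still fails on a nontrivial fraction of runs.
random.seed(0)
flaky_agent = lambda task: random.random() < 0.9
rate = measure_success_rate(flaky_agent, "summarize report", trials=100)
print(f"success rate: {rate:.0%}")
```

At 90% per-step reliability, a 10-step agent pipeline completes end-to-end only about 35% of the time, which is why per-run evaluation over many trials matters more than a single demo.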
Production statistics (2025-2026):
Symptoms:
Root Causes:
Fixes:
Symptoms:
Root Causes:
Fixes:
```python
class SafeToolExecutor:
    """Validate and execute tool calls with error handling."""

    def __init__(self, tools: dict):
        self.tools = tools
        self.max_retries = 3
        self.call_log = []

    def execute(self, tool_name: str, params: dict) -> dict:
        # Validate tool exists
        if tool_name not in self.tools:
            return {"error": f"Tool '{tool_name}' not found. Available: {list(self.tools.keys())}"}
        tool = self.tools[tool_name]

        # Validate parameters against schema
        required = tool.get("required_params", [])
        missing = [p for p in required if p not in params]
        if missing:
            return {"error": f"Missing required params: {missing}"}

        # Execute with retry
        for attempt in range(self.max_retries):
            try:
                result = tool["function"](**params)
                self.call_log.append({
                    "tool": tool_name,
                    "params": params,
                    "result": "success",
                    "attempt": attempt + 1,
                })
                return {"result": result}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    self.call_log.append({
                        "tool": tool_name,
                        "params": params,
                        "result": f"failed: {e}",
                        "attempt": attempt + 1,
                    })
                    return {"error": str(e)}
        return {"error": "Max retries exceeded"}

    def get_cost_report(self) -> dict:
        return {
            "total_calls": len(self.call_log),
            "failures": sum(1 for c in self.call_log if "failed" in c["result"]),
            "tools_used": list(set(c["tool"] for c in self.call_log)),
        }
```
Symptoms:
Root Causes:
Fixes:
Real incident: A Claude Code sub-agent ran npm install 300+ times over 4.6 hours, consuming 27M tokens at 128K context per iteration. A LangGraph agent ran 2,847 iterations, racking up $400+ in API costs on a $5 task.
Symptoms:
Root Causes:
Fixes:
```python
import hashlib

class LoopDetector:
    """Detect and prevent infinite loops in agent execution."""

    def __init__(self, max_iterations: int = 50, max_cost_usd: float = 10.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iteration = 0
        self.total_tokens = 0
        self.action_hashes = []
        self.cost_per_1k_tokens = 0.01  # Adjust per model

    def check(self, action: str, params: dict, tokens_used: int) -> dict:
        self.iteration += 1
        self.total_tokens += tokens_used
        estimated_cost = (self.total_tokens / 1000) * self.cost_per_1k_tokens

        # Check iteration limit
        if self.iteration > self.max_iterations:
            return {"halt": True, "reason": f"Max iterations ({self.max_iterations}) exceeded"}

        # Check cost limit
        if estimated_cost > self.max_cost_usd:
            return {"halt": True, "reason": f"Cost limit (${self.max_cost_usd}) exceeded: ${estimated_cost:.2f}"}

        # Check for repeated actions (same action + params = loop)
        action_hash = hashlib.md5(f"{action}{params}".encode()).hexdigest()
        recent_hashes = self.action_hashes[-10:]  # Check last 10
        repeat_count = recent_hashes.count(action_hash)
        self.action_hashes.append(action_hash)
        if repeat_count >= 3:
            return {"halt": True, "reason": f"Action '{action}' repeated {repeat_count}x with same params"}

        return {"halt": False, "iteration": self.iteration, "cost": f"${estimated_cost:.2f}"}
```
Symptoms:
Root Causes:
Fixes:
Symptoms:
Root Causes:
Fixes:
See Why Is My Agent Hallucinating? for the dedicated guide.
Quick summary: Agent generates plausible but wrong information. Fix with RAG grounding, chain-of-verification, low temperature, and constrained decoding.
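Of those fixes, chain-of-verification is the least obvious, so here is a minimal sketch of its control flow: draft an answer, extract the factual claims, verify each claim independently, then revise. The `llm` parameter and `stub_llm` below are hypothetical placeholders for a real model client, not any library's API:

```python
def chain_of_verification(llm, question: str) -> str:
    """Draft -> verify -> revise. `llm` is any (prompt -> text) callable;
    swap in a real model client, ideally with low temperature for the
    verification steps."""
    draft = llm(f"Answer concisely: {question}")
    # Ask the model to list discrete claims worth checking
    claims = [c for c in llm(f"List the factual claims in: {draft}").splitlines() if c.strip()]
    # Verify each claim independently, outside the context of the draft
    verdicts = [llm(f"True or false, and why: {c}") for c in claims]
    # Revise the draft using the verification results
    return llm(
        f"Question: {question}\nDraft: {draft}\nVerification: {verdicts}\n"
        "Rewrite the answer, correcting anything the verification contradicts."
    )

# Stub "model" so the control flow can be exercised without an API key
def stub_llm(prompt: str) -> str:
    if prompt.startswith("Answer concisely"):
        return "Paris is the capital of France."
    if prompt.startswith("List the factual claims"):
        return "Paris is the capital of France."
    if prompt.startswith("True or false"):
        return "True."
    return "Paris is the capital of France. (verified)"

final = chain_of_verification(stub_llm, "What is the capital of France?")
print(final)  # -> Paris is the capital of France. (verified)
```

The key design point is that each claim is verified in a fresh prompt, so the verifier is not anchored on the draft that produced the claim.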
Symptoms:
Root Causes:
Fixes:
```python
class CostGuard:
    """Monitor and limit agent API costs in real-time."""

    PRICING = {  # USD per 1M tokens (input/output)
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-sonnet-4": {"input": 3.00, "output": 15.00},
        "claude-haiku-3.5": {"input": 0.80, "output": 4.00},
    }

    def __init__(self, budget_usd: float = 5.0):
        self.budget = budget_usd
        self.total_cost = 0.0
        self.calls = []

    def track(self, model: str, input_tokens: int, output_tokens: int) -> dict:
        pricing = self.PRICING.get(model, {"input": 5.0, "output": 15.0})
        cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
        self.total_cost += cost
        self.calls.append({"model": model, "cost": cost})
        if self.total_cost > self.budget:
            return {"allowed": False, "reason": f"Budget exceeded: ${self.total_cost:.4f} / ${self.budget}"}
        return {
            "allowed": True,
            "total_cost": f"${self.total_cost:.4f}",
            "remaining": f"${self.budget - self.total_cost:.4f}",
        }

    def recommend_model(self, task_complexity: str) -> str:
        """Route to cheapest sufficient model."""
        routing = {
            "simple": "gpt-4o-mini",        # Classification, extraction, formatting
            "moderate": "claude-haiku-3.5", # Summarization, Q&A
            "complex": "gpt-4o",            # Multi-step reasoning, code generation
            "critical": "claude-sonnet-4",  # High-stakes decisions
        }
        return routing.get(task_complexity, "gpt-4o-mini")
```