Table of Contents

Common Agent Failure Modes

A systematic catalog of how LLM-based agents fail in production. For each failure mode: symptoms, root causes, and actionable fixes. Based on real incident reports from 2024-2026 including the Kiro AWS outage, Claude Code infinite loop bugs, and enterprise deployment statistics.

The Reality of Agent Failures

Agent failures are fundamentally different from traditional software bugs. Traditional software fails predictably (null pointers, timeouts). Agents fail probabilistically — the same input can succeed 9 times and fail catastrophically on the 10th.

Production statistics (2025-2026):

Failure Mode Catalog

1. Reasoning Failures

Symptoms:

Root Causes:

Fixes:
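
One common mitigation is self-consistency: sample the same reasoning task several times and accept only a majority answer. A minimal sketch, assuming a generic `ask_model` callable (hypothetical stand-in for your model client):

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(ask_model: Callable[[str], str],
                           prompt: str, samples: int = 5,
                           min_agreement: float = 0.6) -> dict:
    """Sample the model several times; accept only a majority answer."""
    answers = [ask_model(prompt).strip() for _ in range(samples)]
    best, count = Counter(answers).most_common(1)[0]
    agreement = count / samples
    if agreement >= min_agreement:
        return {"answer": best, "agreement": agreement}
    # No stable majority: treat as a reasoning failure and escalate
    return {"answer": None, "agreement": agreement, "escalate": True}
```

Sampling multiplies cost, so reserve this for steps where a wrong answer is expensive.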

2. Tool Use Errors

Symptoms:

Root Causes:

Fixes:

import json
from typing import Any
 
class SafeToolExecutor:
    """Validate and execute tool calls with error handling."""
 
    def __init__(self, tools: dict):
        self.tools = tools
        self.max_retries = 3
        self.call_log = []
 
    def execute(self, tool_name: str, params: dict) -> dict:
        # Validate tool exists
        if tool_name not in self.tools:
            return {"error": f"Tool '{tool_name}' not found. Available: {list(self.tools.keys())}"}
 
        tool = self.tools[tool_name]
 
        # Validate parameters against schema
        required = tool.get("required_params", [])
        missing = [p for p in required if p not in params]
        if missing:
            return {"error": f"Missing required params: {missing}"}
 
        # Execute with retry (no timeout here; wrap tool calls with one in production)
        for attempt in range(self.max_retries):
            try:
                result = tool["function"](**params)
                self.call_log.append({
                    "tool": tool_name, "params": params,
                    "result": "success", "attempt": attempt + 1
                })
                return {"result": result}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    self.call_log.append({
                        "tool": tool_name, "params": params,
                        "result": f"failed: {e}", "attempt": attempt + 1
                    })
                    return {"error": str(e)}
        return {"error": "Max retries exceeded"}
 
    def get_cost_report(self) -> dict:
        return {
            "total_calls": len(self.call_log),
            "failures": sum(1 for c in self.call_log if "failed" in c["result"]),
            "tools_used": list(set(c["tool"] for c in self.call_log)),
        }
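
The executor above assumes each entry in its `tools` dict is itself a dict with a `function` and an optional `required_params` list. A registration sketch with a toy tool (the `search_docs` function is illustrative, not a real API):

```python
def search_docs(query: str) -> list[str]:
    """Toy tool: pretend to search documentation."""
    return [f"result for {query}"]

tools = {
    "search_docs": {
        "function": search_docs,
        "required_params": ["query"],
    },
}
```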

3. Context Overflow

Symptoms:

Root Causes:

Fixes:
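
The cheapest fix is a sliding window that always keeps the system prompt and drops the oldest turns first. A minimal sketch, assuming messages are dicts with `role` and `content`; the default token counter is a crude chars/4 heuristic, so swap in a real tokenizer:

```python
def trim_context(messages: list[dict], max_tokens: int,
                 count_tokens=lambda m: len(m["content"]) // 4) -> list[dict]:
    """Keep the system prompt plus the most recent messages that fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept: list[dict] = []
    for msg in reversed(rest):          # walk newest-first
        cost = count_tokens(msg)
        if cost > budget:
            break                        # oldest messages fall off
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))
```

For long-running agents, summarize the dropped turns into a single message instead of discarding them outright.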

4. Infinite Loops

Real incident: A Claude Code sub-agent ran npm install 300+ times over 4.6 hours, consuming 27M tokens at 128K context per iteration. A LangGraph agent ran 2,847 iterations, spending $400+ on a $5 task.

Symptoms:

Root Causes:

Fixes:

import hashlib
 
class LoopDetector:
    """Detect and prevent infinite loops in agent execution."""
 
    def __init__(self, max_iterations: int = 50, max_cost_usd: float = 10.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iteration = 0
        self.total_tokens = 0
        self.action_hashes = []
        self.cost_per_1k_tokens = 0.01  # Adjust per model
 
    def check(self, action: str, params: dict, tokens_used: int) -> dict:
        self.iteration += 1
        self.total_tokens += tokens_used
        estimated_cost = (self.total_tokens / 1000) * self.cost_per_1k_tokens
 
        # Check iteration limit
        if self.iteration > self.max_iterations:
            return {"halt": True, "reason": f"Max iterations ({self.max_iterations}) exceeded"}
 
        # Check cost limit
        if estimated_cost > self.max_cost_usd:
            return {"halt": True, "reason": f"Cost limit (${self.max_cost_usd}) exceeded: ${estimated_cost:.2f}"}
 
        # Check for repeated actions (same action + params = loop)
        action_hash = hashlib.md5(f"{action}{params}".encode()).hexdigest()
        recent_hashes = self.action_hashes[-10:]  # Check last 10
        repeat_count = recent_hashes.count(action_hash)
        self.action_hashes.append(action_hash)
 
        if repeat_count >= 3:
            return {"halt": True, "reason": f"Action '{action}' repeated {repeat_count}x with same params"}
 
        return {"halt": False, "iteration": self.iteration, "cost": f"${estimated_cost:.2f}"}

5. Goal Drift

Symptoms:

Root Causes:

Fixes:
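
Before executing each step, check that the proposed action still relates to the original objective and re-anchor if it does not. A crude keyword-overlap sketch (a real implementation would compare embeddings; this is only a cheap first filter):

```python
def on_task(objective: str, proposed_action: str, threshold: float = 0.2) -> bool:
    """Return False when an action shares too few keywords with the objective."""
    stop = {"the", "a", "an", "to", "of", "and", "in", "for"}
    obj_words = {w.lower() for w in objective.split()} - stop
    act_words = {w.lower() for w in proposed_action.split()} - stop
    if not obj_words:
        return True
    overlap = len(obj_words & act_words) / len(obj_words)
    return overlap >= threshold
```

When the check fails, re-inject the original objective into the prompt rather than silently blocking the action.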

6. Prompt Injection

Symptoms:

Root Causes:

Fixes:
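
A first line of defense is to flag known injection phrases and fence untrusted input inside explicit delimiters before it reaches the prompt. A minimal sketch (the pattern list and `<untrusted>` tags are illustrative; delimiting alone is not a complete defense, so pair it with tool-level permission checks and output filtering):

```python
import re

SUSPICIOUS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"disregard .{0,20}rules",
]

def wrap_untrusted(text: str) -> dict:
    """Flag likely injection phrases and fence untrusted input in delimiters."""
    flags = [p for p in SUSPICIOUS if re.search(p, text, re.IGNORECASE)]
    fenced = f"<untrusted>\n{text}\n</untrusted>"
    return {"flagged": bool(flags), "patterns": flags, "wrapped": fenced}
```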

7. Hallucination

See Why Is My Agent Hallucinating? for the dedicated guide.

Quick summary: Agent generates plausible but wrong information. Fix with RAG grounding, chain-of-verification, low temperature, and constrained decoding.

8. Cost Runaway

Symptoms:

Root Causes:

Fixes:

class CostGuard:
    """Monitor and limit agent API costs in real-time."""
 
    PRICING = {  # USD per 1M tokens (input/output)
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-sonnet-4": {"input": 3.00, "output": 15.00},
        "claude-haiku-3.5": {"input": 0.80, "output": 4.00},
    }
 
    def __init__(self, budget_usd: float = 5.0):
        self.budget = budget_usd
        self.total_cost = 0.0
        self.calls = []
 
    def track(self, model: str, input_tokens: int, output_tokens: int) -> dict:
        pricing = self.PRICING.get(model, {"input": 5.0, "output": 15.0})  # conservative fallback for unlisted models
        cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
        self.total_cost += cost
        self.calls.append({"model": model, "cost": cost})
 
        if self.total_cost > self.budget:
            return {"allowed": False, "reason": f"Budget exceeded: ${self.total_cost:.4f} / ${self.budget}"}
        return {"allowed": True, "total_cost": f"${self.total_cost:.4f}", "remaining": f"${self.budget - self.total_cost:.4f}"}
 
    def recommend_model(self, task_complexity: str) -> str:
        """Route to cheapest sufficient model."""
        routing = {
            "simple": "gpt-4o-mini",      # Classification, extraction, formatting
            "moderate": "claude-haiku-3.5", # Summarization, Q&A
            "complex": "gpt-4o",           # Multi-step reasoning, code generation
            "critical": "claude-sonnet-4",  # High-stakes decisions
        }
        return routing.get(task_complexity, "gpt-4o-mini")

Failure Mode Decision Diagram

graph TD
    A[Agent Misbehaving] --> B{What type of failure?}
    B --> C[Wrong output]
    B --> D[Stuck/looping]
    B --> E[Unexpected behavior]
    B --> F[Cost explosion]
    C --> C1{Is output fabricated?}
    C1 -->|Yes| C2[Hallucination - see dedicated guide]
    C1 -->|No| C3{Is reasoning wrong?}
    C3 -->|Yes| C4[Add chain-of-thought + verification]
    C3 -->|No| C5[Tool misuse - fix tool descriptions]
    D --> D1{Same action repeating?}
    D1 -->|Yes| D2[Infinite loop - add loop detector]
    D1 -->|No| D3{Agent oscillating?}
    D3 -->|Yes| D4[Circular dependency - break cycle]
    D3 -->|No| D5[Context overflow - add summarization]
    E --> E1{After processing external input?}
    E1 -->|Yes| E2[Prompt injection - sanitize inputs]
    E1 -->|No| E3{Doing unrelated tasks?}
    E3 -->|Yes| E4[Goal drift - re-anchor to objective]
    E3 -->|No| E5[Check system prompt and tool config]
    F --> F1[Add CostGuard + model routing]
    F --> F2[Add iteration limits]
    F --> F3[Compress tool outputs]

Production Safety Checklist

See Also

References