Agent Planning: How AI Agents Plan and Reason

Agent planning is the capability that lets AI agents devise efficient, effective solutions to complex, multi-step problems. It encompasses the strategies and techniques that allow LLM-based agents to break down goals, sequence actions, and adapt on the fly. Rather than generating a response in a single pass, planning-capable agents decompose goals into sub-tasks, reason about action sequences, and revise their strategies based on intermediate feedback.

For information on how agents store and retrieve context across interactions, see Memory Management in LLM Agents.

graph TD
    TD[Task Decomposition] --> CoT[Chain-of-Thought]
    TD --> ToT[Tree of Thoughts]
    TD --> GoT[Graph of Thoughts]
    TD --> LLMP[LLM+P Symbolic]
    CoT --> CoTD[Linear reasoning chain]
    ToT --> ToTD[Branching search with backtracking]
    GoT --> GoTD[Arbitrary graph with aggregation]
    LLMP --> LLMPD["LLM translates to PDDL, classical planner solves"]
    style CoT fill:#e1f5fe
    style ToT fill:#fff3e0
    style GoT fill:#e8f5e9
    style LLMP fill:#f3e5f5

Core Planning Techniques

Chain-of-Thought (CoT)

Introduced by Wei et al. (2022)2), CoT prompting elicits step-by-step reasoning. Variants include Zero-Shot CoT (“Let's think step by step”), Self-Consistency (majority voting over multiple reasoning paths), and Chain-of-Associated-Thoughts (CoAT, 2025), which integrates Monte Carlo Tree Search to explore reasoning branches. See Advanced Reasoning and Planning for detailed coverage.
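Of these variants, Self-Consistency is the easiest to sketch: sample several reasoning paths at nonzero temperature and majority-vote over the final answers. Below is a minimal sketch; `sample_fn` and `noisy_solver` are hypothetical stand-ins for an LLM call, not any library's API:

```python
import random
from collections import Counter


def self_consistency(sample_fn, prompt: str, n: int = 5) -> str:
    """Sample n independent reasoning paths, return the majority-vote answer."""
    answers = [sample_fn(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Toy sampler standing in for an LLM decoded at temperature > 0:
# mostly right, occasionally wrong, like independent CoT samples.
def noisy_solver(prompt: str) -> str:
    return random.choice(["42", "42", "42", "42", "41"])


print(self_consistency(noisy_solver, "What is 6 * 7?"))
```

The vote averages out errors in individual chains, which is why Self-Consistency typically beats single-sample CoT on arithmetic and commonsense benchmarks.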

ReAct

ReAct (Yao et al., 2022)4) combines reasoning and acting in an interleaved loop: the agent generates a thought (reasoning trace), takes an action (tool call), and observes the result. This tight feedback loop enables dynamic replanning based on real-world outcomes. ReAct has become a standard pattern in frameworks like LangChain and LlamaIndex.
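The loop fits in a few lines. In this sketch, `llm` and the `TOOLS` registry are hypothetical stand-ins, not any specific framework's API:

```python
import re

# Toy tool registry; a real agent would register search, code execution, etc.
TOOLS = {"calculator": lambda expr: str(eval(expr))}


def react_loop(llm, question: str, max_steps: int = 5):
    """Interleave reasoning and acting: the model emits either an
    'Action: tool[input]' step or a 'Final: answer' step; tool results
    are appended as Observations for the next model call."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        match = re.match(r"Action: (\w+)\[(.+)\]", step)
        if match:
            tool, arg = match.groups()
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return None  # step budget exhausted: a real agent would replan or escalate
```

Because the observation feeds back into the next model call, the agent can recover from a failed tool call instead of committing to a stale plan.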

Tree of Thoughts (ToT)

ToT (Yao et al., 2023)6) explores multiple reasoning paths simultaneously using tree search (BFS/DFS). Each intermediate thought is evaluated for promise, allowing the agent to backtrack from unproductive branches. It is effective for tasks requiring exploration, such as puzzle solving and creative writing.
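A beam-style BFS over thoughts can be sketched as follows; `expand` and `score` stand in for LLM-based thought generation and evaluation:

```python
def tot_bfs(root, expand, score, beam: int = 2, depth: int = 3):
    """Breadth-first Tree of Thoughts: expand every frontier thought,
    then keep only the `beam` most promising candidates per level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for thought in frontier for child in expand(thought)]
        if not candidates:
            break
        # Pruning the frontier is what lets the search abandon
        # unpromising branches (the "backtracking" behavior).
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```

With `expand` proposing numeric successors and `score` as identity, `tot_bfs(1, lambda t: [t + 1, t * 2], lambda t: t)` greedily climbs to the doubling branch. The cost is many more model calls than CoT: one generation plus one evaluation per candidate thought.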

Graph of Thoughts (GoT)

GoT (Besta et al., 2024)8) generalizes planning to arbitrary directed graphs, enabling aggregation of partial solutions, refinement loops, and non-linear information flow. A unified taxonomy by Besta et al. (2025) compares chains, trees, and graphs across cost-accuracy tradeoffs.
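The distinguishing operations, aggregation and refinement loops, can be sketched as a small pipeline; `solve`, `merge`, and `refine` below are hypothetical LLM-backed operators, not part of any GoT library:

```python
def graph_of_thoughts(parts, solve, merge, refine, refine_rounds: int = 2):
    """GoT-style flow: solve sub-problems as independent thought nodes,
    aggregate them into a single node, then apply refinement self-loops,
    two graph shapes a strict tree cannot express."""
    partials = [solve(p) for p in parts]   # independent branches
    merged = merge(partials)               # aggregation edge (fan-in)
    for _ in range(refine_rounds):         # refinement self-loop
        merged = refine(merged)
    return merged
```

In a tree, information only flows from parent to children; the fan-in `merge` step is what lets GoT combine the best parts of several branches into one solution.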

Modern Planning Approaches (2024-2025)

LLM-Based Planners

Modern frontier models function as end-to-end planners:

A 2025 evaluation tested DeepSeek R1, Gemini 2.5 Pro, and GPT-5 against the LAMA planner on International Planning Competition domains. GPT-5 was competitive on standard tasks, but all LLMs degraded significantly on obfuscated domains requiring pure logical reasoning.

Hybrid Neural-Symbolic Planning

Combining LLMs with classical planners addresses reliability gaps. See LLM+P for the full treatment. Key approaches:
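The LLM+P pipeline itself can be sketched as follows. `llm` and `classical_planner` are placeholder callables (a real system might shell out to a planner such as Fast Downward), not a concrete API:

```python
def llm_plus_p(task_nl: str, domain_pddl: str, llm, classical_planner):
    """LLM+P sketch: the LLM translates the natural-language task into a
    PDDL problem; a sound classical planner then produces the plan, giving
    formal guarantees the LLM alone cannot."""
    problem_pddl = llm(
        f"Given this PDDL domain:\n{domain_pddl}\n"
        f"Write a PDDL problem file for this task: {task_nl}"
    )
    plan = classical_planner(domain_pddl, problem_pddl)
    # Optionally translate the symbolic plan back into natural language.
    return llm(f"Explain this plan in plain English: {plan}")
```

The division of labor is the point: the LLM handles the informal-to-formal translation, while plan correctness comes from the symbolic solver.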

World Models

World models simulate environment dynamics, allowing agents to “imagine” action consequences before executing them:
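A one-step version of this "imagine, then act" loop looks like the following; `simulate` stands in for a learned dynamics model and `value` for a state evaluator, both hypothetical:

```python
def plan_with_world_model(state, actions, simulate, value):
    """Choose the action whose *simulated* successor state scores best,
    rather than executing in the real environment and observing the
    consequence after the fact."""
    return max(actions, key=lambda a: value(simulate(state, a)))
```

Extending the rollout over multiple steps (as in Monte Carlo Tree Search over the world model) trades more compute for better foresight; the one-step form above is the minimal case.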

Embodied and Robotic Planning

LLM-based planning has extended to physical agents:

Code Example: LLM-Based Task Decomposition

from openai import OpenAI
 
client = OpenAI()
 
DECOMPOSITION_PROMPT = """Break the following task into 3-7 concrete subtasks.
Return as a numbered list. Each subtask should be independently actionable.
 
Task: {task}"""
 
 
def decompose_task(task: str) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": DECOMPOSITION_PROMPT.format(task=task)}],
        temperature=0.2,
    )
    content = response.choices[0].message.content or ""
    subtasks = []
    for line in content.strip().split("\n"):
        cleaned = line.strip()
        # Drop a leading "1." / "2)" list marker without eating digits
        # that belong to the subtask text itself.
        head, _, rest = cleaned.partition(" ")
        if head.rstrip(".)").isdigit():
            cleaned = rest
        cleaned = cleaned.strip("- ").strip()
        if cleaned:
            subtasks.append(cleaned)
    return subtasks
 
 
def plan_and_execute(goal: str) -> dict:
    subtasks = decompose_task(goal)
    results = {}
    for i, subtask in enumerate(subtasks, 1):
        print(f"Step {i}: {subtask}")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Complete the following subtask concisely."},
                {"role": "user", "content": subtask},
            ],
        )
        results[subtask] = response.choices[0].message.content
    return results
 
 
results = plan_and_execute("Build a REST API for a todo app with authentication")
for step, output in results.items():
    print(f"\n--- {step} ---\n{output[:200]}")

Dynamic Replanning

Static plans often fail in complex environments. Modern agents implement:
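A minimal replan-on-failure loop might look like this; `plan_fn` and `execute_fn` are hypothetical hooks into an LLM planner and an executor, not a specific framework:

```python
def execute_with_replanning(goal: str, plan_fn, execute_fn, max_replans: int = 3):
    """Run a plan step by step; when a step fails, ask the planner for a
    fresh plan that accounts for the progress already made."""
    done, replans = [], 0
    plan = plan_fn(goal, done)
    while plan:
        step = plan[0]
        if execute_fn(step):
            done.append(step)
            plan = plan[1:]
        elif replans < max_replans:
            replans += 1
            plan = plan_fn(goal, done)  # replan from current progress
        else:
            raise RuntimeError(f"gave up on step: {step}")
    return done
```

Passing `done` back to the planner is the key detail: it keeps the new plan consistent with actions that have already changed the environment.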

Benchmarks for Planning

Multi-Agent Planning

Complex tasks increasingly use coordinated multi-agent planning:
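One common shape is a coordinator agent that decomposes the goal and routes each subtask to a specialist. Everything in this sketch (`decompose`, the `workers` registry) is an illustrative stand-in rather than a real multi-agent framework:

```python
def coordinate(goal: str, decompose, workers: dict):
    """Coordinator/worker pattern: split the goal into (subtask, role)
    pairs and dispatch each subtask to the matching specialist agent."""
    results = {}
    for subtask, role in decompose(goal):
        results[subtask] = workers[role](subtask)
    return results
```

Real systems add the hard parts this sketch omits: shared memory between agents, conflict resolution when specialists disagree, and parallel dispatch.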

See Also

References

2) Wei et al. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” https://arxiv.org/abs/2201.11903
4) Yao et al. “ReAct: Synergizing Reasoning and Acting in Language Models.” https://arxiv.org/abs/2210.03629
6) Yao et al. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” https://arxiv.org/abs/2305.10601
8) Besta et al. “Graph of Thoughts: Solving Elaborate Problems with Large Language Models.” https://arxiv.org/abs/2308.11114
10) Liu et al. “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency.” https://arxiv.org/abs/2304.11477
12) Wang et al. “Voyager: An Open-Ended Embodied Agent with Large Language Models.” https://arxiv.org/abs/2305.16291
15) Ahn et al. “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.” https://arxiv.org/abs/2207.05916
17) Driess et al. “PaLM-E: An Embodied Multimodal Language Model.” https://arxiv.org/abs/2303.03378
19) Huang et al. “Inner Monologue: Embodied Reasoning through Planning with Language Models.” https://arxiv.org/abs/2207.05608
21) Liang et al. “Code as Policies: Language Model Programs for Embodied Control.” https://arxiv.org/abs/2209.07753
23) Madaan et al. “Self-Refine: Iterative Refinement with Self-Feedback.” https://arxiv.org/abs/2303.17651
25) Zhou et al. “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.” https://arxiv.org/abs/2210.11443
27) Valmeekam et al. “PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change.” https://arxiv.org/abs/2305.10918
29) Xie et al. “TravelPlanner: A Benchmark for Real-World Planning with Language Agents.” https://arxiv.org/abs/2403.12687
31) Zhou et al. “WebArena: A Realistic Web Environment for Building Autonomous Agents.” https://arxiv.org/abs/2307.13854
33) Jimenez et al. “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” https://arxiv.org/abs/2310.06770