====== Agent Planning: How AI Agents Plan and Reason ======

Agent planning encompasses the strategies and techniques that let LLM-based agents solve complex, multi-step problems: breaking goals into sub-tasks, sequencing actions, and adapting on the fly. Rather than generating a response in a single pass, a planning-capable agent decomposes its goal, reasons about action sequences, and revises its strategy based on intermediate feedback. For how agents store and retrieve context across interactions, see [[memory|Memory Management in LLM Agents]].

graph TD
    TD[[[task_decomposition|Task Decomposition]]] --> CoT[Chain-of-Thought]
    TD --> ToT[[[tree_of_thoughts|Tree of Thoughts]]]
    TD --> GoT[[[graph_of_thoughts|Graph of Thoughts]]]
    TD --> LLMP[LLM+P Symbolic]
    CoT --> CoTD[Linear reasoning chain]
    ToT --> ToTD[Branching search with backtracking]
    GoT --> GoTD[Arbitrary graph with aggregation]
    LLMP --> LLMPD[LLM translates to PDDL, classical planner solves]
    style CoT fill:#e1f5fe
    style ToT fill:#fff3e0
    style GoT fill:#e8f5e9
    style LLMP fill:#f3e5f5

===== Core Planning Techniques =====

==== Chain-of-Thought (CoT) ====

Introduced by [[https://arxiv.org/abs/2201.11903|Wei et al., 2022]](([[https://arxiv.org/abs/2201.11903|Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." arXiv:2201.11903, 2022.]])), [[chain_of_thought|CoT]] prompting elicits step-by-step reasoning. Variants include Zero-Shot CoT ("Let's think step by step"), Self-Consistency (majority voting over multiple reasoning paths), and Chain-of-Associated-Thoughts (CoAT, 2025), which integrates Monte Carlo Tree Search to explore reasoning branches. See [[advanced_reasoning_planning|Advanced Reasoning and Planning]] for detailed coverage.
==== ReAct ====

[[react_framework|ReAct]] ([[https://arxiv.org/abs/2210.03629|Yao et al., 2022]](([[https://arxiv.org/abs/2210.03629|Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629, 2022.]]))) combines **reasoning** and **acting** in an interleaved loop: the agent generates a thought (reasoning trace), takes an action (tool call), and observes the result. This tight feedback loop enables dynamic replanning based on real-world outcomes. ReAct has become a standard pattern in frameworks such as LangChain and [[llamaindex|LlamaIndex]].

==== Tree of Thoughts (ToT) ====

[[tree_of_thoughts|ToT]] ([[https://arxiv.org/abs/2305.10601|Yao et al., 2023]](([[https://arxiv.org/abs/2305.10601|Yao, S. et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." arXiv:2305.10601, 2023.]]))) explores multiple reasoning paths simultaneously using tree search (BFS/DFS). Each intermediate thought is evaluated for promise, allowing the agent to backtrack from unproductive branches. ToT is effective for tasks that require exploration, such as puzzle-solving and creative writing.

==== Graph of Thoughts (GoT) ====

GoT ([[https://arxiv.org/abs/2308.09687|Besta et al., 2024]](([[https://arxiv.org/abs/2308.09687|Besta, M. et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models." arXiv:2308.09687, 2024.]]))) generalizes planning to arbitrary directed graphs, enabling aggregation of partial solutions, refinement loops, and non-linear information flow. A unified taxonomy by Besta et al. (2025) compares chains, trees, and graphs across cost-accuracy tradeoffs.
===== Modern Planning Approaches (2024-2025) =====

==== LLM-Based Planners ====

Modern frontier models function as end-to-end planners:

  * **[[openai|OpenAI]] o3/o4-mini**: Use extended chain-of-thought with [[reinforcement_learning|reinforcement learning]] for inference-time compute scaling, enabling variable-depth planning
  * **Gemini 2.5 Pro**: Combines multimodal reasoning with structured tool chains for complex workflows
  * **[[deepseek|DeepSeek]]-R1**: Open-weight model using RL-trained reasoning for planning tasks, competitive with proprietary models on standard PDDL domains (Liu et al., 2025)

A 2025 evaluation tested [[deepseek|DeepSeek]]-R1, Gemini 2.5 Pro, and GPT-5 against the LAMA planner on International Planning Competition domains. GPT-5 was competitive on standard tasks, but all LLMs degraded significantly on obfuscated domains that require pure logical reasoning.

==== Hybrid Neural-Symbolic Planning ====

Combining LLMs with classical planners addresses reliability gaps. See [[llm_with_planning|LLM+P]] for the full treatment. Key approaches:

  * **LLM+P** ([[https://arxiv.org/abs/2304.11477|Liu et al., 2023]](([[https://arxiv.org/abs/2304.11477|Liu, B. et al. "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency." arXiv:2304.11477, 2023.]]))): The LLM translates natural language into PDDL; a classical planner (Fast Downward) then solves the problem
  * **LLM as Planning Formalizer** (Tantakoun, Muise, Zhu, 2025): LLMs construct and iteratively refine PDDL models for off-the-shelf planners
  * **MIT Optimization Integration** (2025): Teaching LLMs optimization algorithms for multi-step planning challenges

==== World Models ====

[[world_models|World models]] simulate environment dynamics, allowing agents to "imagine" the consequences of actions before executing them:

  * **Voyager** ([[https://arxiv.org/abs/2305.16291|Wang et al., 2023]](([[https://arxiv.org/abs/2305.16291|Wang, G. et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models." arXiv:2305.16291, 2023.]]))): LLM-driven agent in Minecraft that builds a persistent skill library through world-model-guided exploration
  * **DreamerV3** ([[https://arxiv.org/abs/2301.04104|Hafner et al., 2023]](([[https://arxiv.org/abs/2301.04104|Hafner, D. et al. "Mastering Diverse Domains through World Models." arXiv:2301.04104, 2023.]]))): Learns world models from pixels for model-based RL planning
  * 2025 work integrates [[world_models|world models]] with inference-time scaling (extended CoT) for embodied planning in robotics and logistics

===== Embodied and Robotic Planning =====

LLM-based planning has extended to physical agents:

  * **SayCan** ([[https://arxiv.org/abs/2204.01691|Ahn et al., 2022]](([[https://arxiv.org/abs/2204.01691|Ahn, M. et al. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." arXiv:2204.01691, 2022.]]))): Combines LLM semantic knowledge with learned affordance functions that ground plans in physical capabilities
  * **PaLM-E** ([[https://arxiv.org/abs/2303.03378|Driess et al., 2023]](([[https://arxiv.org/abs/2303.03378|Driess, D. et al. "PaLM-E: An Embodied Multimodal Language Model." arXiv:2303.03378, 2023.]]))): A 562B-parameter embodied multimodal language model that grounds planning in visual and sensor inputs
  * **Inner Monologue** ([[https://arxiv.org/abs/2207.05608|Huang et al., 2022]](([[https://arxiv.org/abs/2207.05608|Huang, W. et al. "Inner Monologue: Embodied Reasoning through Planning with Language Models." arXiv:2207.05608, 2022.]]))): Uses LLM self-talk to reason through embodied tasks step by step, incorporating environmental feedback
  * **Code as Policies** ([[https://arxiv.org/abs/2209.07753|Liang et al., 2022]](([[https://arxiv.org/abs/2209.07753|Liang, J. et al. "Code as Policies: Language Model Programs for Embodied Control." arXiv:2209.07753, 2022.]]))): LLMs generate executable robot policy code directly from natural language instructions

===== Code Example: LLM-Based Task Decomposition =====

The sketch below makes one model call to decompose a goal into subtasks, then executes each subtask with a separate call:

<code python>
from openai import OpenAI

client = OpenAI()

DECOMPOSITION_PROMPT = """Break the following task into 3-7 concrete subtasks.
Return as a numbered list. Each subtask should be independently actionable.

Task: {task}"""


def decompose_task(task: str) -> list[str]:
    """Ask the model for a numbered list of subtasks and parse it."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": DECOMPOSITION_PROMPT.format(task=task)}],
        temperature=0.2,
    )
    lines = response.choices[0].message.content.strip().split("\n")
    subtasks = []
    for line in lines:
        # Strip list markers such as "1.", "2)", or "- ".
        cleaned = line.strip().lstrip("0123456789.)- ").strip()
        if cleaned:
            subtasks.append(cleaned)
    return subtasks


def plan_and_execute(goal: str) -> dict:
    """Decompose the goal, then complete each subtask in order."""
    subtasks = decompose_task(goal)
    results = {}
    for i, subtask in enumerate(subtasks, 1):
        print(f"Step {i}: {subtask}")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Complete the following subtask concisely."},
                {"role": "user", "content": subtask},
            ],
        )
        results[subtask] = response.choices[0].message.content
    return results


results = plan_and_execute("Build a REST API for a todo app with authentication")
for step, output in results.items():
    print(f"\n--- {step} ---\n{output[:200]}")
</code>

===== Dynamic Replanning =====

Static plans often fail in complex environments. Modern agents implement:

  * **Iterative Refinement**: Execute a plan, observe outcomes, and revise subsequent steps
  * **Self-Refine** ([[https://arxiv.org/abs/2303.17651|Madaan et al., 2023]](([[https://arxiv.org/abs/2303.17651|Madaan, A. et al. "Self-Refine: Iterative Refinement with Self-Feedback." arXiv:2303.17651, 2023.]]))): Uses LLM self-feedback to improve plans before execution
  * **Least-to-Most Prompting** ([[https://arxiv.org/abs/2205.10625|Zhou et al., 2022]](([[https://arxiv.org/abs/2205.10625|Zhou, D. et al. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models." arXiv:2205.10625, 2022.]]))): Progressive decomposition from simple to complex sub-problems
  * **RLVR ([[reinforcement_learning|Reinforcement Learning]] with Verifiable Rewards)**: Post-training technique enabling verifiable planning in math and code domains, predicted to expand to broader planning in 2026

===== Benchmarks for Planning =====

  * **PlanBench** ([[https://arxiv.org/abs/2206.10498|Valmeekam et al., 2022]](([[https://arxiv.org/abs/2206.10498|Valmeekam, K. et al. "PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change." arXiv:2206.10498, 2022.]]))): Systematic evaluation of LLM planning on classical domains; 2025 updates show frontier models closing the gap to symbolic planners
  * **TravelPlanner** ([[https://arxiv.org/abs/2402.01622|Xie et al., 2024]](([[https://arxiv.org/abs/2402.01622|Xie, J. et al. "TravelPlanner: A Benchmark for Real-World Planning with Language Agents." arXiv:2402.01622, 2024.]]))): Complex real-world planning requiring constraint satisfaction across flights, hotels, and budgets
  * **International Planning Competition (IPC)**: Standard PDDL domains used to compare LLMs against classical planners; obfuscated variants test pure reasoning
  * **WebArena** ([[https://arxiv.org/abs/2307.13854|Zhou et al., 2023]](([[https://arxiv.org/abs/2307.13854|Zhou, S. et al. "WebArena: A Realistic Web Environment for Building Autonomous Agents." arXiv:2307.13854, 2023.]]))): Web navigation tasks requiring multi-step planning
  * **[[swe_bench|SWE-Bench]]** ([[https://arxiv.org/abs/2310.06770|Jimenez et al., 2024]](([[https://arxiv.org/abs/2310.06770|Jimenez, C. E. et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" arXiv:2310.06770, 2024.]]))): Software engineering tasks requiring planning across codebases

===== Multi-Agent Planning =====

Complex tasks increasingly use coordinated multi-agent planning:

  * **MAPF (Multi-Agent Path Finding)**: AAAI 2025 work on coordinating multiple agents in shared environments
  * **Hierarchical Multi-Agent Workflows** (Liu et al., 2025, ICLR Workshop): Structured coordination for complex tasks
  * **[[meta|Meta]]-Prompt Optimization** (Kong et al., 2025, ICLR Workshop): Optimizing prompts for sequential multi-agent decisions

===== See Also =====

  * [[advanced_reasoning_planning|Advanced Reasoning and Planning]]
  * [[multi_agent_architecture|Multi-Agent Architecture (Planner-Generator-Evaluator)]]
  * [[agentic_engineering|Agentic Engineering]]
  * [[how_to_create_an_agent|How to Create an Agent]]
  * [[ai_agents|AI Agents]]

===== References =====

  * Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." [[https://arxiv.org/abs/2201.11903|arXiv:2201.11903]], 2022.
  * Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." [[https://arxiv.org/abs/2210.03629|arXiv:2210.03629]], 2022.
  * Yao, S. et al. "[[tree_of_thoughts|Tree of Thoughts]]: Deliberate Problem Solving with Large Language Models." [[https://arxiv.org/abs/2305.10601|arXiv:2305.10601]], 2023.
  * Besta, M. et al. "[[graph_of_thoughts|Graph of Thoughts]]: Solving Elaborate Problems with Large Language Models." [[https://arxiv.org/abs/2308.09687|arXiv:2308.09687]], 2024.
  * Liu, B. et al. "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency." [[https://arxiv.org/abs/2304.11477|arXiv:2304.11477]], 2023.
  * Wang, G. et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models." [[https://arxiv.org/abs/2305.16291|arXiv:2305.16291]], 2023.
  * Hafner, D. et al. "Mastering Diverse Domains through [[world_models|World Models]]." [[https://arxiv.org/abs/2301.04104|arXiv:2301.04104]], 2023.
  * Ahn, M. et al. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." [[https://arxiv.org/abs/2204.01691|arXiv:2204.01691]], 2022.
  * Driess, D. et al. "[[palm_e|PaLM-E: An Embodied Multimodal Language Model]]." [[https://arxiv.org/abs/2303.03378|arXiv:2303.03378]], 2023.
  * Huang, W. et al. "Inner Monologue: [[embodied_reasoning|Embodied Reasoning]] through Planning with Language Models." [[https://arxiv.org/abs/2207.05608|arXiv:2207.05608]], 2022.
  * Liang, J. et al. "Code as Policies: Language Model Programs for Embodied Control." [[https://arxiv.org/abs/2209.07753|arXiv:2209.07753]], 2022.
  * Madaan, A. et al. "[[self_refine|Self-Refine]]: Iterative Refinement with Self-Feedback." [[https://arxiv.org/abs/2303.17651|arXiv:2303.17651]], 2023.
  * Zhou, D. et al. "[[least_to_most_prompting|Least-to-Most Prompting]] Enables Complex Reasoning in Large Language Models." [[https://arxiv.org/abs/2205.10625|arXiv:2205.10625]], 2022.
  * Valmeekam, K. et al. "PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change." [[https://arxiv.org/abs/2206.10498|arXiv:2206.10498]], 2022.
  * Xie, J. et al. "TravelPlanner: A Benchmark for Real-World Planning with Language Agents." [[https://arxiv.org/abs/2402.01622|arXiv:2402.01622]], 2024.
  * Zhou, S. et al. "WebArena: A Realistic Web Environment for Building [[autonomous_agents|Autonomous Agents]]." [[https://arxiv.org/abs/2307.13854|arXiv:2307.13854]], 2023.
  * Jimenez, C. E. et al. "[[swe_bench|SWE-bench]]: Can Language Models Resolve Real-World GitHub Issues?" [[https://arxiv.org/abs/2310.06770|arXiv:2310.06770]], 2024.