====== Agent Planning: How AI Agents Plan and Reason ======

Agent planning encompasses the strategies and techniques that let LLM-based agents solve complex, multi-step problems: breaking goals into sub-tasks, sequencing actions, and adapting on the fly. Rather than generating a response in a single pass, a planning-capable agent decomposes its goal, reasons about action sequences, and revises its strategy based on intermediate feedback. For how agents store and retrieve context across interactions, see [[memory|Memory Management in LLM Agents]].

graph TD
    TD[[[task_decomposition|Task Decomposition]]] --> CoT[Chain-of-Thought]
    TD --> ToT[[[tree_of_thoughts|Tree of Thoughts]]]
    TD --> GoT[[[graph_of_thoughts|Graph of Thoughts]]]
    TD --> LLMP[LLM+P Symbolic]
    CoT --> CoTD[Linear reasoning chain]
    ToT --> ToTD[Branching search with backtracking]
    GoT --> GoTD[Arbitrary graph with aggregation]
    LLMP --> LLMPD[LLM translates to PDDL, classical planner solves]
    style CoT fill:#e1f5fe
    style ToT fill:#fff3e0
    style GoT fill:#e8f5e9
    style LLMP fill:#f3e5f5

===== Core Planning Techniques =====

==== Chain-of-Thought (CoT) ====

Introduced by [[https://arxiv.org/abs/2201.11903|Wei et al., 2022]](([[https://arxiv.org/abs/2201.11903|Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." arXiv:2201.11903, 2022.]])), [[chain_of_thought|CoT]] prompting elicits step-by-step reasoning. Variants include Zero-Shot CoT ("Let's think step by step"), Self-Consistency (majority voting over multiple reasoning paths), and Chain-of-Associated-Thoughts (CoAT, 2025), which integrates Monte Carlo Tree Search to explore reasoning branches. See [[advanced_reasoning_planning|Advanced Reasoning and Planning]] for detailed coverage.
==== ReAct ====

[[react_framework|ReAct]] ([[https://arxiv.org/abs/2210.03629|Yao et al., 2022]](([[https://arxiv.org/abs/2210.03629|Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629, 2022.]]))) combines **reasoning** and **acting** in an interleaved loop: the agent generates a thought (reasoning trace), takes an action (tool call), and observes the result. This tight feedback loop enables dynamic replanning based on real-world outcomes. ReAct has become a standard pattern in frameworks such as LangChain and [[llamaindex|LlamaIndex]].

==== Tree of Thoughts (ToT) ====

[[tree_of_thoughts|ToT]] ([[https://arxiv.org/abs/2305.10601|Yao et al., 2023]](([[https://arxiv.org/abs/2305.10601|Yao, S. et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." arXiv:2305.10601, 2023.]]))) explores multiple reasoning paths simultaneously using tree search (BFS/DFS). Each intermediate thought is evaluated for promise, allowing the agent to backtrack from unproductive branches. ToT is effective for tasks that require exploration, such as puzzle-solving and creative writing.

==== Graph of Thoughts (GoT) ====

GoT ([[https://arxiv.org/abs/2308.09687|Besta et al., 2024]](([[https://arxiv.org/abs/2308.09687|Besta, M. et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models." arXiv:2308.09687, 2024.]]))) generalizes planning to arbitrary directed graphs, enabling aggregation of partial solutions, refinement loops, and non-linear information flow. A unified taxonomy by Besta et al. (2025) compares chains, trees, and graphs across cost-accuracy tradeoffs.
===== Modern Planning Approaches (2024-2025) =====

==== LLM-Based Planners ====

Modern frontier models function as end-to-end planners:

  * **[[openai|OpenAI]] o3/o4-mini**: Use extended chain-of-thought with [[reinforcement_learning|reinforcement learning]] for inference-time compute scaling, enabling variable-depth planning
  * **Gemini 2.5 Pro**: Combines multimodal reasoning with structured tool chains for complex workflows
  * **[[deepseek|DeepSeek]]-R1**: Open-weight model using RL-trained reasoning for planning tasks, competitive with proprietary models on standard PDDL domains (Liu et al., 2025)

A 2025 evaluation tested [[deepseek|DeepSeek]]-R1, Gemini 2.5 Pro, and GPT-5 against the LAMA planner on International Planning Competition domains. GPT-5 was competitive on standard tasks, but all LLMs degraded significantly on obfuscated domains that require pure logical reasoning.

==== Hybrid Neural-Symbolic Planning ====

Combining LLMs with classical planners addresses reliability gaps. See [[llm_with_planning|LLM+P]] for the full treatment. Key approaches:

  * **LLM+P** ([[https://arxiv.org/abs/2304.11477|Liu et al., 2023]](([[https://arxiv.org/abs/2304.11477|Liu, B. et al. "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency." arXiv:2304.11477, 2023.]]))): The LLM translates natural language into PDDL; a classical planner (Fast Downward) then solves the problem
  * **LLM as Planning Formalizer** (Tantakoun, Muise, Zhu, 2025): LLMs construct and iteratively refine PDDL models for off-the-shelf planners
  * **MIT Optimization Integration** (2025): Teaching LLMs optimization algorithms for multi-step planning challenges

==== World Models ====

[[world_models|World models]] simulate environment dynamics, allowing agents to "imagine" the consequences of actions before executing them:

  * **Voyager** ([[https://arxiv.org/abs/2305.16291|Wang et al., 2023]](([[https://arxiv.org/abs/2305.16291|Wang, G. et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models." arXiv:2305.16291, 2023.]]))): LLM-driven agent in Minecraft that builds a persistent skill library through world-model-guided exploration
  * **DreamerV3** ([[https://arxiv.org/abs/2301.04104|Hafner et al., 2023]](([[https://arxiv.org/abs/2301.04104|Hafner, D. et al. "Mastering Diverse Domains through World Models." arXiv:2301.04104, 2023.]]))): Learns world models from pixels for model-based RL planning
  * 2025 work integrates [[world_models|world models]] with inference-time scaling (extended CoT) for embodied planning in robotics and logistics

===== Embodied and Robotic Planning =====

LLM-based planning has extended to physical agents:

  * **SayCan** ([[https://arxiv.org/abs/2204.01691|Ahn et al., 2022]](([[https://arxiv.org/abs/2204.01691|Ahn, M. et al. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." arXiv:2204.01691, 2022.]]))): Combines LLM semantic knowledge with learned affordance functions that ground plans in physical capabilities
  * **PaLM-E** ([[https://arxiv.org/abs/2303.03378|Driess et al., 2023]](([[https://arxiv.org/abs/2303.03378|Driess, D. et al. "PaLM-E: An Embodied Multimodal Language Model." arXiv:2303.03378, 2023.]]))): A 562B-parameter embodied multimodal language model that grounds planning in visual and sensor inputs
  * **Inner Monologue** ([[https://arxiv.org/abs/2207.05608|Huang et al., 2022]](([[https://arxiv.org/abs/2207.05608|Huang, W. et al. "Inner Monologue: Embodied Reasoning through Planning with Language Models." arXiv:2207.05608, 2022.]]))): Uses LLM self-talk to reason through embodied tasks step by step, incorporating environmental feedback
  * **Code as Policies** ([[https://arxiv.org/abs/2209.07753|Liang et al., 2022]](([[https://arxiv.org/abs/2209.07753|Liang, J. et al. "Code as Policies: Language Model Programs for Embodied Control." arXiv:2209.07753, 2022.]]))): LLMs generate executable robot policy code directly from natural language instructions

===== Code Example: LLM-Based Task Decomposition =====

The sketch below makes one model call to decompose a goal into subtasks, then executes each subtask with a separate call:

<code python>
from openai import OpenAI

client = OpenAI()

DECOMPOSITION_PROMPT = """Break the following task into 3-7 concrete subtasks.
Return as a numbered list. Each subtask should be independently actionable.

Task: {task}"""


def decompose_task(task: str) -> list[str]:
    """Ask the model for a numbered list of subtasks and parse it."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": DECOMPOSITION_PROMPT.format(task=task)}],
        temperature=0.2,
    )
    lines = response.choices[0].message.content.strip().split("\n")
    subtasks = []
    for line in lines:
        # Strip list markers such as "1.", "2)", or "- ".
        cleaned = line.strip().lstrip("0123456789.)- ").strip()
        if cleaned:
            subtasks.append(cleaned)
    return subtasks


def plan_and_execute(goal: str) -> dict:
    """Decompose the goal, then complete each subtask in order."""
    subtasks = decompose_task(goal)
    results = {}
    for i, subtask in enumerate(subtasks, 1):
        print(f"Step {i}: {subtask}")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Complete the following subtask concisely."},
                {"role": "user", "content": subtask},
            ],
        )
        results[subtask] = response.choices[0].message.content
    return results


results = plan_and_execute("Build a REST API for a todo app with authentication")
for step, output in results.items():
    print(f"\n--- {step} ---\n{output[:200]}")
</code>

===== Dynamic Replanning =====

Static plans often fail in complex environments. Modern agents implement:

  * **Iterative Refinement**: Execute a plan, observe outcomes, and revise subsequent steps
  * **Self-Refine** ([[https://arxiv.org/abs/2303.17651|Madaan et al., 2023]](([[https://arxiv.org/abs/2303.17651|Madaan, A. et al. "Self-Refine: Iterative Refinement with Self-Feedback." arXiv:2303.17651, 2023.]]))): Uses LLM self-feedback to improve plans before execution
  * **Least-to-Most Prompting** ([[https://arxiv.org/abs/2205.10625|Zhou et al., 2022]](([[https://arxiv.org/abs/2205.10625|Zhou, D. et al. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models." arXiv:2205.10625, 2022.]]))): Progressive decomposition from simple to complex sub-problems
  * **RLVR ([[reinforcement_learning|Reinforcement Learning]] with Verifiable Rewards)**: Post-training technique enabling verifiable planning in math and code domains, predicted to expand to broader planning in 2026

===== Benchmarks for Planning =====

  * **PlanBench** ([[https://arxiv.org/abs/2206.10498|Valmeekam et al., 2022]](([[https://arxiv.org/abs/2206.10498|Valmeekam, K. et al. "PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change." arXiv:2206.10498, 2022.]]))): Systematic evaluation of LLM planning on classical domains; 2025 updates show frontier models closing the gap to symbolic planners
  * **TravelPlanner** ([[https://arxiv.org/abs/2402.01622|Xie et al., 2024]](([[https://arxiv.org/abs/2402.01622|Xie, J. et al. "TravelPlanner: A Benchmark for Real-World Planning with Language Agents." arXiv:2402.01622, 2024.]]))): Complex real-world planning requiring constraint satisfaction across flights, hotels, and budgets
  * **International Planning Competition (IPC)**: Standard PDDL domains used to compare LLMs against classical planners; obfuscated variants test pure reasoning
  * **WebArena** ([[https://arxiv.org/abs/2307.13854|Zhou et al., 2023]](([[https://arxiv.org/abs/2307.13854|Zhou, S. et al. "WebArena: A Realistic Web Environment for Building Autonomous Agents." arXiv:2307.13854, 2023.]]))): Web navigation tasks requiring multi-step planning
  * **[[swe_bench|SWE-Bench]]** ([[https://arxiv.org/abs/2310.06770|Jimenez et al., 2024]](([[https://arxiv.org/abs/2310.06770|Jimenez, C. E. et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" arXiv:2310.06770, 2024.]]))): Software engineering tasks requiring planning across codebases

===== Multi-Agent Planning =====

Complex tasks increasingly use coordinated multi-agent planning:

  * **MAPF (Multi-Agent Path Finding)**: AAAI 2025 work on coordinating multiple agents in shared environments
  * **Hierarchical Multi-Agent Workflows** (Liu et al., 2025, ICLR Workshop): Structured coordination for complex tasks
  * **[[meta|Meta]]-Prompt Optimization** (Kong et al., 2025, ICLR Workshop): Optimizing prompts for sequential multi-agent decisions

===== See Also =====

  * [[advanced_reasoning_planning|Advanced Reasoning and Planning]]
  * [[multi_agent_architecture|Multi-Agent Architecture (Planner-Generator-Evaluator)]]
  * [[agentic_engineering|Agentic Engineering]]
  * [[how_to_create_an_agent|How to Create an Agent]]
  * [[ai_agents|AI Agents]]

===== References =====

  * Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." [[https://arxiv.org/abs/2201.11903|arXiv:2201.11903]], 2022.
  * Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." [[https://arxiv.org/abs/2210.03629|arXiv:2210.03629]], 2022.
  * Yao, S. et al. "[[tree_of_thoughts|Tree of Thoughts]]: Deliberate Problem Solving with Large Language Models." [[https://arxiv.org/abs/2305.10601|arXiv:2305.10601]], 2023.
  * Besta, M. et al. "[[graph_of_thoughts|Graph of Thoughts]]: Solving Elaborate Problems with Large Language Models." [[https://arxiv.org/abs/2308.09687|arXiv:2308.09687]], 2024.
  * Liu, B. et al. "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency." [[https://arxiv.org/abs/2304.11477|arXiv:2304.11477]], 2023.
  * Wang, G. et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models." [[https://arxiv.org/abs/2305.16291|arXiv:2305.16291]], 2023.
  * Hafner, D. et al. "Mastering Diverse Domains through [[world_models|World Models]]." [[https://arxiv.org/abs/2301.04104|arXiv:2301.04104]], 2023.
  * Ahn, M. et al. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." [[https://arxiv.org/abs/2204.01691|arXiv:2204.01691]], 2022.
  * Driess, D. et al. "[[palm_e|PaLM-E: An Embodied Multimodal Language Model]]." [[https://arxiv.org/abs/2303.03378|arXiv:2303.03378]], 2023.
  * Huang, W. et al. "Inner Monologue: [[embodied_reasoning|Embodied Reasoning]] through Planning with Language Models." [[https://arxiv.org/abs/2207.05608|arXiv:2207.05608]], 2022.
  * Liang, J. et al. "Code as Policies: Language Model Programs for Embodied Control." [[https://arxiv.org/abs/2209.07753|arXiv:2209.07753]], 2022.
  * Madaan, A. et al. "[[self_refine|Self-Refine]]: Iterative Refinement with Self-Feedback." [[https://arxiv.org/abs/2303.17651|arXiv:2303.17651]], 2023.
  * Zhou, D. et al. "[[least_to_most_prompting|Least-to-Most Prompting]] Enables Complex Reasoning in Large Language Models." [[https://arxiv.org/abs/2205.10625|arXiv:2205.10625]], 2022.
  * Valmeekam, K. et al. "PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change." [[https://arxiv.org/abs/2206.10498|arXiv:2206.10498]], 2022.
  * Xie, J. et al. "TravelPlanner: A Benchmark for Real-World Planning with Language Agents." [[https://arxiv.org/abs/2402.01622|arXiv:2402.01622]], 2024.
  * Zhou, S. et al. "WebArena: A Realistic Web Environment for Building [[autonomous_agents|Autonomous Agents]]." [[https://arxiv.org/abs/2307.13854|arXiv:2307.13854]], 2023.
  * Jimenez, C. E. et al. "[[swe_bench|SWE-bench]]: Can Language Models Resolve Real-World GitHub Issues?" [[https://arxiv.org/abs/2310.06770|arXiv:2310.06770]], 2024.