Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Agent planning is the capability that lets AI agents devise solutions to complex, multi-step problems. It encompasses the strategies and techniques that allow LLM-based agents to break down goals, sequence actions, and adapt on the fly. Rather than generating responses in a single pass, planning-capable agents decompose goals into sub-tasks, reason about action sequences, and adjust their strategies based on intermediate feedback.
For information on how agents store and retrieve context across interactions, see Memory Management in LLM Agents.
Introduced by Wei et al., 2022, CoT prompting elicits step-by-step reasoning. Variants include Zero-Shot CoT (“Let's think step by step”), Self-Consistency (majority voting over multiple reasoning paths), and Chain-of-Associated-Thoughts (CoAT, 2025) which integrates Monte Carlo Tree Search for exploring reasoning branches. See Advanced Reasoning and Planning for detailed coverage.
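At the aggregation step, the Self-Consistency variant reduces to a majority vote over the final answers of independently sampled reasoning paths. A minimal sketch (the sampling itself, which would be repeated LLM calls at non-zero temperature, is left out):

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Majority vote over the final answers of independently
    sampled chain-of-thought paths."""
    return Counter(answers).most_common(1)[0][0]

# Suppose three sampled reasoning paths ended with these answers:
print(self_consistency(["42", "42", "41"]))  # -> 42
```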
ReAct (Yao et al., 2022) combines reasoning and acting in an interleaved loop: the agent generates a thought (reasoning trace), takes an action (tool call), and observes the result. This tight feedback loop enables dynamic replanning based on real-world outcomes. ReAct has become a standard pattern in frameworks like LangChain and LlamaIndex.
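The thought → action → observation cycle can be sketched as a plain loop. Here the policy is a hand-scripted stand-in for the LLM, and a single calculator tool illustrates the tool-call step; a real agent would generate each thought/action pair from the model:

```python
# Minimal ReAct-style loop: generate a thought, take an action,
# observe the result, and feed the observation back into the policy.

def calculator(expr: str) -> str:
    return str(eval(expr))  # demo only; never eval untrusted input

TOOLS = {"calculator": calculator}

def react_loop(policy, max_steps: int = 5) -> str:
    observation = None
    for _ in range(max_steps):
        thought, action, arg = policy(observation)
        print(f"Thought: {thought}")
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # act, then observe
        print(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted")

def scripted_policy(observation):
    """Hand-scripted stand-in for the LLM's thought/action generation."""
    if observation is None:
        return "I need to compute 17 * 23 first.", "calculator", "17 * 23"
    return f"The tool returned {observation}; I can answer.", "finish", observation

print(react_loop(scripted_policy))  # -> 391
```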
ToT (Yao et al., 2023) explores multiple reasoning paths simultaneously using tree search (BFS/DFS). Each intermediate thought is evaluated for promise, allowing the agent to backtrack from unproductive branches. Effective for tasks requiring exploration such as puzzle-solving and creative writing.
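A breadth-first variant of the search can be sketched as follows: each state expands into candidate thoughts, an evaluator scores them, and only the top-b branches survive to the next level (pruning low-scoring branches is the backtracking). `expand` and `score` are toy stand-ins for LLM calls:

```python
# BFS Tree-of-Thoughts sketch: expand, evaluate, keep the best b branches.

def tree_of_thoughts(root, expand, score, goal, breadth=2, depth=3):
    frontier = [root]
    for _ in range(depth):
        candidates = [c for state in frontier for c in expand(state)]
        if any(goal(c) for c in candidates):
            return next(c for c in candidates if goal(c))
        # prune unpromising branches; surviving states form the next level
        frontier = sorted(candidates, key=score, reverse=True)[:breadth]
    return None

# Toy search: reach 12 from 0 using +3 or *2 steps.
result = tree_of_thoughts(
    0,
    expand=lambda s: [s + 3, s * 2],
    score=lambda s: -abs(12 - s),
    goal=lambda s: s == 12,
)
print(result)  # -> 12
```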
GoT (Besta et al., 2024) generalizes planning to arbitrary directed graphs, enabling aggregation of partial solutions, refinement loops, and non-linear information flow. A unified taxonomy by Besta et al. (2025) compares chains, trees, and graphs across cost-accuracy tradeoffs.
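The key structural difference from a tree is that a thought may have several parents, so partial solutions can be merged at an aggregation node. A toy sketch of evaluating such a thought DAG, with summation standing in for LLM transformations:

```python
# Graph-of-Thoughts sketch: evaluate a DAG of thoughts where an
# aggregation node combines the outputs of multiple parent branches.

def run_got(graph, ops, roots):
    """graph: node -> list of parent nodes; ops: node -> fn(parent outputs);
    roots: precomputed values for leaf thoughts."""
    values = dict(roots)
    pending = [n for n in graph if n not in values]
    while pending:
        for node in list(pending):
            if all(p in values for p in graph[node]):
                values[node] = ops[node]([values[p] for p in graph[node]])
                pending.remove(node)
    return values

graph = {"sum_left": ["a", "b"], "sum_right": ["c"], "merge": ["sum_left", "sum_right"]}
ops = {"sum_left": sum, "sum_right": sum, "merge": sum}  # merge aggregates both branches
values = run_got(graph, ops, roots={"a": 1, "b": 2, "c": 4})
print(values["merge"])  # -> 7
```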
Modern frontier models increasingly function as end-to-end planners.
A 2025 evaluation tested DeepSeek R1, Gemini 2.5 Pro, and GPT-5 against the LAMA planner on International Planning Competition domains. GPT-5 was competitive on standard tasks, but all LLMs degraded significantly on obfuscated domains requiring pure logical reasoning.
Combining LLMs with classical planners addresses reliability gaps: in the LLM+P approach, the LLM translates a natural-language task into a formal PDDL problem, a sound classical planner solves it, and the resulting plan is translated back into natural language. See LLM+P for the full treatment.
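The hand-off point in an LLM+P-style pipeline is a PDDL problem file. A sketch of assembling one from components the LLM would extract (the prompt and the blocks-world predicates here are illustrative assumptions; in practice the generated file is passed to a classical planner such as Fast Downward):

```python
# LLM+P-style pipeline sketch: the LLM produces a PDDL problem,
# a classical planner solves it. The planner call itself is omitted.

PDDL_PROMPT = """Translate this task into a PDDL problem for the attached
domain file. Output only valid PDDL.

Task: {task}"""

def build_problem(name: str, objects: list[str], init: list[str], goal: str) -> str:
    """Assemble a PDDL problem string from its components."""
    return (
        f"(define (problem {name})\n"
        f"  (:objects {' '.join(objects)})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal {goal}))"
    )

problem = build_problem(
    "stack-blocks",
    objects=["a", "b"],
    init=["(ontable a)", "(ontable b)", "(clear a)", "(clear b)"],
    goal="(on a b)",
)
print(problem)
```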
World models simulate environment dynamics, allowing agents to “imagine” action consequences before executing them.
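The core idea can be sketched as simulate-before-act: each candidate action is rolled forward through a transition model without touching the real environment, and the action whose imagined outcome scores best is chosen. The model and value function here are hand-written toys standing in for learned components:

```python
# World-model sketch: pick the action with the best imagined outcome.

def pick_action(state, actions, model, value, horizon=3):
    def rollout(s, a):
        # Imagine repeating the action for `horizon` steps; nothing is
        # executed in the real environment.
        for _ in range(horizon):
            s = model(s, a)
        return value(s)
    return max(actions, key=lambda a: rollout(state, a))

# Toy dynamics: state is a number, actions shift it; target is 3.
best = pick_action(
    0, [+1, -1],
    model=lambda s, a: s + a,
    value=lambda s: -abs(3 - s),
)
print(best)  # -> 1
```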
LLM-based planning has also been extended to embodied, physical agents.
from openai import OpenAI

client = OpenAI()

DECOMPOSITION_PROMPT = """Break the following task into 3-7 concrete subtasks.
Return as a numbered list. Each subtask should be independently actionable.

Task: {task}"""

def decompose_task(task: str) -> list[str]:
    """Use an LLM to decompose a high-level task into ordered subtasks."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": DECOMPOSITION_PROMPT.format(task=task)}],
        temperature=0.2,
    )
    lines = response.choices[0].message.content.strip().split("\n")
    subtasks = []
    for line in lines:
        # Strip list numbering/bullets left over from the model's output.
        cleaned = line.strip().lstrip("0123456789.)- ").strip()
        if cleaned:
            subtasks.append(cleaned)
    return subtasks

def plan_and_execute(goal: str) -> dict:
    """Decompose a goal into subtasks, then execute each sequentially."""
    subtasks = decompose_task(goal)
    results = {}
    for i, subtask in enumerate(subtasks, 1):
        print(f"Step {i}: {subtask}")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Complete the following subtask concisely."},
                {"role": "user", "content": subtask},
            ],
        )
        results[subtask] = response.choices[0].message.content
    return results

results = plan_and_execute("Build a REST API for a todo app with authentication")
for step, output in results.items():
    print(f"\n--- {step} ---\n{output[:200]}")
Static plans often fail in complex environments, so modern agents monitor execution and replan dynamically when the environment diverges from expectations.
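A minimal replanning loop can be sketched as: execute the plan step by step, and on failure request a fresh plan from the current state rather than continuing with a stale one. `plan` and `execute` are toy stand-ins for the LLM planner and the environment:

```python
# Dynamic replanning sketch: replan from the current state on failure.

def run_with_replanning(goal, state, plan, execute, max_replans=3):
    for _ in range(max_replans + 1):
        steps = plan(goal, state)
        for step in steps:
            ok, state = execute(step, state)
            if not ok:        # environment diverged from the plan
                break         # fall through to replan from `state`
        else:
            return state      # whole plan succeeded
    raise RuntimeError("replanning budget exhausted")

# Toy world: state is a counter, the plan is `goal - state` increments,
# and execution fails transiently once at state 2.
failed_once = {"done": False}

def toy_plan(goal, state):
    return ["inc"] * (goal - state)

def toy_execute(step, state):
    if state == 2 and not failed_once["done"]:
        failed_once["done"] = True
        return False, state   # transient failure triggers a replan
    return True, state + 1

final_state = run_with_replanning(5, 0, toy_plan, toy_execute)
print(final_state)  # -> 5
```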
Complex tasks increasingly rely on coordinated multi-agent planning.
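One common shape for this is a coordinator/worker pattern: a coordinator splits the goal and routes each subtask to the specialist registered for its kind. The specialists below are plain functions standing in for LLM agents, and the routing scheme is an illustrative assumption:

```python
# Coordinator/worker sketch: route subtasks to specialist agents.

AGENTS = {
    "research": lambda task: f"[notes on: {task}]",
    "code":     lambda task: f"[patch for: {task}]",
}

def coordinate(subtasks: list[tuple[str, str]]) -> list[str]:
    """Route (kind, task) pairs to the matching specialist and collect results."""
    return [AGENTS[kind](task) for kind, task in subtasks]

outputs = coordinate([
    ("research", "compare auth libraries"),
    ("code", "add login endpoint"),
])
print(outputs)
```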