====== Advanced Reasoning and Planning ====== Advanced reasoning and planning encompasses the techniques and architectures that enable AI agents to break down complex problems, formulate multi-step strategies, and adapt their approach based on intermediate results. These capabilities are fundamental to building agents that can operate autonomously on open-ended tasks, moving beyond simple prompt-response interactions to exhibit goal-directed behavior. ===== Chain-of-Thought and Multi-Step Reasoning ===== **Chain-of-Thought (CoT)** prompting, introduced by [[https://arxiv.org/abs/2201.11903|Wei et al., 2022]], remains the foundational technique for eliciting step-by-step reasoning from LLMs.((Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." [[https://arxiv.org/abs/2201.11903|arXiv:2201.11903]], 2022.)) By including intermediate reasoning steps in prompts, CoT dramatically improves performance on arithmetic, commonsense, and symbolic reasoning tasks. Key variants and extensions include: * **Zero-Shot CoT** ([[https://arxiv.org/abs/2205.11916|Kojima et al., 2022]]): Adding "Let's think step by step" triggers reasoning without exemplars((Kojima, T. et al. "Large Language Models are Zero-Shot Reasoners." [[https://arxiv.org/abs/2205.11916|arXiv:2205.11916]], 2022.)) * **Self-Consistency** ([[https://arxiv.org/abs/2203.11171|Wang et al., 2023]]): Samples multiple reasoning paths and selects the most consistent answer via majority voting((Wang, X. et al. "Self-Consistency Improves Chain of Thought Reasoning in Language Models." 
[[https://arxiv.org/abs/2203.11171|arXiv:2203.11171]], 2023.)) * **Chain-of-Associated-Thoughts (CoAT)** (Pan et al., 2025): Integrates Monte Carlo Tree Search with an association mechanism, enabling models to explore reasoning paths that combine both "fast" and "slow" thinking * **Chain-of-X Paradigms** (Xia et al., 2025, COLING): A survey documenting extensions beyond CoT, including Chain-of-Verification, Chain-of-Knowledge, and Chain-of-Feedback Modern reasoning models like [[openai|OpenAI]] o3, [[deepseek|DeepSeek]]-R1, and [[claude|Claude]] 3.7 Sonnet use extended CoT with inference-time compute scaling, where additional computation at generation time yields deeper, more accurate reasoning. ===== Search-Based Planning Strategies ===== **Tree of Thoughts (ToT)**, introduced by [[https://arxiv.org/abs/2305.10601|Yao et al., 2023]], organizes reasoning into a tree structure where multiple reasoning paths are explored simultaneously via breadth-first or depth-first search.((Yao, S. et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." [[https://arxiv.org/abs/2305.10601|arXiv:2305.10601]], 2023.)) Each node represents an intermediate "thought" that is evaluated by the LLM for progress toward the goal. **Graph of Thoughts (GoT)**, proposed by [[https://arxiv.org/abs/2308.09687|Besta et al., 2024]] (ETH Zurich), generalizes CoT and ToT by modeling reasoning as an arbitrary directed graph.((Besta, M. et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models." [[https://arxiv.org/abs/2308.09687|arXiv:2308.09687]], 2024.)) This enables: * Aggregation of multiple partial solutions * Refinement loops where thoughts feed back into earlier stages * Non-linear information flow capturing complex dependencies **Matrix of Thought (MoT)** (Tang et al., 2025) re-evaluates the chain-vs-tree tradeoff and proposes structured matrices that capture both sequential and parallel reasoning dimensions.
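The core loop of ToT-style search is compact enough to sketch directly. The sketch below is illustrative, not the authors' implementation: ''propose_thoughts'' and ''score_thought'' are hypothetical stand-ins for what would each be an LLM call in a real agent, and the task is a toy one (assemble steps that sum to a target), but the breadth-first, beam-pruned expansion is the essence of the technique.

```python
# Minimal Tree-of-Thoughts-style breadth-first search (illustrative sketch).
# In a real agent, propose_thoughts and score_thought would be LLM calls;
# here they are stubbed with a toy task: extend a sequence of steps so the
# running total reaches TARGET.

from typing import List, Tuple

TARGET = 10

def propose_thoughts(state: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Generate candidate next 'thoughts' (here: append a step of 1, 2, or 3)."""
    return [state + (step,) for step in (1, 2, 3)]

def score_thought(state: Tuple[int, ...]) -> float:
    """Heuristic value of a partial solution: closer to TARGET is better,
    overshooting is a dead end."""
    total = sum(state)
    return float("-inf") if total > TARGET else -abs(TARGET - total)

def tot_bfs(beam_width: int = 2, max_depth: int = 10) -> Tuple[int, ...]:
    """Breadth-first search over thoughts, keeping the top-k states per level."""
    frontier: List[Tuple[int, ...]] = [()]
    for _ in range(max_depth):
        candidates = [c for s in frontier for c in propose_thoughts(s)]
        # Prune to the beam_width highest-scoring partial solutions.
        frontier = sorted(candidates, key=score_thought, reverse=True)[:beam_width]
        for state in frontier:
            if sum(state) == TARGET:  # goal test on the surviving states
                return state
    return frontier[0]

print(tot_bfs())  # a sequence of steps whose sum is 10
```

Swapping the two stubs for model calls (and the tuple state for a partial text solution) gives the BFS variant described by Yao et al.; a depth-first variant instead recurses on the best child and backtracks when scores fall below a threshold.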
A comprehensive taxonomy by Besta et al. (2025) titled "Demystifying Chains, Trees, and Graphs of Thoughts" provides a unified framework comparing these topologies across efficiency, accuracy, and cost dimensions. ===== Hierarchical and Recursive Planning ===== For complex, long-horizon tasks, agents employ hierarchical decomposition: * **Least-to-Most Prompting** ([[https://arxiv.org/abs/2205.10625|Zhou et al., 2022]]): Decomposes problems into progressively simpler sub-problems, solving from easiest to hardest((Zhou, D. et al. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models." [[https://arxiv.org/abs/2205.10625|arXiv:2205.10625]], 2022.)) * **Decomposed Prompting (DecomP)** ([[https://arxiv.org/abs/2210.02406|Khot et al., 2023]]): Routes sub-tasks to specialized modules or tools((Khot, T. et al. "Decomposed Prompting: A Modular Approach for Solving Complex Tasks." [[https://arxiv.org/abs/2210.02406|arXiv:2210.02406]], 2023.)) * **Plan-and-Solve** ([[https://arxiv.org/abs/2305.04091|Wang et al., 2023]]): Explicitly generates a plan before executing each step((Wang, L. et al. "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models." [[https://arxiv.org/abs/2305.04091|arXiv:2305.04091]], 2023.)) * **Self-Refine** ([[https://arxiv.org/abs/2303.17651|Madaan et al., 2023]]): Iteratively improves solutions using self-generated feedback((Madaan, A. et al. "Self-Refine: Iterative Refinement with Self-Feedback." [[https://arxiv.org/abs/2303.17651|arXiv:2303.17651]], 2023.)) Modern agents like [[openai|OpenAI]] Deep Research and [[anthropic|Anthropic]] [[claude|Claude]] use hierarchical planning to break hours-long research tasks into manageable sub-tasks, coordinating tool use, memory retrieval, and synthesis.
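The decompose-then-solve pattern behind least-to-most prompting can also be sketched with stubs in place of model calls. Everything below (''decompose'', ''solve_subproblem'', the toy "name = expression" sub-problem format) is a hypothetical illustration, not any paper's implementation; what it shows is the two-stage structure: first split the task into ordered sub-problems, then solve them in sequence, feeding each answer into the later steps.

```python
# Least-to-Most-style decomposition (illustrative sketch). In practice both
# stages are prompted LLM calls; here decomposition is a string split and
# "solving" is toy arithmetic, with earlier answers reused by name.

def decompose(problem: str) -> list[str]:
    """Stage 1 stub: split the task into ordered sub-problems."""
    return problem.split(", then ")

def solve_subproblem(subproblem: str, context: dict[str, int]) -> int:
    """Stage 2 stub: solve one '<name> = <expr>' step, where <expr> may
    reference answers to earlier sub-problems stored in `context`."""
    name, expr = subproblem.split(" = ")
    context[name] = eval(expr, {}, context)  # toy evaluator; never use on untrusted input
    return context[name]

def least_to_most(problem: str) -> int:
    """Solve sub-problems easiest-first, carrying answers forward."""
    context: dict[str, int] = {}
    answer = 0
    for sub in decompose(problem):
        answer = solve_subproblem(sub, context)
    return answer

# Each later sub-problem builds on the earlier answers:
print(least_to_most("a = 2 + 3, then b = a * 4, then c = b - a"))  # 15
```

The carried ''context'' is the key design point: in least-to-most prompting it corresponds to appending earlier question-answer pairs to the prompt, so each harder sub-problem is conditioned on the solutions already obtained.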
===== How Modern Agents Reason ===== As of 2025, frontier models employ distinct reasoning strategies: * **[[openai|OpenAI]] o3/o4-mini**: Dedicated reasoning models using extended chain-of-thought with [[reinforcement_learning|reinforcement learning]]; inference-time compute scaling allows variable reasoning depth * **[[claude|Claude]] 3.7 Sonnet**: [[extended_thinking|Extended thinking]] mode with persistent memory architectures (Harmony/Compass) for maintaining context across complex tasks * **Gemini 2.5 Pro**: Hybrid reasoning combining multi-[[modal|modal]] inputs with structured tool chains * **[[deepseek|DeepSeek]]-R1**: Open-weight reasoning model trained with [[reinforcement_learning|reinforcement learning]] to incentivize step-by-step verification ===== Evaluation and Benchmarks ===== Key benchmarks for evaluating reasoning and planning: * **GSM8K** ([[https://arxiv.org/abs/2110.14168|Cobbe et al., 2021]]): Grade school math word problems; frontier models now achieve >95% accuracy((Cobbe, K. et al. "Training Verifiers to Solve Math Word Problems." [[https://arxiv.org/abs/2110.14168|arXiv:2110.14168]], 2021.)) * **MATH** ([[https://arxiv.org/abs/2103.03874|Hendrycks et al., 2021]]): Competition-level mathematics; o3 and DeepSeek-R1 exceed 90%((Hendrycks, D. et al. "Measuring Mathematical Problem Solving With the MATH Dataset." [[https://arxiv.org/abs/2103.03874|arXiv:2103.03874]], 2021.)) * **ARC Challenge** ([[https://arxiv.org/abs/1803.05457|Clark et al., 2018]]): Science reasoning requiring world knowledge((Clark, P. et al. "Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge." [[https://arxiv.org/abs/1803.05457|arXiv:1803.05457]], 2018.)) * **BIG-Bench Hard** ([[https://arxiv.org/abs/2210.09261|Suzgun et al., 2022]]): Subset of tasks where LLMs initially struggled((Suzgun, M. et al. "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them." 
[[https://arxiv.org/abs/2210.09261|arXiv:2210.09261]], 2022.)) * **PlanBench** ([[https://arxiv.org/abs/2206.10498|Valmeekam et al., 2023]]): Evaluates planning capabilities using classical planning domains((Valmeekam, K. et al. "PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change." [[https://arxiv.org/abs/2206.10498|arXiv:2206.10498]], 2023.)) * **TravelPlanner** ([[https://arxiv.org/abs/2402.01622|Xie et al., 2024]]): Real-world planning requiring constraint satisfaction((Xie, J. et al. "TravelPlanner: A Benchmark for Real-World Planning with Language Agents." [[https://arxiv.org/abs/2402.01622|arXiv:2402.01622]], 2024.)) ===== See Also ===== * [[planning|Agent Planning: How AI Agents Plan and Reason]] * [[state_of_the_art_reasoning|State-of-the-Art Reasoning]] * [[multi_agent_architecture|Multi-Agent Architecture (Planner-Generator-Evaluator)]] * [[agentic_engineering|Agentic Engineering]] * [[chain_of_thought_agents|Chain of Thought Agents]] ===== References =====