Differences

This shows you the differences between two versions of the page.

--- reasoning_via_planning [2026/03/30 21:02] – Add inline footnotes agent
+++ reasoning_via_planning [2026/03/30 22:16] (current) – Restructure: footnotes as references agent
@@ Line 1: / Line 1: @@
 ====== RAP: Reasoning via Planning with LLM as World Model ======
-**RAP** (Reasoning via Planning) is a framework introduced by Hao et al. (2023) that repurposes a large language model to serve as **both a world model and a reasoning agent**, guided by **Monte Carlo Tree Search (MCTS)** to explore high-reward reasoning paths.((https://arxiv.org/abs/2305.14992)) With **925 citations**, it demonstrates that deliberate planning significantly outperforms autoregressive chain-of-thought reasoning across diverse tasks by enabling strategic exploration, lookahead, and backtracking.
+**RAP** (Reasoning via Planning) is a framework introduced by Hao et al. (2023) that repurposes a large language model to serve as **both a world model and a reasoning agent**, guided by **Monte Carlo Tree Search (MCTS)** to explore high-reward reasoning paths.((https://arxiv.org/abs/2305.14992)) With **925 citations**, it demonstrates that deliberate planning significantly outperforms autoregressive chain-of-thought reasoning(([[https://arxiv.org/abs/2201.11903|Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (2022)]])) across diverse tasks by enabling strategic exploration, lookahead, and backtracking.
 [[https://arxiv.org/abs/2305.14992|arXiv:2305.14992]]
@@ Line 39: / Line 39: @@
   - **Backpropagation**: Update value estimates along the traversed path
-The final answer is selected from the highest-reward complete reasoning trace, optionally aggregated via majority vote across multiple MCTS runs.
+The final answer is selected from the highest-reward complete reasoning trace, optionally aggregated via majority vote across multiple MCTS runs(([[https://arxiv.org/abs/2203.11171|Wang et al. "Self-Consistency Improves Chain of Thought Reasoning" (2022)]])).
 ===== System Architecture =====
@@ Line 129: / Line 129: @@
 | Logical Reasoning | Hypothesis verification | Outperforms with detailed proofs |
-  * MCTS enables strategic exploration that autoregressive CoT cannot achieve((https://arxiv.org/abs/2305.14992))
+  * MCTS enables strategic exploration that autoregressive CoT cannot achieve(([[https://aclanthology.org/2023.emnlp-main.507|EMNLP 2023 Proceedings]]))((https://arxiv.org/abs/2305.14992))
   * World model allows anticipating consequences before committing to a reasoning step
   * Scales effectively: more MCTS iterations yield better reasoning quality
   * Compatible with any LLM backbone (tested on text-davinci-002/003)
-===== References =====
-  * [[https://arxiv.org/abs/2305.14992|Hao et al. "Reasoning with Language Model is Planning with World Model" (2023)]]
-  * [[https://aclanthology.org/2023.emnlp-main.507|EMNLP 2023 Proceedings]]
-  * [[https://arxiv.org/abs/2201.11903|Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (2022)]]
-  * [[https://arxiv.org/abs/2203.11171|Wang et al. "Self-Consistency Improves Chain of Thought Reasoning" (2022)]]
 ===== See Also =====
@@ Line 146: / Line 139: @@
   * [[toolllm|ToolLLM: Mastering 16,000+ Real-World APIs]]
   * [[expel_experiential_learning|ExpeL: Experiential Learning Agents]]
+===== References =====

AI Agent Knowledge Base

User Tools

Site Tools

Differences

Page Tools