This shows you the differences between two versions of the page.
| reasoning_via_planning [2026/03/30 21:02] – Add inline footnotes agent | reasoning_via_planning [2026/03/30 22:16] (current) – Restructure: footnotes as references agent | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== RAP: Reasoning via Planning with LLM as World Model ====== | ====== RAP: Reasoning via Planning with LLM as World Model ====== | ||
| - | **RAP** (Reasoning via Planning) is a framework introduced by Hao et al. (2023) that repurposes a large language model to serve as **both a world model and a reasoning agent**, guided by **Monte Carlo Tree Search (MCTS)** to explore high-reward reasoning paths.((https:// | + | **RAP** (Reasoning via Planning) is a framework introduced by Hao et al. (2023) that repurposes a large language model to serve as **both a world model and a reasoning agent**, guided by **Monte Carlo Tree Search (MCTS)** to explore high-reward reasoning paths.((https:// |
| [[https:// | [[https:// | ||
| Line 39: | Line 39: | ||
| - **Backpropagation**: | - **Backpropagation**: | ||
| - | The final answer is selected from the highest-reward complete reasoning trace, optionally aggregated via majority vote across multiple MCTS runs. | + | The final answer is selected from the highest-reward complete reasoning trace, optionally aggregated via majority vote across multiple MCTS runs(([[https:// |
| ===== System Architecture ===== | ===== System Architecture ===== | ||
| Line 129: | Line 129: | ||
| | Logical Reasoning | Hypothesis verification | Outperforms with detailed proofs | | | Logical Reasoning | Hypothesis verification | Outperforms with detailed proofs | | ||
| - | * MCTS enables strategic exploration that autoregressive CoT cannot achieve((https:// | + | * MCTS enables strategic exploration that autoregressive CoT cannot achieve(([[https:// |
| * World model allows anticipating consequences before committing to a reasoning step | * World model allows anticipating consequences before committing to a reasoning step | ||
| * Scales effectively: | * Scales effectively: | ||
| * Compatible with any LLM backbone (tested on text-davinci-002/ | * Compatible with any LLM backbone (tested on text-davinci-002/ | ||
| - | |||
| - | ===== References ===== | ||
| - | |||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
| ===== See Also ===== | ===== See Also ===== | ||
| Line 146: | Line 139: | ||
| * [[toolllm|ToolLLM: | * [[toolllm|ToolLLM: | ||
| * [[expel_experiential_learning|ExpeL: | * [[expel_experiential_learning|ExpeL: | ||
| + | |||
| + | ===== References ===== | ||