====== Language Agent Tree Search ======
**Language Agent Tree Search (LATS)** is a general framework introduced by Zhou et al. (2023) that unifies reasoning, acting, and planning by combining **Monte Carlo Tree Search (MCTS)** with LLM capabilities. The LLM serves simultaneously as the policy (action generator), value function (state evaluator), and reflection mechanism, enabling systematic exploration of decision trees with environment feedback.
<code>
graph TD
    A[Task Input] --> B[Select Node via UCT]
    B --> C[LLM Generates Actions]
    C --> D[Environment Feedback]
    D --> E{Success?}
    E -->|No| F[LLM Reflects on Failure]
    F --> G[Store Reflection in Memory]
    G --> H[Backpropagate Value]
    H --> B
    E -->|Yes| I[Return Solution Trajectory]
</code>
===== Motivation =====
Existing approaches treat reasoning (CoT, ToT) and acting (ReAct) as separate capabilities. Tree-of-Thoughts explores reasoning paths but lacks grounding in environment feedback. ReAct interacts with environments but follows a single trajectory without systematic exploration. LATS bridges this gap by embedding ReAct-style action trajectories within an MCTS search tree, combining the exploration benefits of tree search with the grounding of environment interaction.
===== MCTS Phases Adapted for LLMs =====
LATS adapts the four classical MCTS phases for language agents. Each state $s = [x, a_{1:i}, o_{1:i}]$ represents the input $x$, actions taken $a_{1:i}$, and observations received $o_{1:i}$.
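The state and tree bookkeeping can be sketched as a minimal node class. This is an illustrative sketch, not the paper's reference implementation; the state $s$ is recovered implicitly from the action/observation pairs along the path to the root:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    action: Optional[str] = None       # a_i taken to reach this node
    observation: Optional[str] = None  # o_i returned by the environment
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                 # sum of backpropagated values
    visits: int = 0                    # N(s)

    def trajectory(self) -> List[tuple]:
        """Recover [(a_1, o_1), ..., (a_i, o_i)] by walking up to the root."""
        path = []
        node = self
        while node.parent is not None:
            path.append((node.action, node.observation))
            node = node.parent
        return list(reversed(path))
```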
=== Selection ===
Starting from the root $s_0$, traverse the tree to a leaf node using the Upper Confidence Bound for Trees (UCT):
$$\text{UCT}(s) = V(s) + c \sqrt{\frac{\ln N(\text{parent}(s))}{N(s)}}$$
where $V(s)$ is the estimated value, $N(s)$ is the visit count, and $c$ is the exploration constant balancing exploitation vs. exploration.
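A worked example of the trade-off, as a small sketch (the `uct` helper and its numbers are illustrative): an unvisited node scores infinity and is always tried first, and the exploration bonus can outweigh a higher mean value when a sibling has been visited far less often.

```python
import math

def uct(value_sum: float, visits: int, parent_visits: int, c: float = 1.0) -> float:
    """UCT(s) = V(s) + c * sqrt(ln N(parent(s)) / N(s)); unvisited nodes score inf."""
    if visits == 0:
        return float("inf")
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Two children of a parent visited 10 times: a well-explored high-value node
# vs. a barely explored lower-value one. The bonus term favors the latter.
well_explored = uct(value_sum=6.0, visits=8, parent_visits=10)  # ~1.29
barely_tried = uct(value_sum=0.9, visits=1, parent_visits=10)   # ~2.42
```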
=== Expansion ===
At the selected leaf, sample $n$ new child nodes by prompting the LLM to generate actions (e.g., ReAct reasoning steps or tool calls), adding them to the tree.
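The expansion step can be sketched as follows. The `sample_actions(prompt, n)` interface is a hypothetical stand-in for the LLM call; any chat API sampled `n` times at nonzero temperature fills the same role:

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:  # minimal stand-in for the search-tree node
    action: str = ""
    parent: object = None
    children: list = field(default_factory=list)

def expand(node, sample_actions, n=5):
    """Expansion: sample n candidate actions and attach them as children."""
    prompt = "Given the trajectory so far, propose the next action:"
    children = [TreeNode(action=a, parent=node) for a in sample_actions(prompt, n)]
    node.children.extend(children)
    return children
```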
=== Evaluation ===
Assign scalar values to new nodes via two complementary LLM-based heuristics:
* **Progress score** — LLM self-evaluates how close the current state is to task completion
* **Self-consistency score** — Agreement across similar trajectories in the tree
For failed trajectories, the LLM generates **reflections** stored in memory as semantic feedback for future in-context learning.
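The two heuristics can be combined as a weighted score. In this sketch, `score_progress` stands in for an LLM self-evaluation prompt returning a number in [0, 1], and self-consistency is approximated as the share of sampled answers agreeing with the majority vote; both helpers are hypothetical interfaces, not part of the paper's implementation:

```python
from collections import Counter

def evaluate(trajectory, score_progress, sample_answer, n_samples=5, w=0.5):
    progress = score_progress(trajectory)  # LLM rates closeness to completion
    answers = [sample_answer(trajectory) for _ in range(n_samples)]
    consistency = Counter(answers).most_common(1)[0][1] / n_samples
    return w * progress + (1 - w) * consistency
```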
=== Backpropagation ===
Update values and visit counts along the path from the evaluated node back to the root:
$$V(s) \leftarrow \frac{1}{N(s)} \sum_{i=1}^{N(s)} v_i$$
where $v_i$ is the value observed on the $i$-th visit to $s$.
<code python>
import math

class LATS:
    def __init__(self, llm, env, n_expansions=5, max_iterations=50,
                 exploration_constant=1.0):
        self.llm = llm
        self.env = env
        self.n = n_expansions
        self.max_iter = max_iterations
        self.c = exploration_constant
        self.reflections = []  # Memory for failed-trajectory reflections

    def search(self, task):
        root = Node(state=self.env.reset(task))
        for _ in range(self.max_iter):
            # Selection: traverse the tree using UCT
            node = self._select(root)
            # Expansion: generate n child actions via the LLM
            children = self._expand(node)
            # Evaluation: score children with LLM heuristics
            for child in children:
                child.observation = self.env.step(child.action)
                if self.env.is_terminal(child):
                    if self.env.is_success(child):
                        return child.trajectory()
                    # Generate a reflection on the failure and store it
                    self.reflections.append(self.llm.reflect(child.trajectory()))
                child.value = self._evaluate(child)
            # Backpropagation: update values and visit counts up the tree
            for child in children:
                self._backpropagate(child)
        return self._best_trajectory(root)

    def _select(self, node):
        # Descend to a leaf, picking the child with the highest UCT score
        while node.children:
            node = max(node.children, key=self._uct)
        return node

    def _uct(self, node):
        if node.visits == 0:
            return float('inf')  # always try unvisited nodes first
        exploit = node.value / node.visits
        explore = self.c * math.sqrt(math.log(node.parent.visits) / node.visits)
        return exploit + explore

    def _evaluate(self, node):
        # LLM self-evaluation conditioned on recent reflections
        prompt = (
            f"Trajectory so far:\n{node.trajectory()}\n"
            f"Past reflections:\n{self.reflections[-3:]}\n"
            f"Rate progress toward task completion (0-1):"
        )
        return float(self.llm.generate(prompt))

    # Node, _expand, _backpropagate, and _best_trajectory are omitted for
    # brevity: _expand samples self.n candidate actions from the LLM as new
    # child nodes, _backpropagate increments visit counts and accumulates
    # values from the evaluated node up to the root, and _best_trajectory
    # follows the most-visited children down from the root.
</code>
===== Key Architectural Innovation =====
The LLM plays three unified roles:
^ Role ^ Function ^ How ^
| **Policy** | Generate candidate actions | In-context prompting conditioned on trajectory |
| **Value Function** | Evaluate state quality | Self-assessment + cross-trajectory consistency |
| **Reflection** | Learn from failures | Generate verbal "gradients" stored in memory |
This eliminates the need for separate trained components, enabling gradient-free learning from experience through in-context reflection.
===== Key Results =====
LATS achieves state-of-the-art results across diverse task types, using GPT-4 with no fine-tuning:
^ Benchmark ^ Task Type ^ LATS ^ Compared Methods ^ Notes ^
| HumanEval | Programming | 92.7% pass@1 | 86.5% (ToT) | +6.2 points |
| WebShop | Web navigation | 75.9 avg. score | Fine-tuned agents | Competitive without gradient updates |
| HotPotQA | Multi-hop QA | Best EM | RAP, ToT | Fewer nodes expanded |
Efficiency gains are substantial: LATS uses 3.55x fewer nodes than RAP and 12.12x fewer nodes than ToT on HotPotQA.
===== Comparison with Related Methods =====
^ Method ^ Search ^ Environment Feedback ^ Reflection ^ Acting ^
| ToT | BFS/DFS | No | No | No |
| RAP | MCTS-like | No | No | No |
| ReAct | None (single path) | Yes | No | Yes |
| Reflexion | None (retry) | Yes | Yes | Yes |
| LATS | MCTS | Yes | Yes | Yes |
===== References =====
* [[https://arxiv.org/abs/2310.04406|Zhou et al. "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" (arXiv:2310.04406)]]
* [[https://arxiv.org/abs/2305.10601|Yao et al. "Tree of Thoughts" (arXiv:2305.10601)]]
* [[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting" (arXiv:2210.03629)]]
===== See Also =====
* [[tree_of_thoughts]]
* [[graph_of_thoughts]]
* [[chain_of_thought]]
* [[self_consistency]]