====== Language Agent Tree Search ======

**Language Agent Tree Search (LATS)** is a general framework introduced by Zhou et al. (2023) that unifies reasoning, acting, and planning by combining **Monte Carlo Tree Search (MCTS)** with LLM capabilities. The LLM serves simultaneously as the policy (action generator), value function (state evaluator), and reflection mechanism, enabling systematic exploration of decision trees with environment feedback.

<code>
graph TD
    A[Task Input] --> B[Select Node via UCT]
    B --> C[LLM Generates Actions]
    C --> D[Environment Feedback]
    D --> E{Success?}
    E -->|No| F[LLM Reflects on Failure]
    F --> G[Store Reflection in Memory]
    G --> H[Backpropagate Value]
    H --> B
    E -->|Yes| I[Return Solution Trajectory]
</code>

===== Motivation =====

Existing approaches treat reasoning (CoT, ToT) and acting (ReAct) as separate capabilities. Tree-of-Thoughts explores reasoning paths but lacks grounding in environment feedback; ReAct interacts with environments but follows a single trajectory without systematic exploration. LATS bridges this gap by embedding ReAct-style action trajectories within an MCTS search tree, combining the exploration benefits of tree search with the grounding of environment interaction.

===== MCTS Phases Adapted for LLMs =====

LATS adapts the four classical MCTS phases for language agents. Each state $s = [x, a_{1:i}, o_{1:i}]$ represents the input $x$, the actions taken $a_{1:i}$, and the observations received $o_{1:i}$.

=== Selection ===

Starting from the root $s_0$, traverse the tree to a leaf node using the Upper Confidence Bound for Trees (UCT):

$$\text{UCT}(s) = V(s) + c \sqrt{\frac{\ln N(\text{parent}(s))}{N(s)}}$$

where $V(s)$ is the estimated value, $N(s)$ is the visit count, and $c$ is the exploration constant balancing exploitation against exploration.

=== Expansion ===

At the selected leaf, sample $n$ new child nodes by prompting the LLM to generate actions (e.g., ReAct reasoning steps or tool calls), adding them to the tree.
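To make the selection rule concrete, here is a minimal sketch of the UCT computation. The function name, arguments, and sum-of-values convention are our own illustration, not from the paper:

```python
import math

def uct_score(value_sum, visits, parent_visits, c=1.0):
    """UCT score for one node, with V(s) taken as the mean of backed-up values."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = value_sum / visits                               # V(s)
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # uncertainty bonus
    return exploit + explore
```

With equal mean value, a rarely visited node outranks a frequently visited sibling, which is exactly the exploration pressure the constant $c$ controls: `uct_score(0.5, 1, 10)` exceeds `uct_score(2.5, 5, 10)` even though both have mean value 0.5.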
=== Evaluation ===

Assign scalar values to new nodes via two complementary LLM-based heuristics:

  * **Progress score**: the LLM self-evaluates how close the current state is to task completion
  * **Self-consistency score**: agreement with similar trajectories already in the tree

For failed trajectories, the LLM generates **reflections**, stored in memory as semantic feedback for future in-context learning.

=== Backpropagation ===

Update values and visit counts along the path from the evaluated node back to the root, so that each node's value is the mean of all evaluations backed up through it:

$$V(s) \leftarrow \frac{1}{N(s)} \sum_{i=1}^{N(s)} v_i$$

<code python>
import math


class Node:
    """Search-tree node; trajectory() renders the action/observation path so far."""
    def __init__(self, state, action=None, parent=None):
        self.state = state
        self.action = action
        self.parent = parent
        self.children = []
        self.observation = None
        self.value = 0.0   # running sum of backed-up evaluations
        self.visits = 0

    def trajectory(self):
        path = [] if self.parent is None else self.parent.trajectory()
        return path + ([(self.action, self.observation)] if self.action else [])


class LATS:
    def __init__(self, llm, env, n_expansions=5, max_iterations=50,
                 exploration_constant=1.0):
        self.llm = llm
        self.env = env
        self.n = n_expansions
        self.max_iter = max_iterations
        self.c = exploration_constant
        self.reflections = []  # Memory for failed-trajectory reflections

    def search(self, task):
        root = Node(state=self.env.reset(task))
        for _ in range(self.max_iter):
            # Selection: traverse tree using UCT
            node = self._select(root)
            # Expansion: generate n child actions via LLM
            children = self._expand(node)
            # Evaluation: score children with LLM heuristics
            for child in children:
                child.observation = self.env.step(child.action)
                if self.env.is_terminal(child):
                    if self.env.is_success(child):
                        return child.trajectory()
                    # Generate reflection on failure
                    reflection = self.llm.reflect(child.trajectory())
                    self.reflections.append(reflection)
                child.value = self._evaluate(child)
            # Backpropagation: update values up the tree
            for child in children:
                self._backpropagate(child)
        return self._best_trajectory(root)

    def _select(self, node):
        while node.children:
            node = max(node.children, key=self._uct)
        return node

    def _expand(self, node):
        # Prompt the LLM for n candidate actions from this state
        actions = self.llm.propose_actions(node.trajectory(), n=self.n)
        node.children = [Node(state=node.state, action=a, parent=node)
                         for a in actions]
        return node.children

    def _uct(self, node):
        if node.visits == 0:
            return float('inf')
        exploit = node.value / node.visits
        explore = self.c * math.sqrt(math.log(node.parent.visits) / node.visits)
        return exploit + explore

    def _evaluate(self, node):
        prompt = (
            f"Trajectory so far:\n{node.trajectory()}\n"
            f"Past reflections:\n{self.reflections[-3:]}\n"
            f"Rate progress toward task completion (0-1):"
        )
        return float(self.llm.generate(prompt))

    def _backpropagate(self, node):
        # Add the leaf's evaluation to every ancestor; _uct divides by visits
        v = node.value
        node.visits += 1
        node = node.parent
        while node is not None:
            node.value += v
            node.visits += 1
            node = node.parent

    def _best_trajectory(self, root):
        # Fallback when no success was found: follow the most-visited path
        node = root
        while node.children:
            node = max(node.children, key=lambda c: c.visits)
        return node.trajectory()
</code>

===== Key Architectural Innovation =====

The LLM plays three unified roles:

^ Role ^ Function ^ How ^
| **Policy** | Generate candidate actions | In-context prompting conditioned on the trajectory |
| **Value Function** | Evaluate state quality | Self-assessment + cross-trajectory consistency |
| **Reflection** | Learn from failures | Generate verbal "gradients" stored in memory |

This eliminates the need for separately trained components, enabling gradient-free learning from experience through in-context reflection.

===== Key Results =====

LATS achieves state-of-the-art results across diverse task types (GPT-4, no fine-tuning):

^ Benchmark ^ Task Type ^ LATS ^ Previous SOTA ^ Improvement ^
| HumanEval | Programming | 92.7% pass@1 | 86.5% (ToT) | +6.2 points |
| WebShop | Web navigation | 75.9 avg score | Comparable to fine-tuned methods | Gradient-free |
| HotPotQA | Multi-hop QA | Best EM | RAP, ToT | Fewer nodes |

Efficiency gains are substantial: on HotPotQA, LATS uses 3.55x fewer nodes than RAP and 12.12x fewer than ToT.

===== Comparison with Related Methods =====

^ Method ^ Search ^ Environment Feedback ^ Reflection ^ Acting ^
| ToT | BFS/DFS | No | No | No |
| RAP | MCTS-like | No | No | No |
| ReAct | None (single path) | Yes | No | Yes |
| Reflexion | None (retry) | Yes | Yes | Yes |
| LATS | MCTS | Yes | Yes | Yes |

===== References =====

  * [[https://arxiv.org/abs/2310.04406|Zhou et al. "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" (arXiv:2310.04406)]]
  * [[https://arxiv.org/abs/2305.10601|Yao et al. "Tree of Thoughts" (arXiv:2305.10601)]]
  * [[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting" (arXiv:2210.03629)]]

===== See Also =====

  * [[tree_of_thoughts]]
  * [[graph_of_thoughts]]
  * [[chain_of_thought]]
  * [[self_consistency]]
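The two evaluation heuristics described in the Evaluation section (progress and self-consistency) could be combined as a simple weighted average. The following is a minimal hypothetical sketch; the function name, weighting scheme, and default weight are our own assumptions, not specified by the paper:

```python
def combined_value(progress_score, sibling_scores, weight=0.5):
    """Blend the LLM's self-rated progress with agreement across similar trajectories.

    progress_score: LLM self-rating of the current state, in [0, 1]
    sibling_scores: ratings of comparable trajectories already in the tree
    """
    if not sibling_scores:
        return progress_score  # no trajectories to compare against yet
    consistency = sum(sibling_scores) / len(sibling_scores)
    return weight * progress_score + (1 - weight) * consistency
```

For example, a state self-rated 0.8 whose comparable trajectories average 0.5 would receive `0.5 * 0.8 + 0.5 * 0.5 = 0.65` under equal weighting, damping overconfident self-evaluations.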