====== Language Agent Tree Search ======
**Language Agent Tree Search (LATS)** is a general framework introduced by Zhou et al. (2023) that unifies reasoning, acting, and planning by combining **Monte Carlo Tree Search (MCTS)** with LLM capabilities. The LLM serves simultaneously as the policy (action generator), value function (state evaluator), and reflection mechanism, enabling systematic exploration of decision trees with environment feedback.
<code>
graph TD
    A[Task Input] --> B[Select Node via UCT]
    B --> C[LLM Generates Actions]
    C --> D[Environment Feedback]
    D --> E{Success?}
    E -->|No| F[LLM Reflects on Failure]
    F --> G[Store Reflection in Memory]
    G --> H[Backpropagate Value]
    H --> B
    E -->|Yes| I[Return Solution Trajectory]
</code>
===== Motivation =====
Existing approaches treat reasoning (CoT, ToT) and acting (ReAct) as separate capabilities. Tree-of-Thoughts explores reasoning paths but lacks grounding in environment feedback. ReAct interacts with environments but follows a single trajectory without systematic exploration. LATS bridges this gap by embedding ReAct-style action trajectories within an MCTS search tree, combining the exploration benefits of tree search with the grounding of environment interaction.
===== MCTS Phases Adapted for LLMs =====
LATS adapts the four classical MCTS phases for language agents. Each state $s = [x, a_{1:i}, o_{1:i}]$ represents the input $x$, actions taken $a_{1:i}$, and observations received $o_{1:i}$.
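The state and tree bookkeeping can be sketched as a minimal node class. This is an illustrative sketch, not the paper's reference implementation; the state $s$ is recovered implicitly from the action/observation pairs along the path to the root:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    action: Optional[str] = None       # a_i taken to reach this node
    observation: Optional[str] = None  # o_i returned by the environment
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                 # sum of backpropagated values
    visits: int = 0                    # N(s)

    def trajectory(self) -> List[tuple]:
        """Recover [(a_1, o_1), ..., (a_i, o_i)] by walking up to the root."""
        path = []
        node = self
        while node.parent is not None:
            path.append((node.action, node.observation))
            node = node.parent
        return list(reversed(path))
```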
=== Selection ===
Starting from the root $s_0$, traverse the tree to a leaf node using the Upper Confidence Bound for Trees (UCT):
$$\text{UCT}(s) = V(s) + c \sqrt{\frac{\ln N(\text{parent}(s))}{N(s)}}$$
where $V(s)$ is the estimated value, $N(s)$ is the visit count, and $c$ is the exploration constant balancing exploitation vs. exploration.
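A worked example of the trade-off, as a small sketch (the `uct` helper and its numbers are illustrative): an unvisited node scores infinity and is always tried first, and the exploration bonus can outweigh a higher mean value when a sibling has been visited far less often.

```python
import math

def uct(value_sum: float, visits: int, parent_visits: int, c: float = 1.0) -> float:
    """UCT(s) = V(s) + c * sqrt(ln N(parent(s)) / N(s)); unvisited nodes score inf."""
    if visits == 0:
        return float("inf")
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Two children of a parent visited 10 times: a well-explored high-value node
# vs. a barely explored lower-value one. The bonus term favors the latter.
well_explored = uct(value_sum=6.0, visits=8, parent_visits=10)  # ~1.29
barely_tried = uct(value_sum=0.9, visits=1, parent_visits=10)   # ~2.42
```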
=== Expansion ===
At the selected leaf, sample $n$ new child nodes by prompting the LLM to generate actions (e.g., ReAct reasoning steps or tool calls), adding them to the tree.
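The expansion step can be sketched as follows. The `sample_actions(prompt, n)` interface is a hypothetical stand-in for the LLM call; any chat API sampled `n` times at nonzero temperature fills the same role:

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:  # minimal stand-in for the search-tree node
    action: str = ""
    parent: object = None
    children: list = field(default_factory=list)

def expand(node, sample_actions, n=5):
    """Expansion: sample n candidate actions and attach them as children."""
    prompt = "Given the trajectory so far, propose the next action:"
    children = [TreeNode(action=a, parent=node) for a in sample_actions(prompt, n)]
    node.children.extend(children)
    return children
```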
=== Evaluation ===
Assign scalar values to new nodes via two complementary LLM-based heuristics:
* **Progress score** — LLM self-evaluates how close the current state is to task completion
* **Self-consistency score** — Agreement across similar trajectories in the tree
For failed trajectories, the LLM generates **reflections** stored in memory as semantic feedback for future in-context learning.
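The two heuristics can be combined as a weighted score. In this sketch, `score_progress` stands in for an LLM self-evaluation prompt returning a number in [0, 1], and self-consistency is approximated as the share of sampled answers agreeing with the majority vote; both helpers are hypothetical interfaces, not part of the paper's implementation:

```python
from collections import Counter

def evaluate(trajectory, score_progress, sample_answer, n_samples=5, w=0.5):
    progress = score_progress(trajectory)  # LLM rates closeness to completion
    answers = [sample_answer(trajectory) for _ in range(n_samples)]
    consistency = Counter(answers).most_common(1)[0][1] / n_samples
    return w * progress + (1 - w) * consistency
```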
=== Backpropagation ===
Update values and visit counts along the path from the evaluated node back to the root:
$$V(s) \leftarrow \frac{1}{N(s)} \sum_{i=1}^{N(s)} v_i$$
where $v_i$ is the value observed on the $i$-th visit to $s$.
<code python>
import math

class LATS:
    def __init__(self, llm, env, n_expansions=5, max_iterations=50,
                 exploration_constant=1.0):
        self.llm = llm
        self.env = env
        self.n = n_expansions
        self.max_iter = max_iterations
        self.c = exploration_constant
        self.reflections = []  # Memory for failed-trajectory reflections

    def search(self, task):
        root = Node(state=self.env.reset(task))
        for _ in range(self.max_iter):
            # Selection: traverse the tree using UCT
            node = self._select(root)
            # Expansion: generate n child actions via the LLM
            children = self._expand(node)
            # Evaluation: score children with LLM heuristics
            for child in children:
                child.observation = self.env.step(child.action)
                if self.env.is_terminal(child):
                    if self.env.is_success(child):
                        return child.trajectory()
                    # Generate a reflection on the failure and store it
                    self.reflections.append(self.llm.reflect(child.trajectory()))
                child.value = self._evaluate(child)
            # Backpropagation: update values and visit counts up the tree
            for child in children:
                self._backpropagate(child)
        return self._best_trajectory(root)

    def _select(self, node):
        # Descend to a leaf, picking the child with the highest UCT score
        while node.children:
            node = max(node.children, key=self._uct)
        return node

    def _uct(self, node):
        if node.visits == 0:
            return float('inf')  # always try unvisited nodes first
        exploit = node.value / node.visits
        explore = self.c * math.sqrt(math.log(node.parent.visits) / node.visits)
        return exploit + explore

    def _evaluate(self, node):
        # LLM self-evaluation conditioned on recent reflections
        prompt = (
            f"Trajectory so far:\n{node.trajectory()}\n"
            f"Past reflections:\n{self.reflections[-3:]}\n"
            f"Rate progress toward task completion (0-1):"
        )
        return float(self.llm.generate(prompt))

    # Node, _expand, _backpropagate, and _best_trajectory are omitted for
    # brevity: _expand samples self.n candidate actions from the LLM as new
    # child nodes, _backpropagate increments visit counts and accumulates
    # values from the evaluated node up to the root, and _best_trajectory
    # follows the most-visited children down from the root.
</code>
===== Key Architectural Innovation =====
The LLM plays three unified roles:
^ Role ^ Function ^ How ^
| **Policy** | Generate candidate actions | In-context prompting conditioned on trajectory |
| **Value Function** | Evaluate state quality | Self-assessment + cross-trajectory consistency |
| **Reflection** | Learn from failures | Generate verbal "gradients" stored in memory |
This eliminates the need for separate trained components, enabling gradient-free learning from experience through in-context reflection.
===== Key Results =====
LATS achieves state-of-the-art results across diverse task types, using GPT-4 with no fine-tuning:
^ Benchmark ^ Task Type ^ LATS ^ Compared Methods ^ Notes ^
| HumanEval | Programming | 92.7% pass@1 | 86.5% (ToT) | +6.2 points |
| WebShop | Web navigation | 75.9 avg. score | Fine-tuned agents | Competitive without gradient updates |
| HotPotQA | Multi-hop QA | Best EM | RAP, ToT | Fewer nodes expanded |
Efficiency gains are substantial: LATS uses 3.55x fewer nodes than RAP and 12.12x fewer nodes than ToT on HotPotQA.
===== Comparison with Related Methods =====
^ Method ^ Search ^ Environment Feedback ^ Reflection ^ Acting ^
| ToT | BFS/DFS | No | No | No |
| RAP | MCTS-like | No | No | No |
| ReAct | None (single path) | Yes | No | Yes |
| Reflexion | None (retry) | Yes | Yes | Yes |
| LATS | MCTS | Yes | Yes | Yes |
===== References =====
* [[https://arxiv.org/abs/2310.04406|Zhou et al. "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" (arXiv:2310.04406)]]
* [[https://arxiv.org/abs/2305.10601|Yao et al. "Tree of Thoughts" (arXiv:2305.10601)]]
* [[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting" (arXiv:2210.03629)]]
===== See Also =====
* [[tree_of_thoughts]]
* [[graph_of_thoughts]]
* [[chain_of_thought]]
* [[self_consistency]]