Buffer of Thoughts (BoT) is a thought-augmented reasoning framework introduced by Yang et al. (NeurIPS 2024 Spotlight) that maintains a meta-buffer — a library of reusable high-level thought templates distilled from past problem-solving — which are retrieved and instantiated for new tasks. BoT achieves state-of-the-art reasoning accuracy while requiring only 12% of the computational cost of multi-query methods like Tree of Thoughts.
Existing prompting methods either construct reasoning from scratch for each problem (expensive and error-prone) or rely on fixed exemplars that lack generalization. Humans, by contrast, accumulate problem-solving patterns over time and retrieve relevant strategies when facing new challenges. BoT operationalizes this cognitive process by building a growing library of abstract reasoning templates.
BoT consists of four interconnected components:

- **Problem distiller** — extracts the key information (variables, objectives, constraints) from the input problem.
- **Meta-buffer** — the library of reusable high-level thought templates accumulated from past problem-solving.
- **Template instantiation** — retrieves the most relevant template and fills it with problem-specific details to produce a solution.
- **Buffer manager** — evaluates completed solutions and distills novel reasoning patterns into new templates.
The lifecycle of a thought template follows a distill-store-retrieve-instantiate-update loop:
$$\text{Problem} \xrightarrow{\text{distill}} \text{Key Info} \xrightarrow{\text{retrieve}} \text{Template} \xrightarrow{\text{instantiate}} \text{Solution} \xrightarrow{\text{update}} \text{Meta-Buffer}$$
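The distill step can be sketched as a single structured LLM call. `ProblemDistiller`, `DistilledProblem`, and the prompt format below are illustrative assumptions for this sketch, not the paper's API; any text-in/text-out model call can be plugged in.

```python
# Minimal sketch of problem distillation: prompt the LLM to extract
# variables, objectives, and constraints, then parse its structured reply.
# All names here are illustrative, not the paper's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DistilledProblem:
    variables: str
    objective: str
    constraints: str

DISTILL_PROMPT = (
    "Extract from the problem below, one per line:\n"
    "VARIABLES: ...\nOBJECTIVE: ...\nCONSTRAINTS: ...\n\nProblem: {problem}"
)

class ProblemDistiller:
    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # any text-in/text-out model call

    def extract(self, problem: str) -> DistilledProblem:
        reply = self.llm(DISTILL_PROMPT.format(problem=problem))
        fields = {}
        for line in reply.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip().upper()] = value.strip()
        return DistilledProblem(
            variables=fields.get("VARIABLES", ""),
            objective=fields.get("OBJECTIVE", ""),
            constraints=fields.get("CONSTRAINTS", ""),
        )

# Usage with a stubbed model call:
fake_llm = lambda prompt: (
    "VARIABLES: a, b\nOBJECTIVE: maximize a+b\nCONSTRAINTS: a,b in 1..9"
)
d = ProblemDistiller(fake_llm).extract("Pick digits a, b to maximize a+b.")
print(d.objective)  # -> maximize a+b
```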
After successful problem-solving, the buffer manager evaluates whether the solution introduces a novel reasoning pattern. If so, it distills a new template and adds it to the meta-buffer.
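One way to implement the novelty check is a similarity threshold against the stored templates. The sketch below uses a cheap token-overlap proxy (`jaccard_similarity`) purely for illustration; the paper leaves the exact novelty metric to the implementation, and a real system would likely use embeddings or the LLM itself as the judge.

```python
# Illustrative novelty check for the buffer manager: add a candidate
# template only if no stored template is already too similar.
# The similarity metric and threshold are assumptions for this sketch.
def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap proxy for semantic similarity (illustrative only)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

class BufferManager:
    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold

    def maybe_update(self, meta_buffer: list, candidate_template: str) -> bool:
        """Append the candidate iff it is novel relative to the buffer."""
        if any(jaccard_similarity(candidate_template, t) >= self.threshold
               for t in meta_buffer):
            return False  # pattern already covered; skip
        meta_buffer.append(candidate_template)
        return True

mgr = BufferManager()
buf = ["enumerate all orderings then prune by constraint"]
mgr.maybe_update(buf, "binary search over the answer space")  # novel: added
mgr.maybe_update(buf, "binary search over the answer space")  # duplicate: skipped
print(len(buf))  # -> 2
```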
```python
class BufferOfThoughts:
    def __init__(self, llm, meta_buffer=None):
        self.llm = llm
        self.meta_buffer = meta_buffer or ThoughtTemplateLibrary()
        self.distiller = ProblemDistiller(llm)
        self.buffer_manager = BufferManager(llm)

    def solve(self, problem):
        # Step 1: Distill key information from the problem
        distilled = self.distiller.extract(problem)
        # distilled contains: variables, objectives, constraints

        # Step 2: Retrieve most relevant thought template
        template = self.meta_buffer.retrieve(
            query=distilled,
            similarity_fn=semantic_similarity,
        )

        # Step 3: Instantiate template with problem-specific details
        reasoning = self.llm.generate(
            prompt=f"Apply this reasoning template:\n{template}\n"
                   f"To solve:\n{distilled}"
        )

        # Step 4: Extract answer from instantiated reasoning
        answer = self.llm.extract_answer(reasoning)

        # Step 5: Update meta-buffer if novel pattern detected
        self.buffer_manager.maybe_update(
            meta_buffer=self.meta_buffer,
            problem=problem,
            reasoning=reasoning,
        )
        return answer
```
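The retrieval step (Step 2) can be made concrete with a small sketch. Real implementations would use an embedding model for `semantic_similarity`; here a bag-of-words cosine stands in so the example runs without external dependencies, and the query is a plain string rather than a structured distillation.

```python
# Sketch of meta-buffer retrieval: return the stored template most
# similar to the distilled query. bow_cosine is a dependency-free
# stand-in for a real embedding-based similarity function.
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

class ThoughtTemplateLibrary:
    def __init__(self, templates=None):
        self.templates = list(templates or [])

    def retrieve(self, query: str, similarity_fn=bow_cosine) -> str:
        """Return the stored template most similar to the query."""
        return max(self.templates, key=lambda t: similarity_fn(query, t))

library = ThoughtTemplateLibrary([
    "sort the items, then scan adjacent pairs",
    "model the task as graph search and run BFS",
    "set up equations from the constraints and solve the system",
])
print(library.retrieve("solve simultaneous equations under given constraints"))
# -> set up equations from the constraints and solve the system
```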
| Method | Queries per Problem | Template Reuse | Adaptiveness | Cost |
|---|---|---|---|---|
| Chain-of-Thought | 1 (single path) | None | Low | Low |
| Self-Consistency | $k$ (sample + vote) | None | Low | Medium |
| Tree of Thoughts | Many (search tree) | None | Medium | High |
| Buffer of Thoughts | ~1 (retrieve + instantiate) | Yes | High | Low |
BoT combines the accuracy benefits of multi-query methods with the efficiency of single-query methods by amortizing reasoning effort across problems through template reuse.
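The amortization argument can be made concrete with a toy query-count model. The per-problem query counts below are illustrative assumptions, not figures from the paper (whose 12%-cost result is an empirical average).

```python
# Back-of-envelope query-count comparison across methods.
# Per-problem counts are illustrative, not from the paper.
def total_queries(per_problem: int, n_problems: int) -> int:
    return per_problem * n_problems

n = 100
cot = total_queries(1, n)        # one reasoning path each
sc = total_queries(10, n)        # k = 10 samples + majority vote
tot = total_queries(25, n)       # ~25 node expansions per search tree
bot = total_queries(1, n) + 5    # ~1 query each, plus a few distill/update calls

print(cot, sc, tot, bot)  # -> 100 1000 2500 105
```

The point is structural: multi-query methods pay their search cost on every problem, while BoT's extra calls (distillation, buffer updates) are a small additive overhead amortized across the whole workload.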
BoT was evaluated across 10 challenging reasoning-intensive tasks, including Game of 24, Geometric Shapes, and Checkmate-in-One.
The framework demonstrates strong generalization — templates learned from one problem domain effectively transfer to related domains.
The theoretical intuition mirrors human cognitive science: experts solve problems faster not by thinking harder, but by recognizing patterns and applying known strategies. BoT formalizes this as:
$$P(\text{correct} | \text{template}) > P(\text{correct} | \text{scratch})$$
The meta-buffer accumulates a growing repertoire of reasoning strategies, and retrieval-based instantiation ensures each problem benefits from the model's collective experience.