====== Buffer of Thoughts ======
**Buffer of Thoughts (BoT)** is a thought-augmented reasoning framework introduced by Yang et al. (NeurIPS 2024 Spotlight) that maintains a **meta-buffer** — a library of reusable high-level thought templates distilled from past problem-solving — which are retrieved and instantiated for new tasks. BoT achieves state-of-the-art reasoning accuracy while requiring only 12% of the computational cost of multi-query methods like Tree of Thoughts.
<code mermaid>
graph TD
    A[New Problem] --> B[Problem Distiller]
    B --> C[Extract Key Info]
    C --> D[Retrieve Template from Meta-Buffer]
    D --> E[Instantiate Template]
    E --> F[Reason with LLM]
    F --> G[Solution]
    F --> H{Novel Pattern?}
    H -->|Yes| I[Store Improved Template]
    I --> J[(Meta-Buffer)]
    D -.-> J
</code>
===== Motivation =====
Existing prompting methods either construct reasoning from scratch for each problem (expensive and error-prone) or rely on fixed exemplars that lack generalization. Humans, by contrast, accumulate problem-solving patterns over time and retrieve relevant strategies when facing new challenges. BoT operationalizes this cognitive process by building a growing library of abstract reasoning templates.
===== Architecture =====
BoT consists of four interconnected components:
* **Problem Distiller** — Extracts critical task-specific information: essential parameters/variables and task objectives with constraints. Reorganizes into a clear format for downstream processing.
* **Meta-Buffer** — A persistent library of universal thought-templates that capture abstract reasoning structures across task types. Each template encodes a high-level solution strategy rather than a specific answer.
* **Thought Retrieval & Instantiation** — For each new problem, retrieves the most relevant template from the meta-buffer and adaptively instantiates it with problem-specific reasoning structures.
* **Buffer Manager** — Dynamically updates the meta-buffer as new tasks are solved, expanding coverage and refining existing templates.
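The paper does not prescribe a concrete schema for templates or retrieval; a minimal sketch, assuming a hypothetical ``ThoughtTemplate`` record and a toy word-overlap similarity standing in for the embedding-based retrieval BoT actually uses, might look like:

```python
from dataclasses import dataclass

@dataclass
class ThoughtTemplate:
    """Hypothetical schema: a reusable high-level reasoning strategy."""
    name: str
    task_type: str   # e.g. "arithmetic", "geometry"
    strategy: str    # abstract solution steps, no problem-specific values

def jaccard_similarity(a: str, b: str) -> float:
    """Toy word-overlap similarity (placeholder for an embedding model)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def retrieve(buffer: list[ThoughtTemplate], distilled: str) -> ThoughtTemplate:
    """Return the template whose strategy best matches the distilled problem."""
    return max(
        buffer,
        key=lambda t: jaccard_similarity(t.strategy + " " + t.task_type, distilled),
    )
```

Given a distilled Game-of-24-style query such as "use the four given numbers and arithmetic operations to reach 24", this would select an equation-solving template over, say, an angle-chasing one.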
===== Thought Template Lifecycle =====
The lifecycle of a thought template follows a distill-store-retrieve-instantiate-update loop:
$$\text{Problem} \xrightarrow{\text{distill}} \text{Key Info} \xrightarrow{\text{retrieve}} \text{Template} \xrightarrow{\text{instantiate}} \text{Solution}$$
After successful problem-solving, the buffer manager evaluates whether the solution introduces a novel reasoning pattern. If so, it distills a new template and adds it to the meta-buffer.
Putting the components together, the solve loop can be sketched as follows (the helper classes and ``semantic_similarity`` are illustrative, not the paper's actual API):

<code python>
class BufferOfThoughts:
    def __init__(self, llm, meta_buffer=None):
        self.llm = llm
        self.meta_buffer = meta_buffer or ThoughtTemplateLibrary()
        self.distiller = ProblemDistiller(llm)
        self.buffer_manager = BufferManager(llm)

    def solve(self, problem):
        # Step 1: Distill key information from the problem.
        # `distilled` contains: variables, objectives, constraints.
        distilled = self.distiller.extract(problem)

        # Step 2: Retrieve the most relevant thought template.
        template = self.meta_buffer.retrieve(
            query=distilled,
            similarity_fn=semantic_similarity,
        )

        # Step 3: Instantiate the template with problem-specific details.
        reasoning = self.llm.generate(
            prompt=f"Apply this reasoning template:\n{template}\n"
                   f"To solve:\n{distilled}"
        )

        # Step 4: Extract the final answer from the instantiated reasoning.
        answer = self.llm.extract_answer(reasoning)

        # Step 5: Update the meta-buffer if a novel pattern was detected.
        self.buffer_manager.maybe_update(
            meta_buffer=self.meta_buffer,
            problem=problem,
            reasoning=reasoning,
        )
        return answer
</code>
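The novelty decision in Step 5 can be made concrete with a toy sketch. The threshold value and word-overlap similarity below are assumptions for illustration; in BoT the buffer manager uses the LLM itself to judge whether a distilled template duplicates an existing one:

```python
def maybe_update(meta_buffer: list[str], new_template: str,
                 threshold: float = 0.5) -> bool:
    """Append `new_template` only if it is sufficiently novel.

    Similarity here is toy word overlap; the threshold (0.5) is an
    assumed value, not taken from the paper.
    """
    def overlap(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    if all(overlap(new_template, t) < threshold for t in meta_buffer):
        meta_buffer.append(new_template)
        return True   # novel pattern stored
    return False      # near-duplicate of an existing template
```

A near-paraphrase of an existing template is rejected, while a strategy from an unrelated domain is stored, which is how the meta-buffer grows without accumulating duplicates.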
===== Comparison with Other Methods =====
^ Method ^ Queries per Problem ^ Template Reuse ^ Adaptiveness ^ Cost ^
| Chain-of-Thought | 1 (single path) | None | Low | Low |
| Self-Consistency | $k$ (sample + vote) | None | Low | Medium |
| Tree of Thoughts | Many (search tree) | None | Medium | High |
| Buffer of Thoughts | ~1 (retrieve + instantiate) | Yes | High | Low |
BoT combines the accuracy benefits of multi-query methods with the efficiency of single-query methods by amortizing reasoning effort across problems through template reuse.
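The amortization argument can be made concrete with rough query-count arithmetic. The parameter values below (k samples for Self-Consistency, a b-ary search tree of depth d for ToT) are illustrative assumptions, not the paper's measured costs:

```python
def queries_per_problem(method: str, k: int = 10, b: int = 3, d: int = 4) -> int:
    """Illustrative LLM-query counts per problem; k, b, d are assumed values."""
    if method == "cot":
        return 1                                      # single reasoning path
    if method == "self_consistency":
        return k                                      # k sampled paths + vote
    if method == "tot":
        return sum(b ** i for i in range(1, d + 1))   # expand full b-ary tree
    if method == "bot":
        return 1                                      # retrieve + one instantiation
    raise ValueError(f"unknown method: {method}")
```

Under these assumptions ToT issues over a hundred queries per problem while BoT, like CoT, issues one, since the search effort has already been amortized into the stored template.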
===== Key Results =====
BoT was evaluated across 10 challenging reasoning-intensive tasks:
* **Game of 24**: +11% over previous SOTA; +79.4% over GPT-4 baseline; +8.4% over ToT
* **Geometric Shapes**: +20% over previous SOTA
* **Checkmate-in-One**: +51% over previous SOTA
* **Computational cost**: Only 12% of ToT's cost on average
* **Reasoning time**: Comparable to single-query methods, while matching the solution quality of multi-query approaches
The framework demonstrates strong generalization — templates learned from one problem domain effectively transfer to related domains.
===== Why It Works =====
The theoretical intuition mirrors human cognitive science: experts solve problems faster not by thinking harder, but by recognizing patterns and applying known strategies. BoT formalizes this as:
$$P(\text{correct} | \text{template}) > P(\text{correct} | \text{scratch})$$
The meta-buffer accumulates a growing repertoire of reasoning strategies, and retrieval-based instantiation ensures each problem benefits from the model's collective experience.
===== References =====
* [[https://arxiv.org/abs/2406.04271|Yang et al. "Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models" (arXiv:2406.04271)]]
* [[https://neurips.cc/virtual/2024/poster/96264|NeurIPS 2024 Spotlight Presentation]]
* [[https://github.com/YangLing0818/buffer-of-thought-llm|Official Code Repository]]
===== See Also =====
* [[graph_of_thoughts]]
* [[tree_of_thoughts]]
* [[chain_of_thought]]
* [[self_consistency]]