Buffer of Thoughts

Buffer of Thoughts (BoT) is a thought-augmented reasoning framework introduced by Yang et al. (NeurIPS 2024 Spotlight) that maintains a meta-buffer — a library of reusable high-level thought templates distilled from past problem-solving — which are retrieved and instantiated for new tasks. BoT achieves state-of-the-art reasoning accuracy while requiring only 12% of the computational cost of multi-query methods like Tree of Thoughts.

graph TD A[New Problem] --> B[Problem Distiller] B --> C[Extract Key Info] C --> D[Retrieve Template from Meta-Buffer] D --> E[Instantiate Template] E --> F[Reason with LLM] F --> G[Solution] F --> H{Novel Pattern?} H -->|Yes| I[Store Improved Template] I --> J[(Meta-Buffer)] D -.-> J

Motivation

Existing prompting methods either construct reasoning from scratch for each problem (expensive and error-prone) or rely on fixed exemplars that lack generalization. Humans, by contrast, accumulate problem-solving patterns over time and retrieve relevant strategies when facing new challenges. BoT operationalizes this cognitive process by building a growing library of abstract reasoning templates.

Architecture

BoT consists of four interconnected components:

Problem Distiller — Extracts critical task-specific information: essential parameters/variables and task objectives with constraints. Reorganizes into a clear format for downstream processing.
Meta-Buffer — A persistent library of universal thought-templates that capture abstract reasoning structures across task types. Each template encodes a high-level solution strategy rather than a specific answer.
Thought Retrieval & Instantiation — For each new problem, retrieves the most relevant template from the meta-buffer and adaptively instantiates it with problem-specific reasoning structures.
Buffer Manager — Dynamically updates the meta-buffer as new tasks are solved, expanding coverage and refining existing templates.

Thought Template Lifecycle

The lifecycle of a thought template follows a distill-store-retrieve-instantiate-update loop:

$$\text{Problem} \xrightarrow{\text{distill}} \text{Key Info} \xrightarrow{\text{retrieve}} \text{Template} \xrightarrow{\text{instantiate}} \text{Solution}$$

After successful problem-solving, the buffer manager evaluates whether the solution introduces a novel reasoning pattern. If so, it distills a new template and adds it to the meta-buffer.

class BufferOfThoughts:
    def __init__(self, llm, meta_buffer=None):
        self.llm = llm
        self.meta_buffer = meta_buffer or ThoughtTemplateLibrary()
        self.distiller = ProblemDistiller(llm)
        self.buffer_manager = BufferManager(llm)
 
    def solve(self, problem):
        # Step 1: Distill key information from the problem
        distilled = self.distiller.extract(problem)
        # distilled contains: variables, objectives, constraints
 
        # Step 2: Retrieve most relevant thought template
        template = self.meta_buffer.retrieve(
            query=distilled,
            similarity_fn=semantic_similarity
        )
 
        # Step 3: Instantiate template with problem-specific details
        reasoning = self.llm.generate(
            prompt=f"Apply this reasoning template:\n{template}\n"
                   f"To solve:\n{distilled}"
        )
 
        # Step 4: Extract answer from instantiated reasoning
        answer = self.llm.extract_answer(reasoning)
 
        # Step 5: Update meta-buffer if novel pattern detected
        self.buffer_manager.maybe_update(
            meta_buffer=self.meta_buffer,
            problem=problem,
            reasoning=reasoning
        )
 
        return answer

Comparison with Other Methods

Method	Queries per Problem	Template Reuse	Adaptiveness	Cost
Chain-of-Thought	1 (single path)	None	Low	Low
Self-Consistency	$k$ (sample + vote)	None	Low	Medium
Tree of Thoughts	Many (search tree)	None	Medium	High
Buffer of Thoughts	~1 (retrieve + instantiate)	Yes	High	Low

BoT combines the accuracy benefits of multi-query methods with the efficiency of single-query methods by amortizing reasoning effort across problems through template reuse.

Key Results

BoT was evaluated across 10 challenging reasoning-intensive tasks:

Game of 24: +11% over previous SOTA; +79.4% over GPT-4 baseline; +8.4% over ToT
Geometric Shapes: +20% over previous SOTA
Checkmate-in-One: +51% over previous SOTA
Computational cost: Only 12% of ToT's cost on average
Reasoning time: Comparable to single-query methods despite multi-query quality

The framework demonstrates strong generalization — templates learned from one problem domain effectively transfer to related domains.

Why It Works

The theoretical intuition mirrors human cognitive science: experts solve problems faster not by thinking harder, but by recognizing patterns and applying known strategies. BoT formalizes this as:

$$P(\text{correct} | \text{template}) > P(\text{correct} | \text{scratch})$$

The meta-buffer accumulates a growing repertoire of reasoning strategies, and retrieval-based instantiation ensures each problem benefits from the model's collective experience.

AI Agent Knowledge Base

Sidebar

Table of Contents

Buffer of Thoughts

Motivation

Architecture

Thought Template Lifecycle

Comparison with Other Methods

Key Results

Why It Works

References

See Also

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Buffer of Thoughts

Motivation

Architecture

Thought Template Lifecycle

Comparison with Other Methods

Key Results

Why It Works

References

See Also

Page Tools