====== Graph of Thoughts ======

**Graph of Thoughts (GoT)** is a reasoning framework introduced by Besta et al. (2024) that models LLM reasoning as an arbitrary directed graph $G = (V, E)$, where nodes $V$ represent individual "thoughts" (intermediate reasoning steps) and edges $E$ denote dependencies between them. By generalizing from linear chains to arbitrary graph topologies, GoT enables cycles, merges, and refinement operations that capture complex reasoning patterns, such as combining subproblem solutions in the style of dynamic programming.

===== Motivation =====

Prior prompting frameworks impose rigid structural constraints on reasoning. Chain-of-Thought (CoT) restricts reasoning to a linear sequence, while Tree-of-Thoughts (ToT) allows branching but not merging of independent paths. Many real-world problems require combining partial solutions, iterative refinement, and feedback loops: operations that demand a graph topology.

===== Framework Architecture =====

GoT defines reasoning as the programmatic construction of a directed graph through four core operations:

  * **Generation** — from a parent thought, prompt the LLM to produce $k$ child thoughts, enabling branching exploration
  * **Aggregation** — merge multiple parent thoughts into a single child (in-degree > 1), synthesizing partial solutions
  * **Refinement** — iteratively improve a thought by feeding it back through the LLM with downstream context
  * **Scoring** — evaluate each thought with an LLM-based heuristic, such as error counting or voting across states

These operations compose into arbitrary graph topologies. A controller orchestrates graph construction, an LLM executor processes individual thoughts, and a scoring module ranks candidates.
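The four operations above can be sketched as plain Python functions over a ''Thought'' node type. This is a minimal illustration with a stub ''llm'' callable; the names ''Thought'', ''generate'', ''aggregate'', ''refine'', and ''score'' are assumptions for this sketch, not the paper's API.

<code python>
# Minimal sketch of GoT's four core operations (illustrative names only).
from dataclasses import dataclass, field

@dataclass
class Thought:
    content: str
    score: float = 0.0
    parents: list = field(default_factory=list)   # incoming edges of the graph

def generate(llm, parent, k):
    """Generation: prompt the LLM for k child thoughts of one parent."""
    return [Thought(llm(f"expand #{i}: {parent.content}"), parents=[parent])
            for i in range(k)]

def aggregate(llm, parents):
    """Aggregation: merge several parents into one child (in-degree > 1)."""
    merged = llm("merge: " + " | ".join(p.content for p in parents))
    return Thought(merged, parents=list(parents))

def refine(llm, thought):
    """Refinement: feed a thought back through the LLM to improve it."""
    return Thought(llm(f"refine: {thought.content}"), parents=[thought])

def score(evaluator, thoughts):
    """Scoring: rank candidate thoughts by an evaluator heuristic."""
    for t in thoughts:
        t.score = evaluator(t)
    return sorted(thoughts, key=lambda t: t.score, reverse=True)
</code>

With a real model plugged in as ''llm'', a controller would chain these calls: generate candidates, score and prune them, aggregate the survivors, then refine the merged result.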
===== Graph Operations =====

The classes of reasoning graphs expressible by each framework form a hierarchy:

$$G_{\text{GoT}} \supseteq G_{\text{ToT}} \supseteq G_{\text{CoT}}$$

where CoT is a path graph (each node has in-degree and out-degree $\leq 1$), ToT is a tree (in-degree $\leq 1$, unbounded out-degree), and GoT permits arbitrary connectivity, including cycles.

The pipeline below sketches a GoT operation graph for a sorting task. Note that ''got_framework'', ''count_errors'', and ''test_data'' are illustrative names for this example, not the paper's reference implementation, whose interface differs:

<code python>
# Illustrative GoT pipeline; got_framework is a hypothetical API, and
# count_errors / test_data are assumed to be defined elsewhere.
from got_framework import operations, OperationsGraph

# Build a GoT reasoning graph for a sorting task
graph = OperationsGraph()

# Phase 1: generate 10 candidate partial solutions from the input
graph.append(operations.Generate(num_parents=1, num_children=10))

# Phase 2: score each candidate with an error-count heuristic
graph.append(operations.Score(
    num_inputs=1,
    combined=False,
    scoring_fn=lambda thought: -count_errors(thought)  # fewer errors, higher score
))

# Phase 3: keep the best 3 candidates
graph.append(operations.KeepBestN(n=3))

# Phase 4: aggregate the top candidates into a merged solution
graph.append(operations.Aggregate(num_parents=3, num_children=1))

# Phase 5: refine the merged solution
graph.append(operations.Refine(num_iterations=2))

# Phase 6: evaluate against ground truth
graph.append(operations.GroundTruth(test_cases=test_data))

result = graph.execute(llm=model, input_thought=problem)
</code>

===== Comparison with CoT and ToT =====

^ Method ^ Structure ^ Branching ^ Merging ^ Cycles ^ Expressiveness ^
| CoT | Linear chain | No | No | No | Sequential reasoning only |
| ToT | Tree | Yes | No | No | Parallel exploration, backtracking |
| GoT | Arbitrary graph | Yes | Yes | Yes | Full graph: aggregation, refinement, feedback |

GoT subsumes both CoT (as a path subgraph) and ToT (as a tree subgraph) while enabling fundamentally new reasoning patterns through aggregation and cyclic refinement.
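The degree constraints that separate the three structures can be checked mechanically. A small sketch, illustrative rather than part of any GoT library, that classifies a reasoning graph given as an edge list:

<code python>
# Classify a reasoning graph by the in-/out-degree bounds discussed above:
# path (CoT), tree (ToT), or general graph (GoT). Illustrative helper only.
from collections import Counter

def classify_topology(edges):
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    if all(indeg[n] <= 1 and outdeg[n] <= 1 for n in nodes):
        return "CoT"   # path: a single chain of thoughts
    if all(indeg[n] <= 1 for n in nodes):
        return "ToT"   # tree: branching, but no merging
    return "GoT"       # merging (in-degree > 1) requires a general graph
</code>

A single aggregation edge pair such as ''(0, 2), (1, 2)'' is exactly what pushes a reasoning graph out of the tree class and into GoT territory.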
===== Key Results =====

  * **Sorting tasks**: GoT increased sorting quality by roughly 62% over ToT while reducing cost by more than 31%, through iterative aggregation and refinement
  * **Document merging**: graph-structured reasoning produced higher-quality outputs by merging content from diverse sources
  * **Token efficiency**: aggregation and candidate pruning reduce total tokens compared to exhaustive ToT search
  * **Set intersection and keyword counting**: GoT's flexible topologies yielded higher accuracy by enabling subproblem decomposition and recombination

===== Mathematical Formulation =====

Each thought $t_i \in V$ is scored by a value function:

$$v(t_i) = \text{LLM}_{\text{eval}}(t_i, \text{context}(t_i))$$

Selection uses a ranking function over candidates:

$$t^* = \arg\max_{t_i \in \text{candidates}} v(t_i)$$

Aggregation combines multiple thoughts:

$$t_{\text{agg}} = \text{LLM}_{\text{gen}}(t_{p_1}, t_{p_2}, \ldots, t_{p_k})$$

where $t_{p_1}, \ldots, t_{p_k}$ are the parent thoughts being merged.

===== References =====

  * [[https://arxiv.org/abs/2308.09687|Besta et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" (arXiv:2308.09687)]]
  * [[https://arxiv.org/abs/2201.11903|Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (arXiv:2201.11903)]]
  * [[https://arxiv.org/abs/2305.10601|Yao et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" (arXiv:2305.10601)]]

===== See Also =====

  * [[tree_of_thoughts]]
  * [[chain_of_thought]]
  * [[language_agent_tree_search]]
  * [[buffer_of_thoughts]]