====== Chain of Thought Agents ======

Chain of Thought (CoT) agents are AI systems that explicitly decompose complex problems into intermediate reasoning steps before arriving at a final answer or action. By verbalizing their thought process, these agents achieve significantly improved performance on tasks requiring multi-step logic, arithmetic, and commonsense reasoning(([[https://arxiv.org/abs/2201.11903|Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" arXiv:2201.11903, 2022]])). CoT prompting has become a foundational technique in modern agent design, enabling more transparent and reliable decision-making across [[autonomous_agents|autonomous agent]] systems.

===== Origins and Evolution =====

CoT prompting was introduced by [[https://arxiv.org/abs/2201.11903|Wei et al., 2022]] at [[google|Google]], who demonstrated that providing few-shot examples with explicit reasoning chains dramatically improved performance on math and logic benchmarks. The technique evolved rapidly:

  * **Zero-Shot CoT (2022)**: [[https://arxiv.org/abs/2205.11916|Kojima et al., 2022]] showed that simply appending "Let's think step by step" to a prompt elicits reasoning without any examples(([[https://arxiv.org/abs/2205.11916|Kojima et al. "Large Language Models are Zero-Shot Reasoners" arXiv:2205.11916, 2022]]))
  * **Self-Consistency (2023)**: [[https://arxiv.org/abs/2203.11171|Wang et al., 2023]] proposed generating multiple reasoning paths and selecting the most consistent answer via majority voting, reducing hallucinations(([[https://arxiv.org/abs/2203.11171|Wang et al. "Self-Consistency Improves Chain of Thought Reasoning in Language Models" arXiv:2203.11171, 2023]]))
  * **Tree-of-Thought (2023)**: [[https://arxiv.org/abs/2305.10601|Yao et al., 2023]] extended CoT to explore branching reasoning paths like a search tree, with self-evaluation at each node for complex decision-making(([[https://arxiv.org/abs/2305.10601|Yao et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" arXiv:2305.10601, 2023]]))
  * **Graph-of-Thought (2023)**: [[https://arxiv.org/abs/2308.09687|Besta et al., 2023]] organized reasoning into directed acyclic graphs, enabling interconnected reasoning paths for multifaceted problems beyond linear chains(([[https://arxiv.org/abs/2308.09687|Besta et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" arXiv:2308.09687, 2023]]))

By 2025, CoT had evolved from a simple prompting technique into a core architectural component of reasoning models.

===== Reasoning Models =====

The most significant evolution of CoT is its internalization within dedicated reasoning models:

  * **[[openai|OpenAI]] o1/o3**: Use internal long [[chain_of_thought|chain-of-thought reasoning]], spending additional compute on "thinking" before responding. These models excel at mathematics, coding, and scientific reasoning by generating hundreds of intermediate reasoning steps internally.
  * **DeepSeek R1**: Open-source reasoning model that sustains extended chains of logical reasoning, trained with [[rlhf|GRPO]] (Group Relative Policy Optimization) to develop its reasoning capabilities(([[https://arxiv.org/abs/2501.12948|DeepSeek-AI et al. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" arXiv:2501.12948, 2025]])).
  * **Claude's Extended Thinking**: [[anthropic|Anthropic]]'s approach enables prolonged step-by-step deliberation via an API parameter, with separate pricing for reasoning tokens. Claude works through the logic in a dedicated thinking [[block|block]] before generating its final response, making the reasoning chain inspectable for debugging and trust, and shifting the standard LLM contract from fire-and-forget to a transparent process in which the model shows its work (see the sketch after this section)(([[https://cobusgreyling.substack.com/p/building-with-claude-extended-thinking|Cobus Greyling "Extended Thinking LLMs" (2026)]])).

These models demonstrate that CoT is not merely a prompting technique but a fundamental capability that can be trained into models through [[reinforcement_learning|reinforcement learning]].
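The following is a minimal sketch of requesting an inspectable reasoning chain through Anthropic's Python SDK, assuming the ''anthropic'' package and its documented ''thinking'' request parameter; the model ID, token budgets, and prompt are placeholders rather than recommendations.

<code python>
# Minimal sketch: eliciting an inspectable reasoning chain via extended thinking.
# Assumes the `anthropic` SDK; model ID and budgets are placeholder values.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # placeholder model ID
    max_tokens=16000,                   # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": "How many prime numbers lie between 100 and 150?"}
    ],
)

# The response interleaves typed content blocks: thinking blocks expose the
# chain of thought, text blocks carry the final answer.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
</code>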
===== CoT in Agent Systems =====

Within [[autonomous_agents|autonomous agent]] architectures, CoT serves multiple roles:

  * **Planning**: Agents use CoT to decompose objectives into subtasks, forming the reasoning backbone of [[plan_and_execute_agents|plan-and-execute]] patterns
  * **Tool Selection**: [[react_agents|ReAct agents]] use CoT-style reasoning to determine which [[tool_using_agents|tools]] to invoke and in what order
  * **Self-Verification**: Agents apply CoT to check their own outputs for consistency and correctness before committing to actions
  * **Error Recovery**: When actions fail, CoT enables agents to reason about what went wrong and generate alternative approaches

The combination of CoT with tool use, as in the [[react_agents|ReAct pattern]], produces agents that are both more capable and more interpretable than either pure reasoning or pure action approaches.

===== Code Example: CoT Agent with Step-by-Step Reasoning =====

<code python>
import re

from openai import OpenAI

client = OpenAI()

COT_SYSTEM_PROMPT = """You are a reasoning agent. For every question:
1. Break the problem into explicit reasoning steps
2. Show your work for each step inside <step> tags
3. Verify your reasoning in a <verification> tag
4. Give your final answer in an <answer> tag

Example format:
<step>First, I identify that...</step>
<step>Next, I calculate...</step>
<verification>Checking: ...</verification>
<answer>The answer is...</answer>"""

def cot_agent(question: str) -> dict:
    """Run a chain-of-thought agent that shows explicit reasoning steps."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": COT_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.2,
    )
    raw = response.choices[0].message.content

    # Parse the structured reasoning output
    steps = re.findall(r"<step>(.*?)</step>", raw, re.DOTALL)
    verification = re.findall(r"<verification>(.*?)</verification>", raw, re.DOTALL)
    answer = re.findall(r"<answer>(.*?)</answer>", raw, re.DOTALL)

    return {
        "steps": [s.strip() for s in steps],
        "verification": verification[0].strip() if verification else None,
        "answer": answer[0].strip() if answer else raw,
        "raw": raw,
    }

result = cot_agent(
    "A store has 3 types of boxes. Small holds 4 items, medium holds 9, large holds 15. "
    "I need to pack exactly 58 items using the fewest boxes. What combination should I use?"
)

print("Reasoning steps:")
for i, step in enumerate(result["steps"], 1):
    print(f"  {i}. {step}")
if result["verification"]:
    print(f"\nVerification: {result['verification']}")
print(f"\nFinal answer: {result['answer']}")
</code>
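The agent above commits to a single reasoning chain. A natural hardening step is self-consistency (Wang et al., described earlier): sample several chains at higher temperature and majority-vote over the extracted answers. The sketch below reuses ''client'', ''COT_SYSTEM_PROMPT'', and the ''<answer>'' tag from the example above; the helper name and the exact-string voting rule are illustrative simplifications.

<code python>
# Self-consistency sketch: vote across several sampled reasoning chains.
# Reuses `client`, `COT_SYSTEM_PROMPT`, and `re` from the example above;
# the helper name and exact-string voting rule are illustrative.
from collections import Counter

def cot_agent_self_consistent(question: str, paths: int = 5) -> dict:
    """Sample independent reasoning chains and majority-vote the final answer."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": COT_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.8,  # higher temperature diversifies the reasoning paths
        n=paths,          # one completion per reasoning path
    )

    # Extract the <answer> tag from each sampled chain.
    answers = []
    for choice in response.choices:
        found = re.findall(r"<answer>(.*?)</answer>", choice.message.content, re.DOTALL)
        if found:
            answers.append(found[0].strip())

    votes = Counter(answers)
    best, count = votes.most_common(1)[0] if votes else (None, 0)
    return {
        "answer": best,
        "votes": dict(votes),
        "agreement": count / max(len(answers), 1),
    }
</code>

Exact-string voting suffices for short numeric answers like the box-packing question above; production systems typically normalize answers (stripping units, canonicalizing numbers) before counting votes.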
{step}") if result["verification"]: print(f"\nVerification: {result['verification']}") print(f"\nFinal answer: {result['answer']}") ===== Multimodal and Structured CoT ===== Recent advances extend CoT beyond text: * **Multimodal CoT**: Reasoning over images, audio, and video alongside text, grounding chain-of-thought in diverse data modalities * **Structured CoT for Code Generation**: Research published in ACM TOSEM (2025) proposes structured CoT variants specifically optimized for code reasoning tasks * **Action Chain-of-Thought**: [[agibot|AGIBOT]]'s GO-2 model employs action-oriented CoT where the AI generates a sequence of executable action intents before committing to raw control commands, translating high-level reasoning into reliable physical movements for robotic control through a semantic path that guides fast-following control modules(([[https://www.rohan-paul.com/p/[[cursor|cursor]]-just-turned-its-agent-workflow|Rohan's Bytes "Action Chain-of-Thought" (2026]])) * **Multi-Agent CoT**: Multiple LLMs collaborating through shared reasoning chains, with each agent contributing domain-specific reasoning steps ===== See Also ===== * [[chain_of_thought|Chain-of-Thought Reasoning]] * [[cognitive_architectures_language_agents|Cognitive Architectures for Language Agents (CoALA)]] * [[agent_loop|Agent Loop]] * [[hypothesis_testing_iteration|Hypothesis Testing and Iteration]] * [[advanced_reasoning_planning|Advanced Reasoning and Planning]] ===== References =====