Chain of Thought Agents

Chain of Thought (CoT) agents are AI systems that explicitly decompose complex problems into intermediate reasoning steps before arriving at a final answer or action. By verbalizing their thought process, these agents achieve significantly improved performance on tasks requiring multi-step logic, arithmetic, and commonsense reasoning¹⁾. CoT prompting has become a foundational technique in modern agent design, enabling more transparent and reliable decision-making across autonomous agent systems.

Origins and Evolution

CoT prompting was introduced by Wei et al., 2022 at Google, demonstrating that prompting LLMs with “Let's think step-by-step” or providing few-shot examples with explicit reasoning chains dramatically improved performance on math and logic benchmarks. The technique evolved rapidly:

Zero-Shot CoT (2022): Kojima et al., 2022 showed that simply appending “Let's think step by step” to prompts elicits reasoning without examples²⁾
Self-Consistency (2023): Wang et al., 2023 proposed generating multiple reasoning paths and selecting the most consistent answer via majority voting, reducing hallucinations³⁾
Tree-of-Thought (2023): Yao et al., 2023 extended CoT to explore branching reasoning paths like a search tree, with self-evaluation at each node for complex decision-making⁴⁾
Graph-of-Thought (2023): Besta et al., 2023 organized reasoning into directed acyclic graphs, enabling interconnected reasoning paths for multifaceted problems beyond linear chains⁵⁾

By 2025, CoT has evolved from simple prompting into a core architectural component of reasoning models.

Reasoning Models

The most significant evolution of CoT is its internalization within dedicated reasoning models:

OpenAI o1/o3: Use internal long chain-of-thought reasoning, spending additional compute on “thinking” before responding. These models excel at mathematics, coding, and scientific reasoning by generating hundreds of intermediate reasoning steps internally.
DeepSeek R1: Open-source reasoning model supporting extended chains for sustained logical reasoning, trained using GRPO (Group Relative Policy Optimization) to develop reasoning capabilities⁶⁾.
Claude's Extended Thinking: Anthropic's approach enables prolonged step-by-step deliberation via an API parameter, with separate pricing for reasoning tokens. Provides transparent visibility into the model's reasoning process. Claude uses a dedicated thinking block to process logic before generating a final response, making the reasoning chain inspectable for debugging and trust, and fundamentally changing the standard LLM contract from fire-and-forget to a transparent process where the model shows its work⁷⁾

These models demonstrate that CoT is not merely a prompting technique but a fundamental capability that can be trained into models through reinforcement learning.

CoT in Agent Systems

Within autonomous agent architectures, CoT serves multiple roles:

Planning: Agents use CoT to decompose objectives into subtasks, forming the reasoning backbone of plan-and-execute patterns
Tool Selection: ReAct agents use CoT-style reasoning to determine which tools to invoke and in what order
Self-Verification: Agents apply CoT to check their own outputs for consistency and correctness before committing to actions
Error Recovery: When actions fail, CoT enables agents to reason about what went wrong and generate alternative approaches

The combination of CoT with tool use, as in the ReAct pattern, produces agents that are both more capable and more interpretable than either pure reasoning or pure action approaches.

Code Example: CoT Agent with Step-by-Step Reasoning

from [[openai|openai]] import [[openai|OpenAI]]
 
client = [[openai|OpenAI]]()
 
COT_SYSTEM_PROMPT = """You are a reasoning agent. For every question:
1. Break the problem into explicit reasoning steps
2. Show your work for each step inside <step> tags
3. Verify your reasoning in a <verify> tag
4. Give your final answer in an <answer> tag
 
Example format:
<step>First, I identify that...</step>
<step>Next, I calculate...</step>
<verify>Checking: ...</verify>
<answer>The answer is...</answer>"""
 
 
def cot_agent(question: str) -> dict:
    """Run a chain-of-thought agent that shows explicit reasoning steps."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": COT_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.2,
    )
    raw = response.choices[0].message.content
 
    # Parse structured reasoning output
    import re
    steps = re.findall(r"<step>(.*?)</step>", raw, re.DOTALL)
    verification = re.findall(r"<verify>(.*?)</verify>", raw, re.DOTALL)
    answer = re.findall(r"<answer>(.*?)</answer>", raw, re.DOTALL)
 
    return {
        "steps": [s.strip() for s in steps],
        "verification": verification[0].strip() if verification else None,
        "answer": answer[0].strip() if answer else raw,
        "raw": raw,
    }
 
 
result = cot_agent(
    "A store has 3 types of boxes. Small holds 4 items, medium holds 9, large holds 15. "
    "I need to pack exactly 58 items using the fewest boxes. What combination should I use?"
)
 
print("Reasoning steps:")
for i, step in enumerate(result["steps"], 1):
    print(f"  {i}. {step}")
if result["verification"]:
    print(f"\nVerification: {result['verification']}")
print(f"\nFinal answer: {result['answer']}")

Multimodal and Structured CoT

Recent advances extend CoT beyond text:

Multimodal CoT: Reasoning over images, audio, and video alongside text, grounding chain-of-thought in diverse data modalities
Structured CoT for Code Generation: Research published in ACM TOSEM (2025) proposes structured CoT variants specifically optimized for code reasoning tasks
Action Chain-of-Thought: AGIBOT's GO-2 model employs action-oriented CoT where the AI generates a sequence of executable action intents before committing to raw control commands, translating high-level reasoning into reliable physical movements for robotic control through a semantic path that guides fast-following control modules⁸⁾
Multi-Agent CoT: Multiple LLMs collaborating through shared reasoning chains, with each agent contributing domain-specific reasoning steps

References

¹⁾

Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" arXiv:2201.11903, 2022

²⁾

Kojima et al. "Large Language Models are Zero-Shot Reasoners" arXiv:2210.11610, 2022

³⁾

Wang et al. "Self-Consistency Improves Chain of Thought Reasoning in Language Models" arXiv:2203.11171, 2023

⁴⁾

Yao et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" arXiv:2305.10601, 2023

⁵⁾

Besta et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" arXiv:2304.11195, 2023

⁶⁾

DeepSeek-AI et al. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" arXiv:2501.12948, 2025

⁷⁾

Cobus Greyling "Extended Thinking" LLMs (2026

⁸⁾

cursor-just-turned-its-agent-workflow|Rohan's Bytes “Action Chain-of-Thought” (2026]]

AI Agent Knowledge Base

Sidebar

Table of Contents

Chain of Thought Agents

Origins and Evolution

Reasoning Models

CoT in Agent Systems

Code Example: CoT Agent with Step-by-Step Reasoning

Multimodal and Structured CoT

See Also

References

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Chain of Thought Agents

Origins and Evolution

Reasoning Models

CoT in Agent Systems

Code Example: CoT Agent with Step-by-Step Reasoning

Multimodal and Structured CoT

See Also

References

Page Tools