ReAct is a prompting framework that synergizes reasoning (chain-of-thought style) and acting (tool use and environment interaction) within large language models.1) Proposed by Yao et al., 2022 in “ReAct: Synergizing Reasoning and Acting in Language Models,” ReAct interleaves the generation of reasoning traces with task-specific actions, allowing the model to dynamically plan, retrieve information, and adjust its approach based on observations from the environment. This tight coupling of thought and action has proven effective for tasks such as question answering, fact verification, and interactive decision-making, and has become a foundational pattern for LLM agent architectures.
ReAct operates through an iterative Thought-Action-Observation cycle:2)

1. Thought: the model generates a free-form reasoning trace that decomposes the task and plans the next step.
2. Action: the model emits a task-specific action such as search[query], lookup[term], or finish[answer].
3. Observation: the environment returns the result of the action, which is appended to the context before the next thought.
This loop repeats until the model decides to terminate (e.g., by calling finish[answer]). The full trajectory of thoughts, actions, and observations is maintained in the prompt context, giving the model a working memory of its reasoning and actions so far.
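The loop described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the paper's implementation: `fake_llm` scripts the model's outputs and `search` is a toy one-entry knowledge base.

```python
# Minimal sketch of the ReAct Thought-Action-Observation loop.
# `fake_llm` and the toy search tool are illustrative stand-ins.
import re

WIKI = {"SpaceX": "SpaceX was founded in 2002 by Elon Musk."}

def search(query):
    return WIKI.get(query, "No results found.")

def fake_llm(context):
    # A real LLM would generate the next thought/action from the context;
    # here two steps are scripted to show the loop mechanics.
    if "Observation:" not in context:
        return "Thought: I should look up SpaceX.\nAction: search[SpaceX]"
    return "Thought: I have what I need.\nAction: finish[Elon Musk, 2002]"

def react_loop(question, max_steps=5):
    context = f"Question: {question}\n"   # trajectory = working memory
    for _ in range(max_steps):
        step = fake_llm(context)
        context += step + "\n"
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        name, arg = match.group(1), match.group(2)
        if name == "finish":              # model decided to terminate
            return arg, context
        observation = search(arg) if name == "search" else "Unknown action."
        context += f"Observation: {observation}\n"  # fed back next iteration
    return None, context

answer, trajectory = react_loop("Who founded SpaceX and when?")
print(answer)  # → Elon Musk, 2002
```

Note that the entire trajectory string is passed back to the model on every step, which is exactly the "working memory" behavior described above.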
The key innovation is that thoughts and actions are interleaved, not separated. The reasoning traces ground the model's decisions in explicit logic, while the actions ground the reasoning in real-world information, reducing hallucination.
| Approach | Mechanism | Strengths | Weaknesses |
|---|---|---|---|
| Chain-of-Thought | Pure internal reasoning without external actions | Strong on knowledge tasks; no environment needed3) | No external grounding; hallucinations on factual queries |
| Action-Only | Direct tool calls without explicit reasoning | Low overhead; fast execution | Opaque decisions; poor error recovery |
| ReAct | Interleaved reasoning + actions | Interpretable; robust to failures; self-correcting | Higher token cost; potential reasoning loops |
On HotpotQA (multi-hop question answering), ReAct with PaLM-540B achieved 41% exact match, compared to 37% for CoT alone.4) The reasoning traces allowed the model to decompose multi-hop questions and correct course when initial searches returned irrelevant results.
On FEVER (fact verification), ReAct similarly outperformed both reasoning-only and action-only baselines by using search actions to verify claims and reasoning to synthesize the retrieved evidence.
On ALFWorld (text-based household tasks), ReAct completed 30-40% of episodes, far exceeding imitation learning baselines (~10%), by reasoning through environment feedback to plan multi-step actions like finding and cleaning objects.
ReAct's effectiveness depends on the design of the action space. In the original paper, actions were simple text commands (search, lookup, finish) interfacing with Wikipedia. In practice, the action space can include much richer tools, such as web search APIs, code execution, and database or retrieval queries.
The action space should be well-defined with clear semantics, as ambiguous action definitions lead to the model making incorrect tool calls. Each tool should return structured observations that the model can reason about in subsequent thoughts.
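One way to enforce clear semantics and structured observations is to register each tool with an explicit name and description, and to wrap every result (including errors) in a fixed observation type. The registry and `Observation` dataclass below are illustrative assumptions, not an API from the paper:

```python
# Sketch of a well-defined action space: each tool declares its semantics,
# and every call returns a structured observation the model can reason over.
from dataclasses import dataclass

@dataclass
class Observation:
    tool: str
    ok: bool
    content: str

TOOLS = {}

def register(name, description):
    """Attach a name and usage description to a tool function."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@register("search", "search[query]: return a short summary for the query")
def search(query):
    data = {"ReAct": "ReAct interleaves reasoning traces with actions."}
    return data.get(query, "No results found.")

def execute(name, arg):
    # Unknown or ambiguous actions become explicit, recoverable observations
    # instead of silent failures.
    if name not in TOOLS:
        return Observation(name, False, f"Unknown tool; available: {sorted(TOOLS)}")
    return Observation(name, True, TOOLS[name]["fn"](arg))

obs = execute("search", "ReAct")
print(obs.ok, obs.content)
bad = execute("lookup", "ReAct")
print(bad.content)  # tells the model which tools actually exist
```

Returning the list of available tools on a bad call gives the model something concrete to reason about in its next thought, rather than a bare error.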
ReAct has become the dominant pattern for LLM agent implementations:
LangChain5) provides create_react_agent (maintained in LangGraph's prebuilt module in current versions), which parses LLM outputs for thought/action pairs and manages the execution loop. It supports custom tools and integrates with the broader LangChain ecosystem for chains and memory.
The following example demonstrates a LangChain ReAct agent with a search tool that follows the thought/action/observation loop:
```python
# ReAct agent using LangChain with a search tool
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o")
tools = [DuckDuckGoSearchRun()]

# create_react_agent builds the thought-action-observation loop automatically
agent = create_react_agent(llm, tools)

# The agent reasons step-by-step, calling search as needed
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Who founded SpaceX and when?"}]}
)
print(result["messages"][-1].content)
```
LlamaIndex implements ReAct-style agents for retrieval-augmented generation, where the agent reasons about which query engine or index to use, executes the retrieval, and reasons about the results before answering.
Production patterns commonly seen in ReAct deployments include:

- Iteration caps and timeouts to prevent unbounded reasoning loops.
- Retry logic when the model emits malformed or unparseable action strings.
- Argument validation against tool schemas before execution.
- Logging of full thought/action/observation trajectories for debugging and evaluation.
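Two safeguards that recur in production ReAct loops, an iteration cap and retries on malformed action strings, can be sketched as a wrapper around the loop. The `Action: name[arg]` format and the helper names here are assumptions for illustration, not a fixed standard:

```python
# Sketch of two common production safeguards for a ReAct loop:
# a hard iteration cap (breaks reasoning loops) and a retry when the
# model emits an action string the parser cannot understand.
import re

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")  # assumed action format

def parse_action(text):
    m = ACTION_RE.search(text)
    return (m.group(1), m.group(2)) if m else None

def run_agent(llm_step, max_steps=8, max_retries=2):
    context = ""
    for _ in range(max_steps):
        for _attempt in range(max_retries + 1):
            step = llm_step(context)
            action = parse_action(step)
            if action:
                break
            # Feed the failure back so the model can reformat its output.
            context += "Observation: could not parse action, use Action: name[arg]\n"
        else:
            return None  # give up after repeated malformed outputs
        context += step + "\n"
        name, arg = action
        if name == "finish":
            return arg
    return None  # iteration cap reached without finish[...]

# Scripted stand-in for an LLM: malformed on the first call, valid after.
calls = []
def flaky_llm(context):
    calls.append(1)
    return "I think the answer is 42" if len(calls) == 1 else "Action: finish[42]"

print(run_agent(flaky_llm))  # → 42
```

Feeding the parse failure back as an observation, rather than raising, lets the model self-correct in the same way it recovers from unhelpful search results.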
By 2025, ReAct has evolved into a family of architectures incorporating multi-agent collaboration, vision capabilities (for temporal action detection), and hybrid approaches with reward models. It remains the most widely implemented agent reasoning pattern, though newer frameworks like Reflexion6) and planning-based approaches extend it with self-improvement and formal search.