====== ReAct: Reasoning and Acting ======

ReAct is a prompting framework that synergizes reasoning (chain-of-thought style) and acting (tool use and environment interaction) within large language models.(([[https://arxiv.org/abs/2210.03629|Yao et al. - ReAct: Synergizing Reasoning and Acting in Language Models (2022)]])) Proposed by [[https://arxiv.org/abs/2210.03629|Yao et al., 2022]] in "ReAct: Synergizing Reasoning and Acting in Language Models," ReAct interleaves the generation of reasoning traces with task-specific actions, allowing the model to dynamically plan, retrieve information, and adjust its approach based on observations from the environment. This tight coupling of thought and action has proven effective for tasks such as question answering, fact verification, and interactive decision-making, and has become a foundational pattern for LLM agent architectures.

<mermaid>
graph LR
    Q[Question] --> T1[Thought 1]
    T1 --> A1[Action 1]
    A1 --> O1[Observation 1]
    O1 --> T2[Thought 2]
    T2 --> A2[Action 2]
    A2 --> O2[Observation 2]
    O2 --> T3[Thought 3]
    T3 --> Ans[Final Answer]
    style T1 fill:#e1f5fe
    style T2 fill:#e1f5fe
    style T3 fill:#e1f5fe
    style A1 fill:#fff3e0
    style A2 fill:#fff3e0
    style O1 fill:#e8f5e9
    style O2 fill:#e8f5e9
</mermaid>

===== The Reasoning-Action Loop =====

ReAct operates through an iterative Thought-Action-Observation cycle:(([[https://arxiv.org/abs/2210.03629|Yao et al. - ReAct: Synergizing Reasoning and Acting in Language Models (2022)]]))

  * **Thought (t_t)**: The model generates a natural language reasoning trace that analyzes the current situation, tracks progress toward the goal, plans the next step, or handles exceptions. For example: "The search returned the founding year but not the founder's name. I should search for the founder specifically."
  * **Action (a_t)**: Based on the reasoning, the model selects and executes a concrete action from the available action space, such as ''search[query]'', ''lookup[term]'', or ''finish[answer]''.
  * **Observation (o_t)**: The environment returns the result of the action (e.g., a search snippet or a page lookup result), which becomes part of the context for the next iteration.

This loop repeats until the model decides to terminate (e.g., by calling ''finish[answer]''). The full trajectory of thoughts, actions, and observations is maintained in the prompt context, giving the model a working memory of its reasoning and actions so far.

The key innovation is that **thoughts and actions are interleaved**, not separated. The reasoning traces ground the model's decisions in explicit logic, while the actions ground the reasoning in real-world information, reducing hallucination.

===== Comparison with Chain-of-Thought and Action-Only Approaches =====

| **Approach** | **Mechanism** | **Strengths** | **Weaknesses** |
| Chain-of-Thought | Pure internal reasoning without external actions | Strong on knowledge tasks; no environment needed(([[https://arxiv.org/abs/2201.11903|Wei et al. - Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)]])) | No external grounding; hallucinations on factual queries |
| Action-Only | Direct tool calls without explicit reasoning | Low overhead; fast execution | Opaque decisions; poor error recovery |
| ReAct | Interleaved reasoning + actions | Interpretable; robust to failures; self-correcting | Higher token cost; potential reasoning loops |

On **HotpotQA** (multi-hop question answering) with PaLM-540B, the paper reports that ReAct alone scored slightly below CoT in exact match (27.4% versus 29.4%), while strategies that switch between ReAct and CoT with self-consistency scored highest, at about 35%.(([[https://arxiv.org/abs/2210.03629|Yao et al. - ReAct: Synergizing Reasoning and Acting in Language Models (2022)]])) ReAct's advantage on this task was grounding rather than raw accuracy: the reasoning traces allowed the model to decompose multi-hop questions and correct course when initial searches returned irrelevant results, and retrieved evidence reduced the hallucination and error propagation seen in CoT.

On **FEVER** (fact verification), ReAct outperformed both reasoning-only and action-only baselines by using search actions to verify claims and reasoning to synthesize evidence.
On **ALFWorld** (text-based household tasks), ReAct, prompted with only one or two in-context examples, outperformed imitation and reinforcement learning baselines by an absolute success rate of 34%, reasoning through environment feedback to plan multi-step actions like finding and cleaning objects.(([[https://arxiv.org/abs/2210.03629|Yao et al. - ReAct: Synergizing Reasoning and Acting in Language Models (2022)]]))

===== Tool Integration and Environment Design =====

ReAct's effectiveness depends on the design of the action space. In the original paper, actions were simple text commands (''search'', ''lookup'', ''finish'') interfacing with Wikipedia. In practice, the action space can include:

  * **Search APIs**: Web search, document retrieval, database queries
  * **Code execution**: Running Python code, SQL queries, shell commands
  * **External tools**: Calculators, calendars, translation services
  * **Environment actions**: Navigation, manipulation in simulated environments

The action space should be well-defined with clear semantics, because ambiguous action definitions lead the model to make incorrect tool calls. Each tool should return structured observations that the model can reason about in subsequent thoughts.

===== Real-World Applications and Agent Systems =====

ReAct has become the dominant pattern for LLM agent implementations:

**LangChain**(([[https://python.langchain.com/docs/how_to/migrate_agent/|LangChain - How to Migrate from Legacy Agents to LangGraph (2024)]])) provides ''create_react_agent'', which parses LLM outputs for thought/action pairs and manages the execution loop. It supports custom tools and integrates with the broader LangChain ecosystem for chains and memory.
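The parsing step can be illustrated without any framework. The sketch below is not LangChain's parser. It assumes the model emits lines of the form ''Thought: ...'' and ''Action: name[argument]'', mirroring the ''search[query]'' style of command used in the original paper, and it rejects any action outside a declared action space. The names ''ACTION_SPACE'', ''ParsedStep'', and ''parse_step'' are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed action space, taken from the text commands named in this article.
ACTION_SPACE = {"search", "lookup", "finish"}

# Matches e.g. "Action: search[SpaceX founder]" or "Action 2: lookup[founder]".
_ACTION_RE = re.compile(r"^Action(?:\s+\d+)?:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
_THOUGHT_RE = re.compile(r"^Thought(?:\s+\d+)?:\s*(.+)$", re.MULTILINE)


@dataclass
class ParsedStep:
    thought: str
    action: str
    argument: str


def parse_step(model_output: str) -> ParsedStep:
    """Extract one thought/action pair, raising ValueError if it is malformed."""
    action_match = _ACTION_RE.search(model_output)
    if action_match is None:
        raise ValueError("no well-formed 'Action: name[argument]' line found")
    name, argument = action_match.group(1), action_match.group(2)
    if name not in ACTION_SPACE:
        raise ValueError(f"unknown action {name!r}; expected one of {sorted(ACTION_SPACE)}")
    thought_match = _THOUGHT_RE.search(model_output)
    thought = thought_match.group(1).strip() if thought_match else ""
    return ParsedStep(thought=thought, action=name, argument=argument.strip())


step = parse_step(
    "Thought 1: I need the founder's name.\nAction 1: search[SpaceX founder]"
)
print(step)
```

Rejecting unknown action names at parse time is one way to enforce the "clear semantics" requirement from the previous section: the error message can be fed back to the model as an observation instead of silently dispatching a bad call.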
The following example builds a ReAct agent with a search tool from [[langchain|LangChain]] components and the prebuilt ''create_react_agent'' from [[langgraph|LangGraph]]; the agent follows the thought/action/observation loop:

<code python>
# ReAct agent using LangChain components with a search tool.
# Assumes the langchain-openai, langchain-community, and langgraph packages
# are installed and that OPENAI_API_KEY is set in the environment.
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o")
tools = [DuckDuckGoSearchRun()]

# create_react_agent builds the thought-action-observation loop automatically
agent = create_react_agent(llm, tools)

# The agent reasons step-by-step, calling search as needed
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Who founded SpaceX and when?"}]}
)

# The last message in the returned state holds the final answer
print(result["messages"][-1].content)
</code>

**[[llamaindex|LlamaIndex]]** implements ReAct-style agents for retrieval-augmented generation, where the agent reasons about which query engine or index to use, executes the retrieval, and reasons about the results before answering.

**Production patterns** commonly seen in ReAct deployments include:

  * Error recovery: Detecting failed actions and retrying with modified parameters
  * Multi-tool orchestration: Selecting among dozens of available tools based on reasoning
  * Confidence-based termination: Reasoning about whether enough evidence has been gathered
  * [[human_in_the_loop|Human-in-the-loop]]: Surfacing reasoning traces for user review at decision points

By 2025, ReAct has evolved into a family of architectures incorporating multi-agent collaboration, vision capabilities (for temporal action detection), and hybrid approaches with reward models. It remains the most widely implemented agent reasoning pattern, though newer frameworks like [[reflexion_framework|Reflexion]](([[https://arxiv.org/abs/2303.11366|Shinn et al. - Reflexion: Language Agents with Verbal Reinforcement Learning (2023)]])) and planning-based approaches extend it with self-improvement and formal search.
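The Thought-Action-Observation cycle and the error-recovery pattern from the production list can also be sketched without a framework. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is replaced by a scripted stand-in (''scripted_policy''), the search tool is a small in-memory dictionary (''FAKE_INDEX''), and every name in the block is hypothetical. A real agent would call an LLM with the accumulated trajectory as prompt context and parse its text output.

```python
from typing import Callable, Optional

# Toy knowledge base standing in for a search API (assumption for illustration).
FAKE_INDEX = {"spacex founder": "SpaceX was founded by Elon Musk in 2002."}


def search(query: str) -> str:
    """Return a snippet, or raise LookupError so the loop can observe the failure."""
    try:
        return FAKE_INDEX[query.lower()]
    except KeyError:
        raise LookupError(f"no results for {query!r}") from None


TOOLS: dict[str, Callable[[str], str]] = {"search": search}


def scripted_policy(trajectory: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: picks (thought, action, argument) from the last entry."""
    last = trajectory[-1]
    if last.startswith("Question:"):
        return ("I should look up the company.", "search", "SpaceX")
    if "failed" in last:
        return ("That query failed; I will be more specific.", "search", "SpaceX founder")
    return ("The observation answers the question.", "finish", "Elon Musk, 2002")


def react_loop(
    question: str,
    policy: Callable[[list[str]], tuple[str, str, str]] = scripted_policy,
    max_steps: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Run the loop until 'finish' or until max_steps is exhausted."""
    trajectory = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = policy(trajectory)
        trajectory.append(f"Thought: {thought}")
        trajectory.append(f"Action: {action}[{argument}]")
        if action == "finish":
            return argument, trajectory
        tool = TOOLS.get(action)
        if tool is None:
            observation = f"Action failed: unknown action {action!r}"
        else:
            try:
                observation = tool(argument)
            except LookupError as exc:
                # Failures become observations so the next thought can react to them.
                observation = f"Action failed: {exc}"
        trajectory.append(f"Observation: {observation}")
    return None, trajectory  # step budget exhausted without an answer


answer, trajectory = react_loop("Who founded SpaceX and when?")
print("\n".join(trajectory))
```

Two design choices are worth noting. Tool failures are written into the trajectory instead of being raised, which is what lets the next thought retry with a modified query. The ''max_steps'' cap is a simple guard against the reasoning loops listed as a weakness in the comparison table.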
===== See Also =====

  * [[react_agents|ReAct Agents]]
  * [[reasoning_via_planning|RAP: Reasoning via Planning with LLM as World Model]]
  * [[reasoning_models|Reasoning Models]]
  * [[react_19|React 19]]
  * [[active_prompt|Active-Prompt]]

===== References =====