====== Plan and Execute Agents ======

Plan-and-execute agents separate the planning phase from the execution phase, first generating a complete step-by-step plan and then carrying out each step individually.((https://arxiv.org/abs/2305.04091|Wang, L. et al. "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models.")) This two-phase architecture addresses limitations of purely reactive agents like [[react_agents|ReAct]] by providing a structured roadmap before any actions are taken. The approach improves reliability on complex multi-step tasks by reducing the compounding errors that arise from step-by-step improvisation.

===== Architecture =====

The plan-and-execute pattern consists of two core components operating in sequence:

  * **Planner**: An LLM that receives the user's objective and decomposes it into an explicit, ordered sequence of steps. The planner considers dependencies between steps and the tools available for execution.
  * **Executor**: A separate agent (often a [[react_agents|ReAct-style]] agent) that carries out each step, invoking [[tool_using_agents|tools]] as needed and reporting results back.

State management persists the plan, current step index, and accumulated results, enabling resumability if execution is interrupted.
This separation of concerns allows different models or configurations for planning versus execution: for example, using a more capable model for planning and a faster model for routine execution steps.((https://python.langchain.com/docs/how_to/|LangChain Documentation))

===== Python Example =====

<code python>
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, Annotated
import operator


class PlanExecuteState(TypedDict):
    objective: str
    plan: list[str]
    current_step: int
    results: Annotated[list[str], operator.add]  # appended across steps
    final_answer: str


planner_llm = ChatOpenAI(model="gpt-4o")
executor_llm = ChatOpenAI(model="gpt-4o-mini")


def plan_step(state: PlanExecuteState) -> dict:
    resp = planner_llm.invoke(
        f"Break this objective into 3-5 concrete steps:\n{state['objective']}\n"
        "Return each step on a new line, numbered."
    )
    steps = [line.strip() for line in resp.content.split("\n") if line.strip()]
    return {"plan": steps, "current_step": 0}


def execute_step(state: PlanExecuteState) -> dict:
    idx = state["current_step"]
    step = state["plan"][idx]
    context = "\n".join(state["results"]) if state["results"] else "No prior results."
    resp = executor_llm.invoke(
        f"Execute this step: {step}\nPrior results:\n{context}"
    )
    return {"results": [resp.content], "current_step": idx + 1}


def should_continue(state: PlanExecuteState) -> str:
    return "execute" if state["current_step"] < len(state["plan"]) else "finalize"


def finalize(state: PlanExecuteState) -> dict:
    all_results = "\n".join(state["results"])
    resp = planner_llm.invoke(f"Synthesize these results:\n{all_results}")
    return {"final_answer": resp.content}


# Build the LangGraph state machine
graph = StateGraph(PlanExecuteState)
graph.add_node("plan", plan_step)
graph.add_node("execute", execute_step)
graph.add_node("finalize", finalize)
graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges("execute", should_continue)
graph.add_edge("finalize", END)

app = graph.compile()
result = app.invoke({"objective": "Research and summarize HNSW algorithms"})
print(result["final_answer"])
</code>

===== BabyAGI-Style Planning =====

[[babyagi|BabyAGI]] pioneered a dynamic variant of plan-and-execute where the task queue is continuously regenerated.((https://github.com/yoheinakajima/babyagi|GitHub: yoheinakajima/babyagi)) Rather than creating a fixed plan upfront, BabyAGI's task creation agent generates new tasks based on execution results, and a prioritization agent reorders them. This produces adaptive planning that evolves with the task, combining the structure of plan-and-execute with some of the flexibility of [[react_agents|ReAct]]. Modern frameworks have adopted this pattern with hierarchical supervisors for multi-level [[task_decomposition|task decomposition]], where high-level goals are broken into sub-plans, each managed by specialized agents.
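The BabyAGI-style loop described above can be sketched in a few lines of plain Python. The ''execute'', ''create_tasks'', and ''prioritize'' callables are hypothetical stand-ins for the three LLM agents; only the queue mechanics come from the original pattern.

```python
from collections import deque


def babyagi_loop(objective, execute, create_tasks, prioritize, max_iterations=10):
    """Minimal BabyAGI-style loop: execute a task, spawn follow-up tasks
    from the result, then reprioritize the pending queue."""
    queue = deque([f"Develop an initial task list for: {objective}"])
    results = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()
        result = execute(task)                             # execution agent
        results.append((task, result))
        queue.extend(create_tasks(objective, task, result, list(queue)))  # task creation agent
        queue = deque(prioritize(objective, list(queue)))  # prioritization agent
    return results


# Stub agents standing in for LLM calls (illustrative only).
def execute(task):
    return f"done: {task}"

def create_tasks(objective, task, result, pending):
    # Spawn follow-ups only from the seed task, to keep the demo bounded.
    if task.startswith("Develop"):
        return ["research sources", "write summary"]
    return []

def prioritize(objective, pending):
    # Pretend the prioritizer always schedules writing last.
    return sorted(pending, key=lambda t: t.startswith("write"))

results = babyagi_loop("Summarize HNSW", execute, create_tasks, prioritize)
for task, result in results:
    print(task, "->", result)
```

Note that, unlike the fixed-plan example above, the queue here is rebuilt after every step, which is what makes the plan adaptive.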
===== LangChain PlanAndExecute =====

LangChain implements plan-and-execute through **[[langgraph|LangGraph]]'s stateful graphs**, providing:((https://python.langchain.com/docs/how_to/|LangChain Documentation))

  * **Structured Workflows**: Nodes represent planning and execution stages, connected by edges that manage state transitions
  * **Dynamic Replanning**: Feedback loops allow the executor to report failures to the planner, triggering plan revision mid-execution
  * **Tool Integration**: Executors access shared tool registries, RAG systems, and external APIs
  * **[[human_in_the_loop|Human-in-the-Loop]]**: Checkpoint nodes where humans can review and modify the plan before execution continues

===== Comparison to ReAct =====

Plan-and-execute and [[react_agents|ReAct]] represent fundamentally different strategies for agent decision-making:((https://arxiv.org/abs/2210.03629|Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models."))

^ Aspect ^ Plan-and-Execute ^ ReAct ^
| Process | Upfront full plan, then sequential execution | Iterative reason-act cycles per step |
| LLM Calls | Fewer (planning happens once) | More (continuous reasoning at each step) |
| Token Usage | 3,000-4,500 per task | 2,000-3,000 per task |
| Task Accuracy | ~92% on complex predictable tasks | ~85% on similar tasks |
| Strengths | Predictable, cost-efficient for structured tasks, visible progress | Adaptive, flexible, lower overhead for simple tasks |
| Weaknesses | Plans can become stale if the environment changes | Higher cumulative cost, less structured |
| Security | Better isolation (control flow separated from execution) | More exposed to prompt injection during reasoning |

In practice, hybrid approaches dominate: plan-and-execute for the overall structure with ReAct-style execution within individual steps, combining strategic planning with tactical flexibility.((https://arxiv.org/abs/2305.10601|Yao, S. et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models."))

===== Dynamic Replanning =====

Static plans fail when the environment changes or unexpected results occur. Modern plan-and-execute agents incorporate feedback loops for mid-execution adaptation:

  * **Execution Feedback**: Executors report step outcomes, including failures, to the planner
  * **Plan Revision**: The planner generates an updated plan incorporating new information
  * **Selective Replanning**: Only affected downstream steps are revised, preserving completed work((https://arxiv.org/abs/2303.11366|Shinn, N. et al. "Reflexion: Language Agents with Verbal Reinforcement Learning."))
  * **Iteration Limits**: Maximum replan counts prevent infinite loops

[[langgraph|LangGraph]] implements this through graph cycles where execution nodes can route back to planning nodes based on conditional logic.

===== Hierarchical Planning =====

Complex tasks benefit from multi-level planning hierarchies:((https://arxiv.org/abs/2308.11432|Wang, L. et al. "A Survey on Large Language Model based Autonomous Agents."))

  * **Supervisor Agents**: High-level planners that decompose objectives into major phases
  * **Mid-Tier Planners**: Create detailed step sequences for each phase
  * **Worker Agents**: Execute individual steps using tools and report results upward

Patterns include hub-and-spoke (central orchestrator), pipeline (sequential handoffs), and peer-to-peer (collaborative decomposition). These hierarchies scale plan-and-execute to handle workflows that would overwhelm a single planning step.
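The dynamic replanning loop described above can be sketched in plain Python. The ''plan_fn'', ''execute_fn'', and ''replan_fn'' callables are hypothetical stand-ins for LLM calls; the control flow shows selective replanning (completed steps are preserved, only the remaining tail is revised) plus an iteration limit.

```python
def run_with_replanning(objective, plan_fn, execute_fn, replan_fn, max_replans=2):
    """Execute a plan step by step; on failure, ask the planner to revise
    the remaining steps, up to max_replans times."""
    plan = plan_fn(objective)
    completed, replans, i = [], 0, 0
    while i < len(plan):
        ok, result = execute_fn(plan[i])
        if ok:
            completed.append((plan[i], result))
            i += 1
        elif replans < max_replans:
            # Selective replanning: keep completed work, revise only the tail.
            plan = plan[:i] + replan_fn(objective, completed, plan[i:], result)
            replans += 1
        else:
            raise RuntimeError(f"Step failed after {max_replans} replans: {plan[i]}")
    return completed


# Stub planner/executor standing in for LLM calls (illustrative only).
failures = {"count": 0}

def plan_fn(objective):
    return ["fetch data", "flaky step", "summarize"]

def execute_fn(step):
    if step == "flaky step":
        failures["count"] += 1
        return False, "timeout"
    return True, f"ok: {step}"

def replan_fn(objective, completed, remaining, error):
    # The planner swaps the failing step for an alternative.
    return ["fetch data from mirror"] + remaining[1:]

done = run_with_replanning("demo", plan_fn, execute_fn, replan_fn)
print([step for step, _ in done])
```

In a graph framework such as LangGraph this while-loop becomes a cycle between execution and planning nodes, with the replan counter held in graph state.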
===== Research and Production Trends =====

Late 2025 research emphasizes production-grade plan-and-execute with:

  * Parallel DAG execution for independent plan steps
  * [[human_in_the_loop|Human-in-the-loop]] verification at critical decision points
  * Security through separated planning and execution contexts
  * [[process_reward_models|Process reward models]] for evaluating plan quality
  * Frameworks like [[langgraph|LangGraph]], [[crewai|CrewAI]], and [[autogen|AutoGen]] providing built-in plan-and-execute primitives

===== See Also =====

  * [[planning|Agent Planning: How AI Agents Plan and Reason]]
  * [[multi_agent_architecture|Multi-Agent Architecture (Planner-Generator-Evaluator)]]
  * [[long_horizon_agents|Long-Horizon Agents]]
  * [[parallel_agents|Parallel Agent Execution]]
  * [[agentic_workflows|Agentic Workflows]]

===== References =====