LLM-based Deep Search Agents represent a paradigm shift from static retrieval-augmented generation toward autonomous, multi-step information seeking with dynamic planning. As surveyed by Xi et al. (2025), these agents comprehend user intentions, execute multi-turn retrieval across diverse sources, and adaptively refine their search strategies – extending capabilities far beyond traditional web search or single-pass RAG systems. OpenAI's Deep Research exemplifies this paradigm in practice.
The evolution of search follows three stages:

- **Traditional web search**: keyword matching over an index returns ranked links; interpretation and synthesis are left entirely to the user.
- **LLM-enhanced search (single-pass RAG)**: documents are retrieved once, then an LLM generates an answer grounded in them.
- **Agentic deep search**: an LLM agent plans, retrieves over multiple turns, refines its strategy, and synthesizes evidence across sources.
The critical limitation of LLM-enhanced search is its static, single-turn nature. Complex queries requiring multi-hop reasoning, cross-source synthesis, or iterative refinement cannot be handled by retrieve-once-then-generate pipelines.
A deep search agent is formally defined as an LLM agent capable of:

- comprehending the intent behind complex, underspecified queries;
- executing multi-turn retrieval across diverse sources;
- adaptively refining its search strategy based on intermediate results;
- synthesizing the accumulated evidence into a grounded, verifiable answer.
The agent operates as a sequential decision process. At each step $t$, the agent observes state $s_t$ (accumulated evidence and search history) and selects action $a_t$ from the action space:
$$a_t = \pi(s_t) \in \{\text{search}(q), \text{refine}(q), \text{synthesize}, \text{verify}, \text{terminate}\}$$
The policy $\pi$ is implemented by the LLM backbone, conditioned on the full trajectory $\tau_t = (s_0, a_0, \ldots, s_t)$.
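This decision loop can be sketched as a toy policy over the action space above. The `Action` enum and the deterministic `policy` function are illustrative stand-ins for the LLM policy $\pi$, not constructs from the survey:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Action(Enum):
    # The action space {search, refine, synthesize, verify, terminate}
    SEARCH = auto()
    REFINE = auto()
    SYNTHESIZE = auto()
    VERIFY = auto()
    TERMINATE = auto()

@dataclass
class State:
    evidence: list = field(default_factory=list)   # accumulated documents
    history: list = field(default_factory=list)    # (action, argument) pairs

def policy(state: State, coverage: float, theta: float = 0.8) -> Action:
    """Toy deterministic policy: search (or refine, once a query has been
    tried) until coverage reaches theta, then synthesize and terminate."""
    if coverage < theta:
        return Action.SEARCH if not state.history else Action.REFINE
    if not any(a is Action.SYNTHESIZE for a, _ in state.history):
        return Action.SYNTHESIZE
    return Action.TERMINATE
```

In a real agent the branch conditions are replaced by the LLM's own judgment over the full trajectory $\tau_t$; the sketch only fixes the shape of the state-to-action mapping.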
Unlike static pipelines, deep search agents employ planning with revision. Given a complex query $Q$, the agent generates an initial plan:
$$P_0 = \text{Decompose}(Q) = \{(q_1, \text{src}_1), \ldots, (q_k, \text{src}_k)\}$$
After executing sub-query $q_i$ and receiving documents $D_i$, the agent evaluates sufficiency:
$$\text{sufficient}(D_{1:i}, Q) = \begin{cases} \text{true} & \text{if } \text{coverage}(D_{1:i}, Q) \geq \theta \\ \text{false} & \text{otherwise} \end{cases}$$
On insufficiency, the agent revises the remaining plan: $P_{i+1} = \text{Revise}(P_i, D_{1:i}, Q)$, potentially adding new sub-queries, switching sources, or reformulating failed queries.
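The sufficiency test can be made concrete with a simple heuristic. The keyword-coverage function below is a hypothetical stand-in for an LLM-based coverage judge, matching the threshold form $\text{coverage}(D_{1:i}, Q) \geq \theta$ above:

```python
def coverage(docs: list[str], query_terms: set[str]) -> float:
    """Fraction of query terms that appear in at least one retrieved document.
    A crude proxy for coverage(D_{1:i}, Q)."""
    if not query_terms:
        return 1.0
    text = " ".join(docs).lower()
    hits = sum(1 for term in query_terms if term.lower() in text)
    return hits / len(query_terms)

def sufficient(docs: list[str], query_terms: set[str], theta: float = 0.8) -> bool:
    # Mirrors the case split in the sufficiency formula
    return coverage(docs, query_terms) >= theta
```

For example, `sufficient(["Paris is the capital of France"], {"capital", "France"})` holds, while the same documents fail once `"population"` is added to the required terms.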
Deep search agents build knowledge chains through multi-hop traversal. Each hop uses results from previous hops as context:
$$q_{i+1} = \text{Generate}(Q, D_{1:i}, \text{gap}(D_{1:i}, Q))$$
where $\text{gap}(D_{1:i}, Q)$ identifies information still needed to fully answer $Q$. This enables complex reasoning patterns like entity identification followed by relation lookup followed by temporal filtering.
```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    query: str
    evidence: list = field(default_factory=list)
    plan: list = field(default_factory=list)
    history: list = field(default_factory=list)

class DeepSearchAgent:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools  # {"web": web_search, "db": db_query, ...}

    def search(self, query, max_hops=10):
        state = SearchState(query=query)
        state.plan = self.decompose(query)
        for hop in range(max_hops):
            if not state.plan:
                break
            sub_query, source = state.plan.pop(0)
            docs = self.tools[source](sub_query)
            state.evidence.extend(docs)
            state.history.append((sub_query, source, docs))
            if self.is_sufficient(state):
                break
            state.plan = self.revise_plan(state)
        return self.synthesize(state)

    def decompose(self, query):
        # Expected to return a list of (sub_query, source) pairs
        return self.llm.generate(
            f"Decompose into sub-queries with sources: {query}"
        )

    def is_sufficient(self, state):
        score = self.llm.evaluate(state.query, state.evidence)
        return score >= 0.8  # coverage threshold theta

    def revise_plan(self, state):
        return self.llm.generate(f"Revise search plan given gaps: {state}")

    def synthesize(self, state):
        return self.llm.generate(
            f"Synthesize answer from evidence: {state.evidence}"
        )
```
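The agent code above assumes an `llm` object exposing `generate` and `evaluate` methods. A scripted stub (purely hypothetical, returning canned outputs instead of calling a model) makes that assumed interface concrete:

```python
class ScriptedLLM:
    """Stub matching the interface the agent expects.
    Returns canned outputs instead of calling a real model."""

    def __init__(self, plan, answer):
        self.plan = plan      # list of (sub_query, source) pairs
        self.answer = answer  # final synthesized answer

    def generate(self, prompt):
        # Decompose/revise prompts yield a plan; synthesis yields the answer
        if prompt.startswith(("Decompose", "Revise")):
            return list(self.plan)
        return self.answer

    def evaluate(self, query, evidence):
        # Treat any non-empty evidence as sufficient (score >= 0.8)
        return 1.0 if evidence else 0.0
```

Wiring this stub into the agent with, say, `tools={"web": lambda q: [f"doc for {q}"]}` runs the full search loop end to end without any external services.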
The survey categorizes deep search agent architectures along several dimensions:
| Dimension | Variants |
| --- | --- |
| Planning | Static decomposition, dynamic revision, hierarchical |
| Retrieval | Single-source, multi-source, tool-augmented |
| Reasoning | Single-agent, dual-agent (Reasoner-Purifier), multi-agent |
| Optimization | SFT, reinforcement learning, hybrid SFT+RL |
| Evaluation | Single-hop QA, multi-hop QA, open-ended research |