LLM-based Deep Search Agents represent a paradigm shift from static retrieval-augmented generation toward autonomous, multi-step information seeking with dynamic planning. As surveyed by Xi et al. (2025), these agents comprehend user intentions, execute multi-turn retrieval across diverse sources, and adaptively refine their search strategies – extending capabilities far beyond traditional web search or single-pass RAG systems. OpenAI's Deep Research exemplifies this paradigm in practice.
The evolution of search follows three stages:

- **Traditional web search**: keyword matching over an index returns ranked links; interpretation and synthesis are left entirely to the user.
- **LLM-enhanced search (single-pass RAG)**: documents are retrieved once, then an LLM generates an answer grounded in them.
- **Agentic deep search**: an LLM agent plans, retrieves over multiple turns, refines its strategy, and synthesizes evidence across sources.
The critical limitation of LLM-enhanced search is its static, single-turn nature. Complex queries requiring multi-hop reasoning, cross-source synthesis, or iterative refinement cannot be handled by retrieve-once-then-generate pipelines.
A deep search agent is formally defined as an LLM agent capable of:

- comprehending the intent behind complex, underspecified queries;
- executing multi-turn retrieval across diverse sources;
- adaptively refining its search strategy based on intermediate results;
- synthesizing the accumulated evidence into a grounded, verifiable answer.
The agent operates as a sequential decision process. At each step $t$, the agent observes state $s_t$ (accumulated evidence and search history) and selects action $a_t$ from the action space:
$$a_t = \pi(s_t) \in \{\text{search}(q), \text{refine}(q), \text{synthesize}, \text{verify}, \text{terminate}\}$$
The policy $\pi$ is implemented by the LLM backbone, conditioned on the full trajectory $\tau_t = (s_0, a_0, \ldots, s_t)$.
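This decision loop can be sketched as a toy policy over the action space above. The `Action` enum and the deterministic `policy` function are illustrative stand-ins for the LLM policy $\pi$, not constructs from the survey:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Action(Enum):
    # The action space {search, refine, synthesize, verify, terminate}
    SEARCH = auto()
    REFINE = auto()
    SYNTHESIZE = auto()
    VERIFY = auto()
    TERMINATE = auto()

@dataclass
class State:
    evidence: list = field(default_factory=list)   # accumulated documents
    history: list = field(default_factory=list)    # (action, argument) pairs

def policy(state: State, coverage: float, theta: float = 0.8) -> Action:
    """Toy deterministic policy: search (or refine, once a query has been
    tried) until coverage reaches theta, then synthesize and terminate."""
    if coverage < theta:
        return Action.SEARCH if not state.history else Action.REFINE
    if not any(a is Action.SYNTHESIZE for a, _ in state.history):
        return Action.SYNTHESIZE
    return Action.TERMINATE
```

In a real agent the branch conditions are replaced by the LLM's own judgment over the full trajectory $\tau_t$; the sketch only fixes the shape of the state-to-action mapping.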
Unlike static pipelines, deep search agents employ planning with revision. Given a complex query $Q$, the agent generates an initial plan:
$$P_0 = \text{Decompose}(Q) = \{(q_1, \text{src}_1), \ldots, (q_k, \text{src}_k)\}$$
After executing sub-query $q_i$ and receiving documents $D_i$, the agent evaluates sufficiency:
$$\text{sufficient}(D_{1:i}, Q) = \begin{cases} \text{true} & \text{if } \text{coverage}(D_{1:i}, Q) \geq \theta \\ \text{false} & \text{otherwise} \end{cases}$$
On insufficiency, the agent revises the remaining plan: $P_{i+1} = \text{Revise}(P_i, D_{1:i}, Q)$, potentially adding new sub-queries, switching sources, or reformulating failed queries.
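The sufficiency test can be made concrete with a simple heuristic. The keyword-coverage function below is a hypothetical stand-in for an LLM-based coverage judge, matching the threshold form $\text{coverage}(D_{1:i}, Q) \geq \theta$ above:

```python
def coverage(docs: list[str], query_terms: set[str]) -> float:
    """Fraction of query terms that appear in at least one retrieved document.
    A crude proxy for coverage(D_{1:i}, Q)."""
    if not query_terms:
        return 1.0
    text = " ".join(docs).lower()
    hits = sum(1 for term in query_terms if term.lower() in text)
    return hits / len(query_terms)

def sufficient(docs: list[str], query_terms: set[str], theta: float = 0.8) -> bool:
    # Mirrors the case split in the sufficiency formula
    return coverage(docs, query_terms) >= theta
```

For example, `sufficient(["Paris is the capital of France"], {"capital", "France"})` holds, while the same documents fail once `"population"` is added to the required terms.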
Deep search agents build knowledge chains through multi-hop traversal. Each hop uses results from previous hops as context:
$$q_{i+1} = \text{Generate}(Q, D_{1:i}, \text{gap}(D_{1:i}, Q))$$
where $\text{gap}(D_{1:i}, Q)$ identifies information still needed to fully answer $Q$. This enables complex reasoning patterns like entity identification followed by relation lookup followed by temporal filtering.
```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    query: str
    evidence: list = field(default_factory=list)
    plan: list = field(default_factory=list)
    history: list = field(default_factory=list)

class DeepSearchAgent:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools  # {"web": web_search, "db": db_query, ...}

    def search(self, query, max_hops=10):
        state = SearchState(query=query)
        state.plan = self.decompose(query)
        for hop in range(max_hops):
            if not state.plan:
                break
            sub_query, source = state.plan.pop(0)
            docs = self.tools[source](sub_query)
            state.evidence.extend(docs)
            state.history.append((sub_query, source, docs))
            if self.is_sufficient(state):
                break
            state.plan = self.revise_plan(state)
        return self.synthesize(state)

    def decompose(self, query):
        # Expected to return a list of (sub_query, source) pairs
        return self.llm.generate(
            f"Decompose into sub-queries with sources: {query}"
        )

    def is_sufficient(self, state):
        score = self.llm.evaluate(state.query, state.evidence)
        return score >= 0.8  # coverage threshold theta

    def revise_plan(self, state):
        return self.llm.generate(f"Revise search plan given gaps: {state}")

    def synthesize(self, state):
        return self.llm.generate(
            f"Synthesize answer from evidence: {state.evidence}"
        )
```
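The agent code above assumes an `llm` object exposing `generate` and `evaluate` methods. A scripted stub (purely hypothetical, returning canned outputs instead of calling a model) makes that assumed interface concrete:

```python
class ScriptedLLM:
    """Stub matching the interface the agent expects.
    Returns canned outputs instead of calling a real model."""

    def __init__(self, plan, answer):
        self.plan = plan      # list of (sub_query, source) pairs
        self.answer = answer  # final synthesized answer

    def generate(self, prompt):
        # Decompose/revise prompts yield a plan; synthesis yields the answer
        if prompt.startswith(("Decompose", "Revise")):
            return list(self.plan)
        return self.answer

    def evaluate(self, query, evidence):
        # Treat any non-empty evidence as sufficient (score >= 0.8)
        return 1.0 if evidence else 0.0
```

Wiring this stub into the agent with, say, `tools={"web": lambda q: [f"doc for {q}"]}` runs the full search loop end to end without any external services.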
The survey categorizes deep search agent architectures along several dimensions:
| Dimension | Variants |
| --- | --- |
| Planning | Static decomposition, dynamic revision, hierarchical |
| Retrieval | Single-source, multi-source, tool-augmented |
| Reasoning | Single-agent, dual-agent (Reasoner-Purifier), multi-agent |
| Optimization | SFT, reinforcement learning, hybrid SFT+RL |
| Evaluation | Single-hop QA, multi-hop QA, open-ended research |