How to Build a Research Agent

A research agent is an AI system that autonomously searches the web, evaluates sources, chains multiple queries together (multi-hop retrieval), synthesizes findings, and produces cited reports. Systems like Perplexity, OpenAI Deep Research, and Google Gemini Deep Research demonstrate the power of this pattern. This guide covers the architecture and provides working code for building your own.

Architecture Overview

Research agents follow an iterative search-synthesize loop. Unlike simple RAG (retrieve once, generate once), a research agent dynamically generates follow-up queries based on what it has already found, evaluates source quality, and builds a comprehensive answer across multiple retrieval hops.
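The search-synthesize loop can be sketched as a minimal skeleton. The function and stub names here (`research_loop`, `is_sufficient`, `follow_ups`) are illustrative, not from any library; a real agent would plug an LLM and a search API into these slots:

```python
def research_loop(question, search, is_sufficient, follow_ups, max_hops=3):
    """Minimal search-synthesize loop: search, check evidence, refine queries."""
    queries = [question]
    evidence = []
    for _ in range(max_hops):
        for q in queries:
            evidence.extend(search(q))
        if is_sufficient(evidence):
            break
        # Next hop targets the gaps found so far, not the original question.
        queries = follow_ups(question, evidence)
    return evidence

# Deterministic stubs standing in for a search API and an LLM judge.
corpus = {"q": ["fact1"], "follow": ["fact2"]}
out = research_loop(
    "q",
    search=lambda q: corpus.get(q, []),
    is_sufficient=lambda ev: len(ev) >= 2,
    follow_ups=lambda q, ev: ["follow"],
)
# out == ["fact1", "fact2"]: one initial hop, one follow-up hop.
```

The key design point is that the loop's stopping condition and its next queries both depend on the evidence gathered so far, which is exactly what distinguishes this pattern from single-shot RAG.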

graph TD
    A[User Question] --> B[Query Planner]
    B --> C[Generate Sub-Questions]
    C --> D[Web Search Tool]
    D --> E[Source Evaluation]
    E --> F{Sufficient Evidence?}
    F -->|No| G[Generate Follow-up Queries]
    G --> D
    F -->|Yes| H[Synthesis Engine]
    H --> I[Citation Generator]
    I --> J[Final Report with Sources]
    E --> K[Source Ranking]
    K --> L[Dedup and Filter]
    L --> F

Core Components

1. Query Decomposition

Complex questions are broken into atomic sub-questions. "What is the best framework for building multi-agent systems in 2026?" might decompose into sub-questions such as:

- Which multi-agent frameworks are actively maintained in 2026?
- How do they compare on orchestration, state management, and tool support?
- What do benchmarks and practitioner reports say about each?
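Decomposition typically asks the model for a JSON array of sub-questions. Models often wrap JSON in markdown fences, so a tolerant parser is worth having; this sketch stubs out the model call and only shows the parsing side:

```python
import json
import re

def parse_subquestions(raw: str) -> list[str]:
    """Extract a JSON array of sub-questions, tolerating markdown code fences."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    subs = json.loads(cleaned)
    if not isinstance(subs, list):
        raise ValueError("expected a JSON array of strings")
    return [str(s) for s in subs]

# A typical fenced reply from an LLM:
subs = parse_subquestions(
    '```json\n["What frameworks exist?", "How do they compare?"]\n```'
)
```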

2. Web Search Integration

The agent needs a search tool that returns snippets with URLs. Options include SerpAPI, Tavily, Exa, Brave Search API, or Google Custom Search.

3. Source Evaluation

Not all search results are equal. The agent should score sources on criteria such as relevance to the query, domain authority, recency, and independence from sources already collected.
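Before spending an LLM call on judging sources, a cheap heuristic pass can pre-rank results. This is a sketch: the allowlist, weights, and function name are illustrative assumptions, not a fixed recipe:

```python
from urllib.parse import urlparse

# Illustrative allowlist; in practice this would be domain- and task-specific.
TRUSTED = {"arxiv.org", "github.com", "docs.python.org"}

def heuristic_score(source: dict, query: str) -> float:
    """Score a result by query-keyword overlap plus a domain-authority bonus."""
    words = set(query.lower().split())
    content_words = set(source.get("content", "").lower().split())
    overlap = len(words & content_words) / max(len(words), 1)
    domain = urlparse(source["url"]).netloc.removeprefix("www.")
    bonus = 0.3 if domain in TRUSTED else 0.0
    return overlap + bonus

ranked = sorted(
    [{"url": "https://arxiv.org/abs/1", "content": "agent search patterns"},
     {"url": "https://example.com/x", "content": "unrelated page"}],
    key=lambda s: heuristic_score(s, "agent search patterns"),
    reverse=True,
)
```

A hybrid works well in practice: drop the clearly irrelevant results with a heuristic like this, then let the LLM rank only the survivors.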

4. Multi-Hop Retrieval

The PRISM pattern (Precision-Recall Iterative Selection Mechanism) splits retrieval across three specialized sub-agents: one focused on precision (filtering noise out of the gathered evidence), one on recall (searching for facts that are still missing), and one that iteratively selects which evidence to carry forward.

5. Synthesis and Citations

The final step combines all gathered evidence into a coherent answer with inline citations.
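Citation numbering needs to be stable: deduplicate sources by URL first, then every evidence item can be mapped to its source's index. A minimal sketch (the helper name `number_sources` is ours, not from any library):

```python
def number_sources(sources: list[dict]) -> tuple[dict, str]:
    """Dedupe sources by URL; return {url: citation_number} and a source list."""
    index: dict[str, int] = {}
    lines = []
    for s in sources:
        if s["url"] not in index:
            index[s["url"]] = len(index) + 1
            lines.append(f"[{index[s['url']]}] {s.get('title', 'Source')} - {s['url']}")
    return index, "\n".join(lines)

index, listing = number_sources([
    {"url": "https://a.example", "title": "A"},
    {"url": "https://b.example", "title": "B"},
    {"url": "https://a.example", "title": "A again"},  # duplicate, reuses [1]
])
```

Doing this mapping before prompting the synthesis model means the `[n]` markers in the answer and the numbered source list at the bottom cannot drift apart.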

Approach 1: Pure Python Research Agent

A complete research agent using the OpenAI API and Tavily for web search.

import json, os
from openai import OpenAI
import requests
 
client = OpenAI()
MODEL = "gpt-4o"
TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]
MAX_HOPS = 3
 
def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Search the web using Tavily API."""
    response = requests.post(
        "https://api.tavily.com/search",
        json={
            "api_key": TAVILY_API_KEY,
            "query": query,
            "max_results": max_results,
            "include_raw_content": False,
        },
        timeout=30,
    )
    response.raise_for_status()
    results = response.json().get("results", [])
    return [
        {"title": r["title"], "url": r["url"], "content": r["content"]}
        for r in results
    ]
 
def evaluate_sources(sources: list[dict], query: str) -> list[dict]:
    """Score and rank sources by relevance."""
    prompt = f"""Rate each source 1-10 for relevance to: {query}
    Return a JSON object of the form {{"sources": [{{"url": ..., "score": ...}}]}}.
    Sources: {json.dumps(sources)}"""
 
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    scored = json.loads(response.choices[0].message.content)
    return sorted(scored.get("sources", scored.get("results", [])),
                  key=lambda x: x.get("score", 0), reverse=True)
 
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the web for information on a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"],
            },
        },
    }
]
 
def research(question: str) -> str:
    """Run multi-hop research on a question."""
    all_sources = []
    evidence = []
 
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research agent. Search for information to answer "
                "the question thoroughly. Make multiple searches to cover "
                "different angles. When you have enough evidence, provide "
                "a comprehensive answer with citations [1], [2], etc."
            ),
        },
        {"role": "user", "content": question},
    ]
 
    for hop in range(MAX_HOPS * 3):  # Allow multiple searches per hop
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        messages.append(msg)
 
        if not msg.tool_calls:
            # Agent is done -- attach source list
            source_list = "\n".join(
                f"[{i+1}] {s['title']} - {s['url']}"
                for i, s in enumerate(all_sources)
            )
            return f"{msg.content}\n\nSources:\n{source_list}"
 
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            results = web_search(args["query"])
            all_sources.extend(results)
 
            content_block = "\n\n".join(
                f"[{r['title']}]({r['url']}): {r['content']}"
                for r in results
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": content_block,
            })
 
    return "Research incomplete -- max hops reached."
 
if __name__ == "__main__":
    report = research("What are the best patterns for building AI agents in 2026?")
    print(report)

Approach 2: LangGraph Research Agent

Using LangGraph for a stateful, multi-hop research workflow with explicit graph-based control flow.

import os, json, operator
from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
 
llm = ChatOpenAI(model="gpt-4o")
search_tool = TavilySearchResults(max_results=5)
 
class ResearchState(TypedDict):
    question: str
    sub_questions: list[str]
    current_hop: int
    max_hops: int
    evidence: Annotated[list[dict], operator.add]
    sources: Annotated[list[dict], operator.add]
    answer: str
 
def decompose_question(state: ResearchState) -> ResearchState:
    """Break the question into sub-questions."""
    response = llm.invoke(
        f"Break this question into 3-5 specific sub-questions that "
        f"would help research a thorough answer:\n{state['question']}\n"
        f"Return as JSON array of strings."
    )
    subs = json.loads(response.content)
    return {"sub_questions": subs, "current_hop": 0}
 
def search_evidence(state: ResearchState) -> ResearchState:
    """Search for each sub-question."""
    new_evidence = []
    new_sources = []
    for sq in state["sub_questions"]:
        results = search_tool.invoke(sq)
        for r in results:
            # Keep the URL on each evidence item so citations can be aligned
            # with the deduplicated source list at synthesis time.
            new_evidence.append(
                {"query": sq, "content": r["content"], "url": r["url"]}
            )
            new_sources.append({"url": r["url"], "title": r.get("title", "")})
    return {
        "evidence": new_evidence,
        "sources": new_sources,
        "current_hop": state["current_hop"] + 1,
    }
 
def evaluate_completeness(state: ResearchState) -> str:
    """Decide if we have enough evidence or need more hops."""
    if state["current_hop"] >= state["max_hops"]:
        return "synthesize"
    response = llm.invoke(
        f"Question: {state['question']}\n"
        f"Evidence gathered: {len(state['evidence'])} items\n"
        f"Do we have enough evidence for a thorough answer? "
        f"Reply ONLY 'yes' or 'no'."
    )
    return "synthesize" if "yes" in response.content.lower() else "search_more"
 
def generate_followups(state: ResearchState) -> ResearchState:
    """Generate follow-up questions based on gaps."""
    evidence_summary = "\n".join(e["content"][:200] for e in state["evidence"][-5:])
    response = llm.invoke(
        f"Original question: {state['question']}\n"
        f"Evidence so far: {evidence_summary}\n"
        f"What 2-3 follow-up questions would fill gaps? JSON array of strings."
    )
    followups = json.loads(response.content)
    return {"sub_questions": followups}
 
def synthesize(state: ResearchState) -> ResearchState:
    """Produce final answer with citations that match the source list."""
    seen_urls = set()
    unique_sources = []
    for s in state["sources"]:
        if s["url"] not in seen_urls:
            seen_urls.add(s["url"])
            unique_sources.append(s)
 
    # Number each evidence item by its deduplicated source, so the [n]
    # markers in the answer line up with the numbered source list below.
    url_index = {s["url"]: i + 1 for i, s in enumerate(unique_sources)}
    evidence_text = "\n\n".join(
        f"[{url_index[e['url']]}] {e['content']}" for e in state["evidence"]
    )
 
    response = llm.invoke(
        f"Write a comprehensive answer to: {state['question']}\n\n"
        f"Use this evidence (cite as [1], [2], etc.):\n{evidence_text}\n\n"
        f"Be thorough, accurate, and well-structured."
    )
    source_list = "\n".join(
        f"[{i+1}] {s.get('title', 'Source')} - {s['url']}"
        for i, s in enumerate(unique_sources)
    )
    return {"answer": f"{response.content}\n\nSources:\n{source_list}"}
 
# Build the graph
workflow = StateGraph(ResearchState)
workflow.add_node("decompose", decompose_question)
workflow.add_node("search", search_evidence)
workflow.add_node("followup", generate_followups)
workflow.add_node("synthesize", synthesize)
 
workflow.set_entry_point("decompose")
workflow.add_edge("decompose", "search")
workflow.add_conditional_edges("search", evaluate_completeness, {
    "synthesize": "synthesize",
    "search_more": "followup",
})
workflow.add_edge("followup", "search")
workflow.add_edge("synthesize", END)
 
app = workflow.compile()
 
# Run research
result = app.invoke({
    "question": "What are the best agentic search patterns in 2026?",
    "sub_questions": [],
    "current_hop": 0,
    "max_hops": 3,
    "evidence": [],
    "sources": [],
    "answer": "",
})
print(result["answer"])

Comparison: Simple Loop vs LangGraph

Criteria          | Pure Python Loop           | LangGraph Graph
------------------|----------------------------|--------------------------------
Control flow      | Implicit in LLM decisions  | Explicit graph edges
State management  | In-memory message list     | Typed state with reducers
Multi-hop control | LLM decides when to stop   | Conditional edges with fallback
Checkpointing     | Must build custom          | Built-in persistence
Debugging         | Print statements           | Visual graph + tracing
Parallelism       | Manual threading           | Built-in parallel nodes
Best for          | Prototypes, simple queries | Production, complex research

Agentic Search Patterns

Research from 2025-2026 has identified several effective patterns:

1. Iterative Deepening: Start broad, then narrow. First search gives an overview; follow-up searches target specific claims or gaps.

2. Multi-Source Triangulation: For factual claims, search across 3+ independent sources. Only include claims confirmed by multiple sources.
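The triangulation check reduces to set logic over source domains. A sketch, assuming each claim has been mapped to the URLs that support it (the `triangulate` helper is illustrative):

```python
from urllib.parse import urlparse

def triangulate(claims: dict[str, list[str]], min_sources: int = 3) -> list[str]:
    """Keep only claims supported by at least `min_sources` distinct domains."""
    confirmed = []
    for claim, urls in claims.items():
        # Count distinct domains, not distinct URLs: two pages on the
        # same site are not independent confirmation.
        domains = {urlparse(u).netloc for u in urls}
        if len(domains) >= min_sources:
            confirmed.append(claim)
    return confirmed

kept = triangulate({
    "X released in 2026": ["https://a.com/1", "https://b.com/2", "https://c.com/3"],
    "Y is deprecated": ["https://a.com/1", "https://a.com/2"],  # same domain twice
})
```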

3. Temporal Filtering: For fast-moving topics (AI, policy), filter results by date. Information older than 6 months may be outdated.
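A date cutoff is a one-line filter once results carry a publication date. This sketch assumes each result has a `published_date` field in ISO format, which depends on the search provider; `now` is passed in explicitly to keep the function deterministic:

```python
from datetime import datetime, timedelta

def filter_recent(results: list[dict], now: datetime,
                  max_age_days: int = 180) -> list[dict]:
    """Drop results whose published_date is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        r for r in results
        if datetime.fromisoformat(r["published_date"]) >= cutoff
    ]

fresh = filter_recent(
    [{"url": "a", "published_date": "2026-01-10"},
     {"url": "b", "published_date": "2024-03-01"}],
    now=datetime(2026, 3, 1),
)
```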

4. Adversarial Verification: After forming an initial answer, search for counter-evidence. “Why might X be wrong?” queries catch biases.
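The counter-evidence searches can be generated mechanically from the claims in the draft answer. The templates below are illustrative; in practice an LLM can produce sharper, claim-specific counter-queries:

```python
def counter_queries(claims: list[str]) -> list[str]:
    """Build adversarial search queries seeking evidence against each claim."""
    templates = ["why is it wrong that {c}", "criticism of {c}", "{c} debunked"]
    return [t.format(c=c) for c in claims for t in templates]

qs = counter_queries(["framework X is fastest"])
# Three counter-queries for the one claim, fed back into the search tool.
```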

5. PRISM Pattern: Separate precision (filtering noise) from recall (finding missing facts) into distinct agent roles. This produces compact yet comprehensive evidence sets.
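The precision/recall split can be sketched with two pure functions; real implementations would back each role with its own LLM prompt, but the division of labor is the point (function names and heuristics here are illustrative):

```python
def precision_agent(evidence: list[dict], query: str) -> list[dict]:
    """Precision role: drop evidence with no keyword overlap with the query."""
    words = set(query.lower().split())
    return [e for e in evidence if words & set(e["content"].lower().split())]

def recall_agent(sub_questions: list[str], evidence: list[dict]) -> list[str]:
    """Recall role: find sub-questions not yet covered by any evidence item."""
    answered = {e["query"] for e in evidence}
    return [q for q in sub_questions if q not in answered]

evidence = [{"query": "a", "content": "agent patterns"},
            {"query": "b", "content": "noise"}]
kept = precision_agent(evidence, "agent patterns")   # filters out the noise
gaps = recall_agent(["a", "b", "c"], evidence)       # "c" still unanswered
```

Running the two roles alternately shrinks the evidence set while growing its coverage, which is what produces compact yet comprehensive evidence.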
