Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
A research agent is an AI system that autonomously searches the web, evaluates sources, chains multiple queries together (multi-hop retrieval), synthesizes findings, and produces cited reports. Systems like Perplexity, OpenAI Deep Research, and Google Gemini Deep Research demonstrate the power of this pattern. This guide covers the architecture and working code for building your own.
Research agents follow an iterative search-synthesize loop. Unlike simple RAG (retrieve once, generate once), a research agent dynamically generates follow-up queries based on what it has already found, evaluates source quality, and builds a comprehensive answer across multiple retrieval hops.
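Stripped of any particular API, the loop reduces to a few lines. The sketch below uses hypothetical helper callables (`search`, `assess`, `synthesize`) to show the control flow only; the full implementations appear later in this guide.

```python
def research_loop(question, search, assess, synthesize, max_hops=3):
    """Minimal sketch of the search-synthesize loop.

    `search(query)` returns a list of evidence items, `assess(question,
    evidence)` returns a follow-up query or None when evidence suffices,
    and `synthesize(question, evidence)` writes the final answer.
    """
    evidence = []
    query = question
    for _ in range(max_hops):
        evidence.extend(search(query))      # retrieval hop
        query = assess(question, evidence)  # dynamic follow-up query
        if query is None:                   # enough evidence gathered
            break
    return synthesize(question, evidence)
```

The point of the abstraction: unlike one-shot RAG, the next query depends on the evidence already collected.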
Complex questions are broken into atomic sub-questions. “What is the best framework for building multi-agent systems in 2026?” becomes, for example:
- Which multi-agent frameworks are actively maintained in 2026?
- How do they compare on orchestration, state management, and tooling?
- What do practitioners report about each framework in production?
The agent needs a search tool that returns snippets with URLs. Options include SerpAPI, Tavily, Exa, Brave Search API, or Google Custom Search.
Not all search results are equal. The agent should score sources by:
- Authority: reputable domains and primary sources over content farms
- Recency: publication date, critical for fast-moving topics
- Relevance: how directly the content addresses the query
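As a sketch, scoring might combine domain authority, recency, and query-term overlap into one number before (or instead of) asking an LLM to rank sources. The weights and authority table below are illustrative assumptions, not part of any API.

```python
from urllib.parse import urlparse

# Illustrative authority weights -- tune for your domain (assumption)
AUTHORITY = {"arxiv.org": 1.0, "github.com": 0.9, "medium.com": 0.4}

def score_source(url: str, published_year: int, query_terms: set[str],
                 snippet: str, current_year: int = 2026) -> float:
    """Cheap heuristic score: weighted authority + recency + term overlap."""
    domain = urlparse(url).netloc.removeprefix("www.")
    authority = AUTHORITY.get(domain, 0.5)          # unknown domains get 0.5
    recency = max(0.0, 1.0 - 0.25 * (current_year - published_year))
    words = set(snippet.lower().split())
    relevance = len(query_terms & words) / max(len(query_terms), 1)
    return 0.4 * authority + 0.3 * recency + 0.3 * relevance
```

A heuristic pre-filter like this keeps the expensive LLM scorer focused on a short candidate list.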
The PRISM pattern (Precision-Recall Iterative Selection Mechanism) uses three specialized sub-agents:
- a precision agent that filters noisy or irrelevant evidence
- a recall agent that searches for facts still missing from the evidence set
- a selection agent that iterates between the two until the evidence set is both compact and complete
The final step combines all gathered evidence into a coherent answer with inline citations.
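Mechanically, the citation half of this step is just deduplicating sources by URL and appending a numbered reference list; a minimal helper (hypothetical names, assuming the answer already carries [n] markers) might look like:

```python
def attach_citations(answer: str, sources: list[dict]) -> str:
    """Deduplicate sources by URL and append a numbered reference list."""
    seen, unique = set(), []
    for s in sources:
        if s["url"] not in seen:   # keep first occurrence of each URL
            seen.add(s["url"])
            unique.append(s)
    refs = "\n".join(
        f"[{i+1}] {s.get('title', 'Source')} - {s['url']}"
        for i, s in enumerate(unique)
    )
    return f"{answer}\n\nSources:\n{refs}"
```

Deduplicating before numbering matters: the same page often surfaces for several sub-questions, and duplicate entries break the [n] mapping.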
A complete research agent using the OpenAI API and Tavily for web search.
```python
import json
import os

import requests
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"
TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]
MAX_HOPS = 3


def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Search the web using the Tavily API."""
    response = requests.post(
        "https://api.tavily.com/search",
        json={
            "api_key": TAVILY_API_KEY,
            "query": query,
            "max_results": max_results,
            "include_raw_content": False,
        },
    )
    results = response.json().get("results", [])
    return [
        {"title": r["title"], "url": r["url"], "content": r["content"]}
        for r in results
    ]


def evaluate_sources(sources: list[dict], query: str) -> list[dict]:
    """Score and rank sources by relevance."""
    prompt = f"""Rate each source 1-10 for relevance to: {query}
Return a JSON object with a "sources" array; each item has "url" and "score" fields.

Sources:
{json.dumps(sources)}"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    scored = json.loads(response.choices[0].message.content)
    return sorted(
        scored.get("sources", []),
        key=lambda x: x.get("score", 0),
        reverse=True,
    )


TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the web for information on a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"],
            },
        },
    }
]


def research(question: str) -> str:
    """Run multi-hop research on a question."""
    all_sources = []
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research agent. Search for information to answer "
                "the question thoroughly. Make multiple searches to cover "
                "different angles. When you have enough evidence, provide "
                "a comprehensive answer with citations [1], [2], etc."
            ),
        },
        {"role": "user", "content": question},
    ]

    for hop in range(MAX_HOPS * 3):  # allow multiple searches per hop
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            # Agent is done -- attach the source list
            source_list = "\n".join(
                f"[{i+1}] {s['title']} - {s['url']}"
                for i, s in enumerate(all_sources)
            )
            return f"{msg.content}\n\nSources:\n{source_list}"

        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            results = web_search(args["query"])
            all_sources.extend(results)
            content_block = "\n\n".join(
                f"[{r['title']}]({r['url']}): {r['content']}" for r in results
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": content_block,
            })

    return "Research incomplete -- max hops reached."


if __name__ == "__main__":
    report = research("What are the best patterns for building AI agents in 2026?")
    print(report)
```
Using LangGraph for a stateful, multi-hop research workflow with explicit graph-based control flow.
```python
import json
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

llm = ChatOpenAI(model="gpt-4o")
search_tool = TavilySearchResults(max_results=5)


class ResearchState(TypedDict):
    question: str
    sub_questions: list[str]
    current_hop: int
    max_hops: int
    evidence: Annotated[list[dict], operator.add]
    sources: Annotated[list[dict], operator.add]
    answer: str


def decompose_question(state: ResearchState) -> dict:
    """Break the question into sub-questions."""
    response = llm.invoke(
        f"Break this question into 3-5 specific sub-questions that "
        f"would help research a thorough answer:\n{state['question']}\n"
        f"Return as JSON array of strings."
    )
    subs = json.loads(response.content)
    return {"sub_questions": subs, "current_hop": 0}


def search_evidence(state: ResearchState) -> dict:
    """Search for each sub-question."""
    new_evidence = []
    new_sources = []
    for sq in state["sub_questions"]:
        results = search_tool.invoke(sq)
        for r in results:
            new_evidence.append({"query": sq, "content": r["content"]})
            new_sources.append({"url": r["url"], "title": r.get("title", "")})
    return {
        "evidence": new_evidence,
        "sources": new_sources,
        "current_hop": state["current_hop"] + 1,
    }


def evaluate_completeness(state: ResearchState) -> str:
    """Decide if we have enough evidence or need more hops."""
    if state["current_hop"] >= state["max_hops"]:
        return "synthesize"
    response = llm.invoke(
        f"Question: {state['question']}\n"
        f"Evidence gathered: {len(state['evidence'])} items\n"
        f"Do we have enough evidence for a thorough answer? "
        f"Reply ONLY 'yes' or 'no'."
    )
    return "synthesize" if "yes" in response.content.lower() else "search_more"


def generate_followups(state: ResearchState) -> dict:
    """Generate follow-up questions based on gaps."""
    evidence_summary = "\n".join(e["content"][:200] for e in state["evidence"][-5:])
    response = llm.invoke(
        f"Original question: {state['question']}\n"
        f"Evidence so far: {evidence_summary}\n"
        f"What 2-3 follow-up questions would fill gaps? JSON array of strings."
    )
    followups = json.loads(response.content)
    return {"sub_questions": followups}


def synthesize(state: ResearchState) -> dict:
    """Produce the final answer with citations."""
    evidence_text = "\n\n".join(
        f"Source {i+1}: {e['content']}" for i, e in enumerate(state["evidence"])
    )
    seen_urls = set()
    unique_sources = []
    for s in state["sources"]:
        if s["url"] not in seen_urls:
            seen_urls.add(s["url"])
            unique_sources.append(s)
    response = llm.invoke(
        f"Write a comprehensive answer to: {state['question']}\n\n"
        f"Use this evidence (cite as [1], [2], etc.):\n{evidence_text}\n\n"
        f"Be thorough, accurate, and well-structured."
    )
    source_list = "\n".join(
        f"[{i+1}] {s.get('title', 'Source')} - {s['url']}"
        for i, s in enumerate(unique_sources)
    )
    return {"answer": f"{response.content}\n\nSources:\n{source_list}"}


# Build the graph
workflow = StateGraph(ResearchState)
workflow.add_node("decompose", decompose_question)
workflow.add_node("search", search_evidence)
workflow.add_node("followup", generate_followups)
workflow.add_node("synthesize", synthesize)

workflow.set_entry_point("decompose")
workflow.add_edge("decompose", "search")
workflow.add_conditional_edges("search", evaluate_completeness, {
    "synthesize": "synthesize",
    "search_more": "followup",
})
workflow.add_edge("followup", "search")
workflow.add_edge("synthesize", END)

app = workflow.compile()

# Run research
result = app.invoke({
    "question": "What are the best agentic search patterns in 2026?",
    "sub_questions": [],
    "current_hop": 0,
    "max_hops": 3,
    "evidence": [],
    "sources": [],
    "answer": "",
})
print(result["answer"])
```
| Criteria | Pure Python Loop | LangGraph Graph |
|---|---|---|
| Control flow | Implicit in LLM decisions | Explicit graph edges |
| State management | In-memory message list | Typed state with reducers |
| Multi-hop control | LLM decides when to stop | Conditional edges with fallback |
| Checkpointing | Must build custom | Built-in persistence |
| Debugging | Print statements | Visual graph + tracing |
| Parallelism | Manual threading | Built-in parallel nodes |
| Best for | Prototypes, simple queries | Production, complex research |
Research from 2025-2026 has identified several effective patterns:
1. Iterative Deepening: Start broad, then narrow. First search gives an overview; follow-up searches target specific claims or gaps.
2. Multi-Source Triangulation: For factual claims, search across 3+ independent sources. Only include claims confirmed by multiple sources.
3. Temporal Filtering: For fast-moving topics (AI, policy), filter results by date. Information older than 6 months may be outdated.
4. Adversarial Verification: After forming an initial answer, search for counter-evidence. “Why might X be wrong?” queries catch biases.
5. PRISM Pattern: Separate precision (filtering noise) from recall (finding missing facts) into distinct agent roles. This produces compact yet comprehensive evidence sets.
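Pattern 2 (multi-source triangulation) reduces to a set operation once each claim is mapped to the domains that support it. The mapping structure below is an assumption for illustration; producing it from raw evidence is the agent's job.

```python
def triangulate(claims: dict[str, list[str]], min_sources: int = 3) -> list[str]:
    """Keep only claims confirmed by at least `min_sources` distinct domains.

    `claims` maps a claim string to the list of source domains supporting it
    (hypothetical structure). Duplicate domains count once, so three quotes
    of the same outlet do not constitute independent confirmation.
    """
    return [
        claim for claim, domains in claims.items()
        if len(set(domains)) >= min_sources
    ]
```

Counting distinct domains rather than raw results is the whole trick: syndicated copies of one article should not triple-count as evidence.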