Retrieval Strategies

Retrieval strategies determine how a RAG (Retrieval-Augmented Generation) system finds relevant documents to ground LLM responses. The choice of strategy significantly impacts answer quality, with dense methods capturing semantic similarity, sparse methods excelling at lexical matching, and advanced techniques like hybrids and reranking improving both precision and recall. 1)

Dense Retrieval

Dense retrieval converts queries and documents into dense vector embeddings using models like MPNet, MiniLM, or OpenAI text-embedding-3, then retrieves top matches using similarity metrics such as cosine similarity. 2)
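The retrieval step reduces to a nearest-neighbor search over normalized vectors. A minimal sketch with toy 3-dimensional "embeddings" (a real system would use a model such as MiniLM or text-embedding-3, and an ANN index rather than brute force):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                 # cosine similarity per document
    top = np.argsort(-sims)[:k]  # indices of the k best matches
    return [(int(i), float(sims[i])) for i in top]

# Toy vectors standing in for model-produced embeddings.
docs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.2],
                 [0.0, 0.2, 0.9]])
query = np.array([1.0, 0.0, 0.1])
print(cosine_top_k(query, docs, k=2))
```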

Strengths:

  * Captures semantic similarity and paraphrases even without keyword overlap
  * Well suited to general question answering over natural-language text

Weaknesses:

  * Can miss exact terms, rare entities, and domain-specific keywords
  * Requires an embedding model and a vector index, adding compute and storage cost

Sparse Retrieval

BM25 and TF-IDF are lexical methods that weight term frequency and document frequency for keyword-based ranking. They are fast, interpretable, and strong for precise terms but miss synonyms and semantic relationships. 3)
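A self-contained sketch of BM25 (Okapi) scoring over pre-tokenized documents, with the usual defaults k1=1.5 and b=0.75; production systems would use an inverted index rather than scoring every document:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25 (Okapi)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["sparse", "retrieval", "uses", "keywords"],
        ["dense", "retrieval", "uses", "embeddings"],
        ["bm25", "weights", "term", "frequency"]]
print(bm25_scores(["bm25", "keywords"], docs))
```

Note how the second document scores zero: BM25 gives no credit for semantically related terms such as "embeddings", which is exactly the synonym gap described above.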

SPLADE is a learned sparse variant that uses term expansion to mimic dense retrieval semantics within a sparse vector space, balancing efficiency and performance.

Hybrid Retrieval

Hybrid retrieval combines dense (vector embeddings) and sparse (BM25) search in parallel, fusing scores via algorithms like Reciprocal Rank Fusion (RRF) to boost both precision and recall. Benchmarks consistently show hybrids outperforming either method alone. 4)
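Besides RRF, a common fusion approach is weighted score blending after min-max normalization, since raw dense and BM25 scores live on different scales. A minimal sketch (the score dicts are hypothetical outputs of the two retrievers):

```python
def hybrid_fuse(dense, sparse, alpha=0.5):
    """Min-max normalize each score dict, then blend: alpha*dense + (1-alpha)*sparse."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}
    nd, ns = norm(dense), norm(sparse)
    fused = {d: alpha * nd.get(d, 0.0) + (1 - alpha) * ns.get(d, 0.0)
             for d in set(nd) | set(ns)}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical per-document scores from each retriever.
dense_scores = {"doc_a": 0.92, "doc_b": 0.85, "doc_c": 0.40}
sparse_scores = {"doc_b": 12.0, "doc_c": 9.5, "doc_a": 1.0}
print(hybrid_fuse(dense_scores, sparse_scores))
```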

Reranking

After initial retrieval returns a candidate set (typically top-100 to top-1000), a cross-encoder reranking model re-scores candidates for finer relevance. This second stage prioritizes the most pertinent chunks while filtering noise, and is essential for scaling beyond basic search. 5)
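The two-stage shape can be sketched as follows. Here `cross_encoder_score` is a stand-in for a real cross-encoder (e.g. a sentence-transformers CrossEncoder), replaced by a toy token-overlap scorer so the sketch runs without a model:

```python
def cross_encoder_score(query, doc):
    """Stand-in for a real cross-encoder; scores by token overlap here."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def rerank(query, candidates, top_n=3):
    """Second stage: re-score every first-stage candidate jointly with the query."""
    scored = [(cross_encoder_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

candidates = ["hybrid retrieval fuses dense and sparse scores",
              "reranking re-scores candidates with a cross encoder",
              "bananas are rich in potassium"]
print(rerank("how does reranking score candidates", candidates, top_n=2))
```

The key property is that the reranker sees query and document together, so it can model fine-grained interactions that a bi-encoder's independent embeddings cannot.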

Query Expansion

Query expansion generates variants of the original query (synonyms, rewrites, alternative phrasings) to broaden retrieval coverage. RAG Fusion creates multiple query perspectives, runs parallel searches, and reranks the combined results for diverse, implicit knowledge discovery. 6)

HyDE (Hypothetical Document Embeddings)

HyDE uses an LLM to generate a hypothetical document that would answer the query, then embeds that hypothetical document and retrieves real documents matching its embedding. This bridges the semantic gap between short queries and long documents. 7)

Implementation pattern:

  1. Prompt the LLM: “Write a document that would answer: [query]”
  2. Embed the generated hypothetical document
  3. Use the embedding to retrieve real documents via vector search
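The three steps above can be sketched end to end. `generate_hypothetical` stands in for the LLM call and `embed` for a real embedding model; both are toy stubs here (the embedding is a bag-of-characters vector purely for illustration):

```python
import numpy as np

def generate_hypothetical(query):
    # Stub: a real system would prompt an LLM with
    # "Write a document that would answer: [query]".
    return f"A document answering '{query}' would explain the topic in detail."

def embed(text):
    # Toy bag-of-characters embedding; swap in a real encoder.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def hyde_retrieve(query, corpus, k=1):
    hypo_vec = embed(generate_hypothetical(query))                  # steps 1-2
    sims = [(float(embed(doc) @ hypo_vec), doc) for doc in corpus]  # step 3
    return [doc for _, doc in sorted(sims, reverse=True)[:k]]

corpus = ["retrieval augmented generation grounds llm answers", "zzz qqq xxx"]
print(hyde_retrieve("what is rag", corpus))
```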

Multi-Query Retrieval

Expands a single query into multiple reformulations (via LLM rewriting), retrieves separately for each, and fuses results. This captures query ambiguity and different aspects of the user intent. 8)
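A minimal sketch of that loop, fusing by each document's best rank across queries. The `rewrite` and `retrieve` callables are toy stand-ins for an LLM rewriter and a search backend:

```python
def multi_query_retrieve(query, retrieve, rewrite, k=3):
    """Retrieve for the original query plus reformulations, fuse by best rank."""
    queries = [query] + rewrite(query)
    best_rank = {}
    for q in queries:
        for rank, doc in enumerate(retrieve(q)):
            best_rank[doc] = min(best_rank.get(doc, rank), rank)
    return sorted(best_rank, key=best_rank.get)[:k]

# Hypothetical rewrites and a hypothetical tiny index.
rewrites = {"reset password": ["recover account access", "change login credentials"]}
index = {"reset password": ["doc_reset", "doc_faq"],
         "recover account access": ["doc_recovery", "doc_reset"],
         "change login credentials": ["doc_settings"]}
result = multi_query_retrieve("reset password",
                              retrieve=lambda q: index.get(q, []),
                              rewrite=lambda q: rewrites.get(q, []))
print(result)
```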

Parent Document Retrieval

Retrieves small, precise chunks but then expands to the parent document or larger context window post-retrieval, preserving full context while maintaining retrieval precision. 9)
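The post-retrieval expansion is just a lookup from chunk ID to parent ID, de-duplicated in hit order. A sketch over a hypothetical chunk-to-parent index:

```python
def parent_document_retrieve(chunk_hits, chunk_to_parent, parents):
    """Expand precise chunk hits to their parent documents, de-duplicated in hit order."""
    seen, out = set(), []
    for chunk_id in chunk_hits:
        parent_id = chunk_to_parent[chunk_id]
        if parent_id not in seen:
            seen.add(parent_id)
            out.append(parents[parent_id])
    return out

# Hypothetical index: small chunks map back to full sections.
chunk_to_parent = {"c1": "p1", "c2": "p1", "c3": "p2"}
parents = {"p1": "Full text of section 1 ...", "p2": "Full text of section 2 ..."}
print(parent_document_retrieve(["c2", "c3", "c1"], chunk_to_parent, parents))
```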

Contextual Retrieval

Anthropic's contextual retrieval prepends a short, LLM-generated context to each chunk at indexing time, situating the chunk within its source document before embedding and BM25 indexing. This improves specificity in large corpora, where an isolated chunk stripped of its surrounding document often lacks the context needed to match relevant queries. 10)

Reciprocal Rank Fusion (RRF)

RRF merges ranked lists from multiple retrievers without requiring calibrated scores. The formula assigns each item a score based on its rank position:

RRF(d) = sum( 1 / (k + rank_i(d)) )

Where rank_i(d) is the rank of document d in retriever i, and k is a constant (typically 60). Items are sorted by descending RRF score. 11)
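The formula translates directly into code. A sketch fusing two hypothetical ranked lists with the conventional k=60:

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "c", "d"]
print(rrf([dense_ranking, sparse_ranking]))
```

Note that "b" and "c" outrank "a" despite "a" topping the dense list: appearing in both rankings contributes two reciprocal-rank terms, which is what makes RRF reward cross-retriever agreement.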

Evaluation Metrics

Metric | Description | Use Case
Recall@k | Fraction of all relevant documents that appear in the top-k retrieved | Retrieval completeness
MRR (Mean Reciprocal Rank) | Average of 1/rank of the first relevant document | Single-answer ranking
NDCG (Normalized Discounted Cumulative Gain) | Rewards relevant documents ranked higher, with position-based discounting | Overall ranking effectiveness
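The three metrics in the table are short enough to implement directly; this sketch assumes binary relevance judgments:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top-k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(retrieved_lists, relevant_sets):
    """Mean of 1/rank of the first relevant document per query (0 if none found)."""
    total = 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_sets):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(retrieved_lists)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG@k with log2 position discounting."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

print(recall_at_k(["a", "b", "c"], {"a", "c", "d"}, k=3))
print(mrr([["x", "a"], ["b"]], [{"a"}, {"b"}]))
```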

Fine-tuning embeddings improves Recall@5; generator fine-tuning boosts exact-match and F1 scores. 12)

Strategy Selection Guide

Strategy | Strengths | Best For
Dense | Semantic matching, paraphrase handling | General Q&A
Sparse (BM25/TF-IDF) | Exact terms, fast, interpretable | Entity search, keyword-heavy domains
Hybrid | Best precision and recall via fusion | Production RAG systems
+ Reranking | 10-20% recall gains | Ambiguous or long-tail queries
+ HyDE / Multi-Query | Bridges query-document gap | Complex or vague queries
Graph RAG | Multi-hop reasoning | Knowledge graph applications

Start with hybrid retrieval plus reranking for production systems. Use sparse search for keyword-heavy domains like legal or medical text. Add query expansion techniques for complex or ambiguous queries. 13)

References

1) YouTube: RAG Retrieval Strategies, https://www.youtube.com/watch?v=r0Dciuq0knU
2), 12) arXiv: RAG Retrieval Evaluation, https://arxiv.org/html/2510.01600v1
6), 8) Weights and Biases: RAG Techniques, https://wandb.ai/site/articles/rag-techniques/