Retrieval Strategies

Retrieval strategies determine how a RAG (Retrieval-Augmented Generation) system finds relevant documents to ground LLM responses. The choice of strategy significantly impacts answer quality, with dense methods capturing semantic similarity, sparse methods excelling at lexical matching, and advanced techniques like hybrids and reranking improving both precision and recall. 1)

Dense Retrieval

Dense retrieval converts queries and documents into dense vector embeddings using models like MPNet, MiniLM, or OpenAI text-embedding-3, then retrieves top matches using similarity metrics such as cosine similarity. 2)
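The retrieval step reduces to a nearest-neighbor search over normalized vectors. A minimal sketch with toy 3-dimensional "embeddings" (a real system would use a model such as MiniLM or text-embedding-3, and an ANN index rather than brute force):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                 # cosine similarity per document
    top = np.argsort(-sims)[:k]  # indices of the k best matches
    return [(int(i), float(sims[i])) for i in top]

# Toy vectors standing in for model-produced embeddings.
docs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.2],
                 [0.0, 0.2, 0.9]])
query = np.array([1.0, 0.0, 0.1])
print(cosine_top_k(query, docs, k=2))
```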

Strengths:

  * Captures semantic similarity and paraphrases even without keyword overlap
  * Well suited to general question answering over natural-language text

Weaknesses:

  * Can miss exact terms, rare entities, and domain-specific keywords
  * Requires an embedding model and a vector index, adding compute and storage cost

Sparse Retrieval

BM25 and TF-IDF are lexical methods that weight term frequency and document frequency for keyword-based ranking. They are fast, interpretable, and strong for precise terms but miss synonyms and semantic relationships. 3)
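A self-contained sketch of BM25 (Okapi) scoring over pre-tokenized documents, with the usual defaults k1=1.5 and b=0.75; production systems would use an inverted index rather than scoring every document:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25 (Okapi)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["sparse", "retrieval", "uses", "keywords"],
        ["dense", "retrieval", "uses", "embeddings"],
        ["bm25", "weights", "term", "frequency"]]
print(bm25_scores(["bm25", "keywords"], docs))
```

Note how the second document scores zero: BM25 gives no credit for semantically related terms such as "embeddings", which is exactly the synonym gap described above.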

SPLADE is a learned sparse variant that uses term expansion to mimic dense retrieval semantics within a sparse vector space, balancing efficiency and performance.

Hybrid Retrieval

Hybrid retrieval combines dense (vector embeddings) and sparse (BM25) search in parallel, fusing scores via algorithms like Reciprocal Rank Fusion (RRF) to boost both precision and recall. Benchmarks consistently show hybrids outperforming either method alone. 4)
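Besides RRF, a common fusion approach is weighted score blending after min-max normalization, since raw dense and BM25 scores live on different scales. A minimal sketch (the score dicts are hypothetical outputs of the two retrievers):

```python
def hybrid_fuse(dense, sparse, alpha=0.5):
    """Min-max normalize each score dict, then blend: alpha*dense + (1-alpha)*sparse."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}
    nd, ns = norm(dense), norm(sparse)
    fused = {d: alpha * nd.get(d, 0.0) + (1 - alpha) * ns.get(d, 0.0)
             for d in set(nd) | set(ns)}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical per-document scores from each retriever.
dense_scores = {"doc_a": 0.92, "doc_b": 0.85, "doc_c": 0.40}
sparse_scores = {"doc_b": 12.0, "doc_c": 9.5, "doc_a": 1.0}
print(hybrid_fuse(dense_scores, sparse_scores))
```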

Reranking

After initial retrieval returns a candidate set (typically top-100 to top-1000), a cross-encoder reranking model re-scores candidates for finer relevance. This second stage prioritizes the most pertinent chunks while filtering noise, and is essential for scaling beyond basic search. 5)
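The two-stage shape can be sketched as follows. Here `cross_encoder_score` is a stand-in for a real cross-encoder (e.g. a sentence-transformers CrossEncoder), replaced by a toy token-overlap scorer so the sketch runs without a model:

```python
def cross_encoder_score(query, doc):
    """Stand-in for a real cross-encoder; scores by token overlap here."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def rerank(query, candidates, top_n=3):
    """Second stage: re-score every first-stage candidate jointly with the query."""
    scored = [(cross_encoder_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

candidates = ["hybrid retrieval fuses dense and sparse scores",
              "reranking re-scores candidates with a cross encoder",
              "bananas are rich in potassium"]
print(rerank("how does reranking score candidates", candidates, top_n=2))
```

The key property is that the reranker sees query and document together, so it can model fine-grained interactions that a bi-encoder's independent embeddings cannot.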

Query Expansion

Query expansion generates variants of the original query (synonyms, rewrites, alternative phrasings) to broaden retrieval coverage. RAG Fusion creates multiple query perspectives, runs parallel searches, and reranks the combined results for diverse, implicit knowledge discovery. 6)

HyDE (Hypothetical Document Embeddings)

HyDE uses an LLM to generate a hypothetical document that would answer the query, then embeds that hypothetical document and retrieves real documents matching its embedding. This bridges the semantic gap between short queries and long documents. 7)

Implementation pattern:

  1. Prompt the LLM: “Write a document that would answer: [query]”
  2. Embed the generated hypothetical document
  3. Use the embedding to retrieve real documents via vector search
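The three steps above can be sketched end to end. `generate_hypothetical` stands in for the LLM call and `embed` for a real embedding model; both are toy stubs here (the embedding is a bag-of-characters vector purely for illustration):

```python
import numpy as np

def generate_hypothetical(query):
    # Stub: a real system would prompt an LLM with
    # "Write a document that would answer: [query]".
    return f"A document answering '{query}' would explain the topic in detail."

def embed(text):
    # Toy bag-of-characters embedding; swap in a real encoder.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def hyde_retrieve(query, corpus, k=1):
    hypo_vec = embed(generate_hypothetical(query))                  # steps 1-2
    sims = [(float(embed(doc) @ hypo_vec), doc) for doc in corpus]  # step 3
    return [doc for _, doc in sorted(sims, reverse=True)[:k]]

corpus = ["retrieval augmented generation grounds llm answers", "zzz qqq xxx"]
print(hyde_retrieve("what is rag", corpus))
```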

Multi-Query Retrieval

Expands a single query into multiple reformulations (via LLM rewriting), retrieves separately for each, and fuses results. This captures query ambiguity and different aspects of the user intent. 8)
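A minimal sketch of that loop, fusing by each document's best rank across queries. The `rewrite` and `retrieve` callables are toy stand-ins for an LLM rewriter and a search backend:

```python
def multi_query_retrieve(query, retrieve, rewrite, k=3):
    """Retrieve for the original query plus reformulations, fuse by best rank."""
    queries = [query] + rewrite(query)
    best_rank = {}
    for q in queries:
        for rank, doc in enumerate(retrieve(q)):
            best_rank[doc] = min(best_rank.get(doc, rank), rank)
    return sorted(best_rank, key=best_rank.get)[:k]

# Hypothetical rewrites and a hypothetical tiny index.
rewrites = {"reset password": ["recover account access", "change login credentials"]}
index = {"reset password": ["doc_reset", "doc_faq"],
         "recover account access": ["doc_recovery", "doc_reset"],
         "change login credentials": ["doc_settings"]}
result = multi_query_retrieve("reset password",
                              retrieve=lambda q: index.get(q, []),
                              rewrite=lambda q: rewrites.get(q, []))
print(result)
```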

Parent Document Retrieval

Retrieves small, precise chunks but then expands to the parent document or larger context window post-retrieval, preserving full context while maintaining retrieval precision. 9)
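The post-retrieval expansion is just a lookup from chunk ID to parent ID, de-duplicated in hit order. A sketch over a hypothetical chunk-to-parent index:

```python
def parent_document_retrieve(chunk_hits, chunk_to_parent, parents):
    """Expand precise chunk hits to their parent documents, de-duplicated in hit order."""
    seen, out = set(), []
    for chunk_id in chunk_hits:
        parent_id = chunk_to_parent[chunk_id]
        if parent_id not in seen:
            seen.add(parent_id)
            out.append(parents[parent_id])
    return out

# Hypothetical index: small chunks map back to full sections.
chunk_to_parent = {"c1": "p1", "c2": "p1", "c3": "p2"}
parents = {"p1": "Full text of section 1 ...", "p2": "Full text of section 2 ..."}
print(parent_document_retrieve(["c2", "c3", "c1"], chunk_to_parent, parents))
```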

Contextual Retrieval

Anthropic's contextual retrieval prepends a short, LLM-generated context to each chunk at indexing time, situating the chunk within its source document before embedding and BM25 indexing. This improves specificity in large corpora, where an isolated chunk stripped of its surrounding document often lacks the context needed to match relevant queries. 10)

Reciprocal Rank Fusion (RRF)

RRF merges ranked lists from multiple retrievers without requiring calibrated scores. The formula assigns each item a score based on its rank position:

RRF(d) = sum( 1 / (k + rank_i(d)) )

Where rank_i(d) is the rank of document d in retriever i, and k is a constant (typically 60). Items are sorted by descending RRF score. 11)
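The formula translates directly into code. A sketch fusing two hypothetical ranked lists with the conventional k=60:

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "c", "d"]
print(rrf([dense_ranking, sparse_ranking]))
```

Note that "b" and "c" outrank "a" despite "a" topping the dense list: appearing in both rankings contributes two reciprocal-rank terms, which is what makes RRF reward cross-retriever agreement.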

Evaluation Metrics

Metric | Description | Use Case
Recall@k | Fraction of all relevant documents that appear in the top-k retrieved | Retrieval completeness
MRR (Mean Reciprocal Rank) | Average of 1/rank of the first relevant document | Single-answer ranking
NDCG (Normalized Discounted Cumulative Gain) | Rewards relevant documents ranked higher, with position-based discounting | Overall ranking effectiveness
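The three metrics in the table are short enough to implement directly; this sketch assumes binary relevance judgments:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top-k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(retrieved_lists, relevant_sets):
    """Mean of 1/rank of the first relevant document per query (0 if none found)."""
    total = 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_sets):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(retrieved_lists)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG@k with log2 position discounting."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

print(recall_at_k(["a", "b", "c"], {"a", "c", "d"}, k=3))
print(mrr([["x", "a"], ["b"]], [{"a"}, {"b"}]))
```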

Fine-tuning embeddings improves Recall@5; generator fine-tuning boosts exact-match and F1 scores. 12)

Strategy Selection Guide

Strategy | Strengths | Best For
Dense | Semantic matching, paraphrase handling | General Q&A
Sparse (BM25/TF-IDF) | Exact terms, fast, interpretable | Entity search, keyword-heavy domains
Hybrid | Best precision and recall via fusion | Production RAG systems
+ Reranking | 10-20% recall gains | Ambiguous or long-tail queries
+ HyDE / Multi-Query | Bridges query-document gap | Complex or vague queries
Graph RAG | Multi-hop reasoning | Knowledge graph applications

Start with hybrid retrieval plus reranking for production systems. Use sparse search for keyword-heavy domains like legal or medical text. Add query expansion techniques for complex or ambiguous queries. 13)

References

1) YouTube: RAG Retrieval Strategies, https://www.youtube.com/watch?v=r0Dciuq0knU
2), 12) arXiv: RAG Retrieval Evaluation, https://arxiv.org/html/2510.01600v1
6), 8) Weights and Biases: RAG Techniques, https://wandb.ai/site/articles/rag-techniques/