Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Retrieval strategies determine how a RAG (Retrieval-Augmented Generation) system finds relevant documents to ground LLM responses. The choice of strategy significantly impacts answer quality, with dense methods capturing semantic similarity, sparse methods excelling at lexical matching, and advanced techniques like hybrids and reranking improving both precision and recall. 1)
Dense retrieval converts queries and documents into dense vector embeddings using models like MPNet, MiniLM, or OpenAI text-embedding-3, then retrieves top matches using similarity metrics such as cosine similarity. 2)
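As a minimal sketch of the scoring step, the following ranks toy three-dimensional vectors by cosine similarity; a real system would use the output of an embedding model such as those named above rather than hand-written vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings standing in for real model outputs (e.g. MPNet, MiniLM).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

# Retrieve by descending cosine similarity to the query embedding.
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
```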
Strengths: captures semantic similarity, so paraphrases and synonyms of the query still match relevant documents.
Weaknesses: can miss exact terms, rare entities, and identifiers, where lexical matching excels, and requires an embedding model plus a vector index.
BM25 and TF-IDF are lexical methods that weight term frequency and document frequency for keyword-based ranking. They are fast, interpretable, and strong for precise terms but miss synonyms and semantic relationships. 3)
SPLADE is a learned sparse variant that uses term expansion to mimic dense retrieval semantics within a sparse vector space, balancing efficiency and performance.
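The BM25 scoring function itself is compact enough to sketch in full. This follows the standard Okapi BM25 formula over pre-tokenized documents, with the usual free parameters k1 and b:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                          # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                     # term frequency in this document
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "bm25 ranks by exact term overlap".split(),
    "dense vectors capture semantic similarity".split(),
]
scores = bm25_scores("bm25 term".split(), docs)
```

Note that the second document scores zero: it shares no terms with the query, which is exactly the synonym blindness described above.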
Hybrid retrieval combines dense (vector embeddings) and sparse (BM25) search in parallel, fusing scores via algorithms like Reciprocal Rank Fusion (RRF) to boost both precision and recall. Benchmarks consistently show hybrids outperforming either method alone. 4)
After initial retrieval returns a candidate set (typically top-100 to top-1000), a cross-encoder reranking model re-scores candidates for finer relevance. This second stage prioritizes the most pertinent chunks while filtering noise, and typically improves relevance substantially over single-stage retrieval. 5)
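A sketch of the two-stage shape, where a crude token-overlap function stands in for the cross-encoder model a real system would call:

```python
def cross_encoder_score(query, doc):
    # Stand-in for a real cross-encoder; a production system would run a
    # model jointly over the (query, doc) pair here.
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / max(len(q), 1)

def retrieve_then_rerank(query, candidates, first_stage_scores,
                         pool_size=100, top_n=2):
    # Stage 1: keep the top candidates from the cheap first-stage retriever.
    pool = sorted(zip(candidates, first_stage_scores),
                  key=lambda x: -x[1])[:pool_size]
    # Stage 2: re-score only the pooled candidates with the expensive model.
    return sorted((doc for doc, _ in pool),
                  key=lambda doc: -cross_encoder_score(query, doc))[:top_n]
```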
Query expansion generates variants of the original query (synonyms, rewrites, alternative phrasings) to broaden retrieval coverage. RAG Fusion creates multiple query perspectives, runs parallel searches, and reranks the combined results for diverse, implicit knowledge discovery. 6)
HyDE uses an LLM to generate a hypothetical document that would answer the query, then embeds that hypothetical document and retrieves real documents matching its embedding. This bridges the semantic gap between short queries and long documents. 7)
Implementation pattern:
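A minimal sketch of the pattern, where `generate_hypothetical_answer` and `embed` are placeholders for an LLM call and a dense embedding model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def generate_hypothetical_answer(query):
    # Placeholder for the LLM call that drafts a plausible answer document.
    return f"A detailed passage answering the question: {query}"

def embed(text):
    # Placeholder embedding (letter frequencies); a real system would use a
    # dense embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def hyde_retrieve(query, documents, top_k=3):
    # 1. Generate a hypothetical document that would answer the query.
    hypothetical = generate_hypothetical_answer(query)
    # 2. Embed the hypothetical document rather than the raw query.
    hypo_vec = embed(hypothetical)
    # 3. Retrieve real documents closest to that embedding.
    ranked = sorted(documents, key=lambda d: -cosine(hypo_vec, embed(d)))
    return ranked[:top_k]
```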
Multi-query retrieval expands a single query into multiple reformulations (via LLM rewriting), retrieves separately for each, and fuses the results. This captures query ambiguity and different aspects of the user intent. 8)
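A sketch of this flow, with `rewrite_query` standing in for the LLM rewriting step and reciprocal-rank scoring used to fuse the per-query result lists:

```python
def rewrite_query(query):
    # Placeholder for LLM-generated reformulations of the original query.
    return [query, query + " definition", query + " examples"]

def multi_query_retrieve(query, search_fn, top_k=3):
    """Run one search per reformulation, then fuse by reciprocal rank."""
    fused = {}
    for q in rewrite_query(query):
        for rank, doc in enumerate(search_fn(q), start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (60 + rank)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]
```

Documents that recur across several reformulations accumulate score, so results robust to the query's phrasing rise to the top.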
Parent-document (small-to-big) retrieval fetches small, precise chunks but then expands to the parent document or a larger context window post-retrieval, preserving full context while maintaining retrieval precision. 9)
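A sketch of the post-retrieval expansion step, assuming a precomputed chunk-to-parent mapping:

```python
def expand_to_parents(retrieved_chunk_ids, chunk_to_parent, parent_docs):
    """Map precise retrieved chunks back to their full parent documents,
    deduplicating while preserving retrieval order."""
    seen, parents = set(), []
    for cid in retrieved_chunk_ids:
        pid = chunk_to_parent[cid]
        if pid not in seen:
            seen.add(pid)
            parents.append(parent_docs[pid])
    return parents
```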
Anthropic's contextual retrieval approach uses an LLM to generate a short blurb situating each chunk within its source document, prepending that context to the chunk before embedding and BM25 indexing. This improves specificity in large corpora where an isolated chunk alone may lack sufficient context. 10)
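A sketch of the indexing side, with `situate_chunk` standing in for the LLM call that writes the situating context:

```python
def situate_chunk(document_title, chunk):
    # Placeholder for an LLM call that writes a short blurb situating the
    # chunk within its source document before indexing.
    return f"From '{document_title}': {chunk}"

def build_contextual_index(corpus):
    """corpus: {doc_title: [chunk, ...]} -> list of context-prepended
    chunks, ready for embedding and BM25 indexing."""
    index = []
    for title, chunks in corpus.items():
        for chunk in chunks:
            index.append(situate_chunk(title, chunk))
    return index
```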
RRF merges ranked lists from multiple retrievers without requiring calibrated scores. The formula assigns each item a score based on its rank position:
RRF(d) = sum( 1 / (k + rank_i(d)) )
Where rank_i(d) is the rank of document d in retriever i, and k is a constant (typically 60). Items are sorted by descending RRF score. 11)
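The formula translates directly into a few lines; toy ranked lists from a dense and a sparse retriever are shown for illustration:

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists via Reciprocal Rank Fusion:
    RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # ranking from the dense retriever
sparse = ["d1", "d4", "d2"]  # ranking from the sparse retriever
fused = rrf_fuse([dense, sparse])
```

Because only rank positions matter, the dense and sparse scores never need to be calibrated against each other.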
| Metric | Description | Use Case |
|---|---|---|
| Recall@k | Fraction of relevant documents in top-k retrieved | Retrieval completeness |
| MRR (Mean Reciprocal Rank) | Average of 1/rank of first relevant document | Single-answer ranking |
| NDCG (Normalized Discounted Cumulative Gain) | Rewards relevant documents higher in the list with position-based discounting | Overall ranking effectiveness |
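Minimal reference implementations of the three metrics; in `ndcg_at_k`, `gains` maps each document to its graded relevance, and unlisted documents count as irrelevant:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top-k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean of 1/rank of the first relevant document, over queries."""
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def ndcg_at_k(retrieved, gains, k):
    """DCG of the retrieved list normalized by the ideal ordering's DCG."""
    dcg = sum(gains.get(d, 0.0) / math.log2(i + 2)
              for i, d in enumerate(retrieved[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```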
Fine-tuning embeddings improves Recall@5; generator fine-tuning boosts exact-match and F1 scores. 12)
| Strategy | Strengths | Best For |
|---|---|---|
| Dense | Semantic matching, paraphrase handling | General Q&A |
| Sparse (BM25/TF-IDF) | Exact terms, fast, interpretable | Entity search, keyword-heavy domains |
| Hybrid | Best precision and recall via fusion | Production RAG systems |
| + Reranking | 10-20% recall gains | Ambiguous or long-tail queries |
| + HyDE / Multi-Query | Bridges query-document gap | Complex or vague queries |
| Graph RAG | Multi-hop reasoning | Knowledge graph applications |
Start with hybrid retrieval plus reranking for production systems. Use sparse search for keyword-heavy domains like legal or medical text. Add query expansion techniques for complex or ambiguous queries. 13)