====== Retrieval Strategies ======

Retrieval strategies determine how a RAG (Retrieval-Augmented Generation) system finds relevant documents to ground LLM responses. The choice of strategy significantly impacts answer quality: dense methods capture semantic similarity, sparse methods excel at lexical matching, and advanced techniques such as hybrid search and reranking improve both precision and recall. ((https://www.youtube.com/watch?v=r0Dciuq0knU|YouTube: RAG Retrieval Strategies))

===== Dense Retrieval =====

Dense retrieval converts queries and documents into dense vector embeddings using models such as MPNet, MiniLM, or OpenAI text-embedding-3, then retrieves the top matches using a similarity metric such as cosine similarity. ((https://arxiv.org/html/2510.01600v1|arXiv: RAG Retrieval Evaluation))

Strengths:
  * Captures semantic meaning; handles paraphrases and synonyms
  * Works well for general question answering

Weaknesses:
  * Can miss exact keyword matches
  * Struggles with rare or out-of-domain terms that are poorly represented in the embedding model's training data

===== Sparse Retrieval =====

**BM25** and **TF-IDF** are lexical methods that weight term frequency and document frequency for keyword-based ranking. They are fast, interpretable, and strong for precise terms, but they miss synonyms and semantic relationships. ((https://www.youtube.com/watch?v=r0Dciuq0knU|YouTube: RAG Retrieval))

**SPLADE** is a learned sparse variant that uses term expansion to mimic dense-retrieval semantics within a sparse vector space, balancing efficiency and performance.

===== Hybrid Retrieval =====

Hybrid retrieval runs dense (vector embedding) and sparse (BM25) search in parallel, fusing their scores with algorithms such as Reciprocal Rank Fusion (RRF) to boost both precision and recall. Benchmarks consistently show hybrid retrieval outperforming either method alone.
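The fusion step can be sketched in a few lines of Python. The two ranked lists and document IDs below are toy stand-ins; a real system would take rankings from an actual BM25 index and a vector store:

```python
# Minimal sketch of hybrid retrieval: fuse a sparse (BM25-style) ranking and a
# dense (embedding-style) ranking with Reciprocal Rank Fusion.

def rrf_fuse(rankings, k=60):
    """Merge ranked lists of doc IDs; score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from two retrievers for the same query.
sparse_ranking = ["doc3", "doc1", "doc7"]  # BM25: exact keyword hits first
dense_ranking = ["doc1", "doc5", "doc3"]   # embeddings: semantic matches first

fused = rrf_fuse([sparse_ranking, dense_ranking])
print(fused)  # ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear high in both lists (doc1, doc3) accumulate score from each retriever and rise to the top, which is exactly the behavior hybrid search relies on.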
((https://www.puppygraph.com/blog/rag-techniques|PuppyGraph: RAG Techniques))

===== Reranking =====

After initial retrieval returns a candidate set (typically the top 100 to top 1,000), a cross-encoder reranking model re-scores the candidates for finer-grained relevance. This second stage prioritizes the most pertinent chunks while filtering noise, and is essential for scaling beyond basic search. ((https://www.puppygraph.com/blog/rag-techniques|PuppyGraph: RAG Techniques))

===== Query Expansion =====

Query expansion generates variants of the original query (synonyms, rewrites, alternative phrasings) to broaden retrieval coverage. **RAG Fusion** creates multiple query perspectives, runs parallel searches, and reranks the combined results for diverse, implicit knowledge discovery. ((https://wandb.ai/site/articles/rag-techniques/|Weights and Biases: RAG Techniques))

===== HyDE (Hypothetical Document Embeddings) =====

HyDE uses an LLM to generate a hypothetical document that would answer the query, then embeds that hypothetical document and retrieves real documents that match its embedding. This bridges the semantic gap between short queries and long documents. ((https://www.meilisearch.com/blog/rag-techniques|Meilisearch: RAG Techniques))

Implementation pattern:
  - Prompt the LLM: "Write a document that would answer: [query]"
  - Embed the generated hypothetical document
  - Use the embedding to retrieve real documents via vector search

===== Multi-Query Retrieval =====

Multi-query retrieval expands a single query into multiple reformulations (via LLM rewriting), retrieves separately for each, and fuses the results. This captures query ambiguity and the different aspects of user intent. ((https://wandb.ai/site/articles/rag-techniques/|Weights and Biases: RAG Techniques))

===== Parent Document Retrieval =====

Parent document retrieval matches against small, precise chunks, then expands to the parent document or a larger context window after retrieval, preserving full context while maintaining retrieval precision.
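The chunk-to-parent expansion can be sketched as follows. A toy word-overlap scorer stands in for a real embedding or BM25 similarity, and all document text and IDs are invented for illustration:

```python
# Minimal sketch of parent document retrieval: index small chunks for precise
# matching, but return the full parent document as context.

parents = {
    "doc_a": "Solar panels convert sunlight into electricity. "
             "Inverters then convert the DC output into AC power.",
    "doc_b": "Wind turbines convert kinetic energy from wind into electricity.",
}

# Each small chunk remembers which parent document it came from.
chunks = [
    {"parent": "doc_a", "text": "Solar panels convert sunlight into electricity."},
    {"parent": "doc_a", "text": "Inverters then convert the DC output into AC power."},
    {"parent": "doc_b", "text": "Wind turbines convert kinetic energy from wind into electricity."},
]

def score(query, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_parent(query):
    best_chunk = max(chunks, key=lambda c: score(query, c["text"]))
    return parents[best_chunk["parent"]]  # expand chunk -> full parent document

print(retrieve_parent("how do inverters convert DC output"))
```

The query matches the precise inverter chunk, but the caller receives the whole of ''doc_a'', so the generator also sees the surrounding sentence about solar panels.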
((https://www.meilisearch.com/blog/rag-techniques|Meilisearch: RAG Techniques))

===== Contextual Retrieval =====

Anthropic's contextual retrieval approach uses an LLM to prepend a short, document-situating context to each chunk before it is embedded and indexed. These context-enriched chunks improve specificity in large corpora, where chunk-level retrieval alone may lack sufficient context. ((https://www.meilisearch.com/blog/rag-techniques|Meilisearch: RAG Techniques))

===== Reciprocal Rank Fusion (RRF) =====

RRF merges ranked lists from multiple retrievers without requiring calibrated scores. The formula assigns each item a score based on its rank position:

  RRF(d) = Σ_i 1 / (k + rank_i(d))

where rank_i(d) is the rank of document d in retriever i, and k is a smoothing constant (typically 60). Items are sorted by descending RRF score. ((https://www.puppygraph.com/blog/rag-techniques|PuppyGraph: RAG Techniques))

===== Evaluation Metrics =====

^ Metric ^ Description ^ Use Case ^
| Recall@k | Fraction of the relevant documents found in the top-k retrieved | Retrieval completeness |
| MRR (Mean Reciprocal Rank) | Average of 1/rank of the first relevant document | Single-answer ranking |
| NDCG (Normalized Discounted Cumulative Gain) | Rewards relevant documents ranked higher, with position-based discounting | Overall ranking effectiveness |

Fine-tuning the embedding model improves Recall@5, while fine-tuning the generator boosts exact-match and F1 scores.
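The first two metrics in the table can be computed directly; the retrieved list and relevance judgments below are toy data for illustration:

```python
# Minimal sketch of Recall@k and MRR over toy ranked results.
# `relevant` is the set of ground-truth relevant document IDs for a query.

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mrr(queries):
    """Mean over queries of 1/rank of the first relevant document."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

retrieved = ["d2", "d9", "d1", "d4"]
relevant = {"d1", "d4"}
print(recall_at_k(retrieved, relevant, 3))  # 0.5 -- only d1 is in the top 3
print(mrr([(retrieved, relevant)]))         # 0.333... -- first hit at rank 3
```

Recall@k answers "did we fetch the evidence at all?", while MRR answers "how far down does the reader have to look?", which is why the table pairs them with completeness and single-answer ranking respectively.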
((https://arxiv.org/html/2510.01600v1|arXiv: RAG Retrieval Evaluation))

===== Strategy Selection Guide =====

^ Strategy ^ Strengths ^ Best For ^
| Dense | Semantic matching, paraphrase handling | General Q&A |
| Sparse (BM25/TF-IDF) | Exact terms, fast, interpretable | Entity search, keyword-heavy domains |
| Hybrid | Best precision and recall via fusion | Production RAG systems |
| + Reranking | 10-20% recall gains | Ambiguous or long-tail queries |
| + HyDE / Multi-Query | Bridges the query-document gap | Complex or vague queries |
| Graph RAG | Multi-hop reasoning | Knowledge graph applications |

Start with hybrid retrieval plus reranking for production systems. Use sparse search for keyword-heavy domains such as legal or medical text. Add query expansion techniques for complex or ambiguous queries. ((https://www.puppygraph.com/blog/rag-techniques|PuppyGraph: RAG Techniques))

===== See Also =====

  * [[hybrid_search|Hybrid Search]]
  * [[reranking|Reranking]]
  * [[chunking_strategies|Chunking Strategies]]
  * [[semantic_search|Semantic Search]]
  * [[embedding_models_comparison|Embedding Models Comparison]]

===== References =====