Retrieval Augmented Generation (RAG) enhances large language models by retrieving relevant external documents at query time, grounding responses in factual, up-to-date information without retraining the model.1) RAG addresses core LLM limitations (hallucinations, outdated knowledge, and lack of domain-specific data), making it the most widely deployed pattern for production AI agent systems.2) A key challenge in production RAG systems is that they frequently return incorrect answers with high confidence, requiring careful evaluation and feedback loops to ensure reliability.3) When RAG data becomes massive, noisy, or contradictory in ways that break coherent context windows, this limitation can justify transitioning to multi-agent systems designed to filter and structure the retrieved information before generation.4)
RAG operates in three stages: retrieval (fetch documents relevant to the query), augmentation (insert them into the prompt as context), and generation (produce an answer grounded in that context).
Naive RAG is the simplest implementation: embed the query, retrieve the top-$k$ chunks by cosine similarity $\frac{\mathbf{q} \cdot \mathbf{d}}{||\mathbf{q}|| \cdot ||\mathbf{d}||}$, stuff them into the prompt, and generate. It is prone to retrieval noise, irrelevant chunks, and context overflow on complex queries.
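The retrieval step of this naive pipeline can be sketched in a few lines. This is a toy illustration: the three-dimensional "embeddings" are hand-made, and `top_k_by_cosine` is an invented helper name, not a library API.

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_vecs, k=3):
    """Rank document vectors by cosine similarity to the query vector."""
    q = np.asarray(query_vec, dtype=float)
    D = np.asarray(doc_vecs, dtype=float)
    # cosine similarity: (q . d) / (||q|| * ||d||) for each document d
    sims = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q))
    return np.argsort(sims)[::-1][:k]  # indices of the k most similar docs

# Toy corpus with hand-made 3-d "embeddings"
docs = [
    "RAG grounds LLMs in documents",
    "Cats sleep a lot",
    "Vector search uses embeddings",
]
doc_vecs = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.1], [0.8, 0.0, 0.6]]
query_vec = [1.0, 0.0, 0.5]

for i in top_k_by_cosine(query_vec, doc_vecs, k=2):
    print(docs[i])
```

In a real system the embeddings come from a model and the ranking is delegated to a vector database, but the scoring is exactly this formula.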
Advanced RAG optimizes each stage of the pipeline:5)
Modular RAG breaks the pipeline into interchangeable components for maximum flexibility:
[[https://microsoft.github.io/graphrag/|Microsoft's GraphRAG]]6) builds a knowledge graph from documents, extracting entities and relationships, then uses graph traversal combined with vector search.7) This captures hierarchical context and entity connections that flat vector search misses, excelling on complex analytical queries.
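The core idea can be illustrated with a toy sketch (this is not GraphRAG's implementation; the graph, chunks, and `graph_expanded_retrieval` helper are all invented here): start from the entities that vector search surfaced, then expand one hop through an entity graph to pull in related context that similarity alone would miss.

```python
# Entity graph: entity -> related entities (extracted from documents upstream)
graph = {
    "GraphRAG": ["knowledge graph", "vector search"],
    "knowledge graph": ["entities", "relationships"],
    "vector search": ["embeddings"],
}

# Chunks indexed by the entity they mention (stand-in for a vector store)
chunks_by_entity = {
    "GraphRAG": "GraphRAG builds a knowledge graph from documents.",
    "knowledge graph": "A knowledge graph links entities via relationships.",
    "vector search": "Vector search ranks chunks by embedding similarity.",
    "embeddings": "Embeddings map text into a vector space.",
}

def graph_expanded_retrieval(seed_entities, hops=1):
    """Collect chunks for the seed entities plus their graph neighbors."""
    frontier, seen = list(seed_entities), set(seed_entities)
    for _ in range(hops):
        # expand one hop: neighbors of the current frontier not yet visited
        frontier = [n for e in frontier for n in graph.get(e, []) if n not in seen]
        seen.update(frontier)
    return [chunks_by_entity[e] for e in seen if e in chunks_by_entity]

# Vector search matched "GraphRAG"; graph traversal adds its neighbors' context
context = graph_expanded_retrieval(["GraphRAG"])
```

The payoff is that a query mentioning only one entity still retrieves chunks about the entities it is connected to, which is what makes graph-augmented retrieval effective on multi-hop analytical questions.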
| Strategy | Method | Best For |
| --- | --- | --- |
| Fixed-size | Split by token/character count with overlap | Simple documents, fast implementation |
| Recursive | Hierarchical split (paragraphs, sentences, words) | Structured text with natural boundaries |
| Semantic | Group by embedding similarity | Topic-coherent chunks, mixed documents |
| Contextual | Prepend document-level context to each chunk | Preserving source context in retrieval |
| Agentic | LLM decides chunk boundaries | Complex documents requiring judgment |
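As an illustration of the first strategy in the table, here is a minimal fixed-size splitter with overlap. It is character-based for simplicity (production splitters typically count tokens), and `fixed_size_chunks` is an invented helper name, not a library function.

```python
def fixed_size_chunks(text, chunk_size=512, overlap=64):
    """Split text into fixed-size character chunks with overlapping windows."""
    step = chunk_size - overlap  # each chunk starts `step` characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the rest is already covered; avoid a tiny duplicate tail
    return chunks
```

The overlap means the last `overlap` characters of each chunk reappear at the start of the next, so a sentence cut at a boundary is still retrievable in one piece from the neighboring chunk.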
```python
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereRerank
from langchain.retrievers import ContextualCompressionRetriever

# 1. Chunk documents with recursive splitting
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separators=["\n\n", "\n", ". ", " "]
)
chunks = splitter.split_documents(documents)

# 2. Embed and store in vector database
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(chunks, embeddings)

# 3. Hybrid retrieval with reranking
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})
reranker = CohereRerank(top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

# 4. Generate with retrieved context
llm = ChatOpenAI(model="gpt-4")
docs = retriever.invoke("How does GraphRAG improve retrieval?")
context = "\n".join(doc.page_content for doc in docs)
response = llm.invoke(f"Context: {context}\n\nQuestion: How does GraphRAG improve retrieval?")
```
RAGAS8) (Retrieval Augmented Generation Assessment) provides standard metrics for evaluating RAG pipelines:9)