Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Chunking in RAG systems divides documents into smaller, retrievable units to optimize embedding, storage, and retrieval. Effective strategies balance context preservation with retrieval precision, with common chunk sizes ranging from 256 to 2,048 tokens depending on the embedding model and use case.
Fixed-size chunking splits text by predefined limits, either token-based (e.g., 512 or 1,024 tokens, aligning with model context windows) or character-based (raw character counts).
Overlapping chunking adds 10-20% overlap between adjacent chunks (e.g., the last tokens of one chunk repeat at the start of the next) to preserve context that spans boundaries.
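Fixed-size chunking with overlap can be sketched in a few lines. This is a minimal illustration over a pre-tokenized list; a production system would tokenize with the embedding model's own tokenizer (e.g., tiktoken) rather than splitting on whitespace:

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into fixed-size chunks with overlap.

    The defaults here are illustrative; real systems tune size and
    overlap to the embedding model and content.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # advance by size minus overlap each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks

# Toy example: words stand in for tokens.
words = ["tok%d" % i for i in range(10)]
print(chunk_tokens(words, size=4, overlap=1))
```

With `size=4, overlap=1`, each chunk repeats the last token of its predecessor, which is exactly the boundary-preserving behavior described above.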
Semantic chunking groups sentences by embedding similarity, so chunks form around semantically coherent content. It uses embedding models to compute vector distances, enabling meaning-preserving splits.
Advanced variants refine how the breakpoints are chosen, for example by thresholding the similarity drop between consecutive sentences at a percentile rather than a fixed value.
LangChain's recursive splitter (RecursiveCharacterTextSplitter) applies hierarchical separators in priority order: paragraph breaks ("\n\n"), then line breaks ("\n"), then spaces, then individual characters.
The splitter recurses on oversized chunks until the target size is reached, keeping natural text units intact as long as possible before falling back to smaller separators.
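The recursion can be sketched as follows. Character lengths stand in for token counts here, and unlike LangChain's implementation this sketch does not merge small pieces back up to the target size; it only shows the fall-through from coarse to fine separators:

```python
def recursive_split(text, separators=("\n\n", "\n", " "), max_len=100):
    """Recursively split text: try the coarsest separator first and
    recurse with finer separators on any piece still too long."""
    if len(text) <= max_len:
        return [text]
    if not separators:
        # Last resort: hard character-level split.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, rest, max_len))
    return [c for c in chunks if c.strip()]

doc = "para one\n\npara two line a\npara two line b"
print(recursive_split(doc, max_len=15))
```

The first paragraph survives intact because the paragraph-level split already fits; only the oversized second paragraph falls through to the line-level separator.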
Document-based chunking splits on structural markers such as Markdown headers, HTML tags, or code-block boundaries, so each chunk corresponds to a logical unit of the source.
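For Markdown, structure-aware splitting can be as simple as starting a new chunk at every heading, keeping each heading attached to its body. A minimal sketch:

```python
import re

def split_markdown_sections(doc):
    """Split a Markdown document at headings (#, ##, ...), keeping each
    heading with its body so every chunk is a self-contained section."""
    sections, current = [], []
    for line in doc.splitlines():
        if re.match(r"^#{1,6} ", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections

doc = "# Intro\ntext a\n## Details\ntext b"
print(split_markdown_sections(doc))
```

Each returned section starts with its heading, which also makes a natural metadata tag for the chunk.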
Sentence-based chunking splits at sentence boundaries (e.g., via period detection or NLP sentence segmentation), grouping sentences into chunks up to a target size. It combines well with semantic methods for conversational AI applications.
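A sketch of the split-then-pack step, using a naive punctuation regex in place of a proper NLP sentence segmenter (which would handle abbreviations, quotes, etc.):

```python
import re

def sentence_chunks(text, target_chars=80):
    """Split at sentence boundaries, then pack whole sentences into
    chunks up to a target size (characters here; tokens in practice)."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sent in sentences:
        # Start a new chunk if adding this sentence would overflow.
        if current and len(current) + len(sent) + 1 > target_chars:
            chunks.append(current)
            current = sent
        else:
            current = (current + " " + sent).strip()
    if current:
        chunks.append(current)
    return chunks
```

Because packing never splits inside a sentence, every chunk boundary is also a sentence boundary.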
In agentic chunking, an AI agent analyzes document type, density, and structure to dynamically select or mix strategies (e.g., header-based for Markdown, proposition-based for dense text), adding metadata tags. This tailors the chunking approach to the content for precise, context-aware retrieval.
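The analysis step can be sketched as a dispatcher. The heuristics below (file extension, presence of headings or code, words per line as a density proxy) are illustrative stand-ins; an actual agentic chunker would let an LLM make, and iteratively refine, this decision:

```python
def pick_strategy(doc, filename=""):
    """Toy heuristic standing in for the agent's analysis step:
    inspect document type and density, return a strategy name."""
    if filename.endswith(".md") or doc.lstrip().startswith("#"):
        return "header-based"          # structured Markdown
    if filename.endswith(".py") or "def " in doc:
        return "structure-aware"       # source code
    words_per_line = len(doc.split()) / max(1, len(doc.splitlines()))
    # Dense prose (long lines) benefits from proposition-based splitting.
    return "proposition-based" if words_per_line > 20 else "sentence-based"
```

In a full pipeline the returned name would select one of the chunkers above, and the choice itself would be stored as chunk metadata.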
| Use Case | Typical Range (Tokens) | Rationale |
|---|---|---|
| Fine-grained details (Q&A) | 256-512 | Captures specifics without noise |
| Broad themes (summaries) | 1,024-2,048 | Retains context; fits embedding windows |
| Code and legal documents | 512-1,024 + overlap | Structure-aware with metadata |
| Conversational AI | 256-1,024 semantic | Preserves dialogue flow |
Late chunking embeds full documents first, then splits retrieved passages dynamically to preserve global context and avoid early fragmentation. This contrasts with standard pre-chunking, where documents are split before indexing.
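The pooling step of late chunking can be sketched without a model: assume the whole document has already gone through one long-context encoder pass, so each token vector reflects global context, and chunk embeddings are then mean-pooled over token spans after the fact:

```python
def late_chunk_embeddings(token_vectors, boundaries):
    """Mean-pool per-token embeddings (from a single full-document
    encoder pass) over chunk spans given as (start, end) index pairs."""
    chunk_vecs = []
    for start, end in boundaries:
        span = token_vectors[start:end]
        dim = len(span[0])
        chunk_vecs.append(
            [sum(v[i] for v in span) / len(span) for i in range(dim)]
        )
    return chunk_vecs

# Four token vectors, pooled into two chunks of two tokens each.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(late_chunk_embeddings(tokens, [(0, 2), (2, 4)]))
```

The contrast with pre-chunking is entirely in where the inputs come from: here the token vectors were computed with the whole document in context, so each pooled chunk vector inherits that global context.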
Metadata enrichment tags chunks with headings, timestamps, document types, authors, and section titles for enriched retrieval. Metadata enables filtering (e.g., by section or date) and improves relevance when combined with overlap or semantic splitting.
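A common pattern is to store each chunk as text plus a metadata dict, then pre-filter on metadata before (or alongside) vector search. The field names below (`section`, `doc_type`) are illustrative, not a fixed schema:

```python
def filter_chunks(chunks, **criteria):
    """Keep only chunks whose metadata matches every given key=value."""
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in criteria.items())
    ]

chunks = [
    {"text": "refund policy ...",
     "metadata": {"section": "Returns", "doc_type": "policy"}},
    {"text": "release notes ...",
     "metadata": {"section": "Changelog", "doc_type": "notes"}},
]
print(filter_chunks(chunks, doc_type="policy"))
```

Vector stores expose the same idea natively as metadata filters on similarity queries, which narrows the search space before ranking by embedding distance.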
Anthropic's contextual retrieval uses an LLM to generate a brief context summary for each chunk before indexing, preserving the chunk's relationship to its source document. This mitigates the fragmentation problem where individual chunks lose meaning without their surrounding context.
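The indexing-time step boils down to building one prompt per chunk and prepending the LLM's answer to the chunk text before embedding. The prompt wording below paraphrases the published idea and is not claimed to be Anthropic's exact prompt:

```python
def contextual_prompt(document, chunk):
    """Build the per-chunk prompt sent to an LLM at indexing time; the
    model's short answer is prepended to the chunk before embedding."""
    return (
        "<document>\n" + document + "\n</document>\n"
        "Here is the chunk we want to situate within the whole document:\n"
        "<chunk>\n" + chunk + "\n</chunk>\n"
        "Give a short, succinct context to situate this chunk within the "
        "overall document, to improve search retrieval of the chunk. "
        "Answer only with the succinct context."
    )
```

Because the full document is repeated in every per-chunk prompt, this approach is typically paired with prompt caching to keep indexing costs manageable.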