Chunking Strategies

Chunking in RAG systems divides documents into smaller, retrievable units to optimize embedding, storage, and retrieval. Effective strategies balance context preservation with retrieval precision, with common chunk sizes ranging from 256 to 2,048 tokens depending on the embedding model and use case. 1)

Fixed-Size Chunking

Fixed-size chunking splits text by predefined limits, either token-based (e.g., 512 or 1,024 tokens, aligning with model context windows) or character-based (raw character counts). 2)
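A minimal sketch of token-style fixed-size splitting, approximating tokens with whitespace-separated words (a real pipeline would use the embedding model's own tokenizer; the function name and defaults are illustrative):

```python
def fixed_size_chunks(text, chunk_size=512):
    """Split text into chunks of at most chunk_size "tokens".
    Tokens are approximated by whitespace-separated words here; swap in
    the embedding model's tokenizer for accurate counts."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

# 1,200 words at chunk_size=512 yields chunks of 512, 512, and 176 words
chunks = fixed_size_chunks("token " * 1200, chunk_size=512)
```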

Overlapping Chunks

Adds 10-20% overlap between adjacent chunks (e.g., the last tokens of one chunk repeat at the start of the next) to preserve context that spans boundaries. 3)
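A sliding-window sketch over a pre-tokenized list; the 64-token overlap on a 512-token chunk is about 12%, inside the 10-20% guideline (names and defaults are illustrative):

```python
def overlapping_chunks(tokens, chunk_size=512, overlap=64):
    """Yield chunks where each one repeats the last `overlap` tokens
    of its predecessor, so content spanning a boundary appears whole
    in at least one chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks
```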

Semantic Chunking

Groups sentences by embedding similarity, where chunks form around semantically coherent content. Uses embedding models to compute vector distances, enabling meaning-preserving splits. 4)
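The idea can be sketched with a toy bag-of-words "embedding" standing in for a real embedding model: compare consecutive sentences and start a new chunk wherever similarity drops below a threshold (the embedding, threshold, and breakpoint rule are all illustrative assumptions):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(sentence, vocab):
    # Toy bag-of-words vector; a real system would call an embedding model.
    words = sentence.lower().split()
    return [words.count(w) for w in vocab]

def semantic_chunks(sentences, threshold=0.2):
    if not sentences:
        return []
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vecs = [embed(s, vocab) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(vecs, vecs[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:  # topic shift: close the chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```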

Advanced variants include:

Recursive Character Text Splitting

LangChain's recursive splitter applies hierarchical separators in priority order: 5)

  1. Paragraphs (double newlines)
  2. Lines (single newlines)
  3. Sentences (period + space)
  4. Words (spaces)

The splitter recurses on oversized chunks until the target size is reached, keeping natural text units intact as long as possible before falling back to smaller separators.
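The priority-ordered fallback can be sketched as follows; unlike LangChain's actual RecursiveCharacterTextSplitter, this simplification drops the separators from the output and does not merge small pieces back up toward the target size:

```python
def recursive_split(text, chunk_size=200,
                    separators=("\n\n", "\n", ". ", " ")):
    """Try each separator in priority order, recursing only on pieces
    that still exceed chunk_size (measured in characters here)."""
    if len(text) <= chunk_size or not separators:
        return [text]  # no separator left: return the oversized piece whole
    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:  # separator absent: fall through to the next one
        return recursive_split(text, chunk_size, rest)
    chunks = []
    for piece in pieces:
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, chunk_size, rest))
    return chunks
```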

Document-Aware Chunking

Splits along the document's own structure, such as Markdown headers, HTML tags, or section boundaries, so chunks align with authored units rather than arbitrary lengths. 6)
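For Markdown input, structure-aware splitting can be as simple as cutting at header lines so each heading stays with its body (a sketch; real splitters typically also capture the header text as chunk metadata):

```python
import re

def split_by_headers(markdown_text):
    """Split a Markdown document at each header line, keeping the
    header together with the body that follows it."""
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [p.strip() for p in parts if p.strip()]
```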

Sentence-Based Chunking

Splits at sentence boundaries (e.g., via period detection or NLP sentence segmentation), grouping sentences into chunks up to a target size. Combines well with semantic methods for conversational AI applications. 7)
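A sketch that segments with a simple regex and greedily packs whole sentences up to a word budget; production systems would use a proper NLP segmenter (e.g., spaCy), and the token count here is word-approximated:

```python
import re

def sentence_chunks(text, max_tokens=100):
    """Split at sentence boundaries, then group whole sentences into
    chunks of at most max_tokens words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```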

Agentic Chunking

An AI agent analyzes document type, density, and structure to dynamically select or mix strategies (e.g., header-based for Markdown, proposition-based for dense text), adding metadata tags. This tailors the chunking approach to content for precise, context-aware retrieval. 8)
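One way to picture the dispatch step, with entirely illustrative rules and thresholds (no published agent policy is assumed; a real agent would use an LLM rather than hand-written heuristics):

```python
def choose_strategy(doc):
    """Inspect coarse document traits and return a chunking strategy
    plus metadata tags. The rules below are hypothetical examples."""
    text, doc_type = doc["text"], doc["type"]
    if doc_type == "markdown" and "#" in text:
        return {"strategy": "header_based", "tags": ["markdown", "structured"]}
    words = text.split()
    # Crude lexical-density proxy: ratio of unique words to total words.
    density = len(set(words)) / max(len(words), 1)
    if density > 0.7:
        return {"strategy": "proposition_based", "tags": ["dense"]}
    return {"strategy": "recursive", "tags": ["default"]}
```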

Optimal Chunk Sizes

Use Case                   | Typical Range (Tokens) | Rationale
Fine-grained details (Q&A) | 256-512                | Captures specifics without noise
Broad themes (summaries)   | 1,024-2,048            | Retains context; fits embedding windows
Code and legal documents   | 512-1,024 (+ overlap)  | Structure-aware with metadata
Conversational AI          | 256-1,024 (semantic)   | Preserves dialogue flow

Impact on Retrieval Quality

Late Chunking

Late chunking first runs the entire document through a long-context embedding model, then pools the resulting token embeddings into per-chunk vectors, so each chunk embedding reflects document-wide context. This contrasts with standard pre-chunking, where documents are split into isolated pieces before any embedding occurs. 10)
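The pooling step can be sketched as follows, assuming `token_embeddings` came from a single long-context encoder pass over the whole document (the encoder itself is out of scope here, and the function name is illustrative):

```python
def late_chunk_vectors(token_embeddings, spans):
    """Mean-pool token embeddings over each chunk's (start, end) span.
    Because the tokens were embedded in one full-document pass, each
    pooled chunk vector carries document-wide context."""
    chunk_vecs = []
    for start, end in spans:
        window = token_embeddings[start:end]
        dim = len(window[0])
        chunk_vecs.append([sum(vec[d] for vec in window) / len(window)
                           for d in range(dim)])
    return chunk_vecs
```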

Chunk Metadata and Contextualization

Tags chunks with headings, timestamps, document types, authors, and section titles for enriched retrieval. Metadata enables filtering (e.g., by section or date) and improves relevance when combined with overlap or semantic splitting. 11)
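A minimal picture of metadata-filtered retrieval; the field names (`section`, `doc_type`, `date`) and sample data are illustrative:

```python
chunks = [
    {"text": "Q2 revenue grew 12%.",
     "metadata": {"section": "Financials", "doc_type": "report", "date": "2024-05-01"}},
    {"text": "A new office opened in Lisbon.",
     "metadata": {"section": "Operations", "doc_type": "report", "date": "2023-11-15"}},
]

def filter_chunks(chunks, **criteria):
    """Keep only chunks whose metadata matches every given criterion,
    narrowing the candidate set before (or after) vector search."""
    return [c for c in chunks
            if all(c["metadata"].get(k) == v for k, v in criteria.items())]
```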

Contextual Retrieval Approach

Anthropic's contextual retrieval uses an LLM to generate a brief context summary for each chunk before indexing, preserving the chunk's relationship to its source document. This mitigates the fragmentation problem where individual chunks lose meaning without their surrounding context. 12)
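The indexing step can be sketched with a placeholder `llm` callable (prompt in, summary out); no real API, prompt wording, or model is assumed here:

```python
def contextualize_chunk(document_title, chunk_text, llm=None):
    """Prepend an LLM-written situating sentence to a chunk before
    indexing, so the stored text keeps its link to the source document."""
    prompt = (
        f"Document: {document_title}\n"
        f"Chunk: {chunk_text}\n"
        "Write one sentence situating this chunk within the document."
    )
    # Fall back to a static tag when no LLM callable is supplied.
    context = llm(prompt) if llm else f"[From '{document_title}']"
    return f"{context} {chunk_text}"
```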

Practical Recommendations

See Also

References

1) Weaviate, "Chunking Strategies for RAG": https://weaviate.io/blog/chunking-strategies-for-rag
12) arXiv, "Contextual Retrieval": https://arxiv.org/abs/2504.19754