Chunking Strategies

Chunking in RAG systems divides documents into smaller, retrievable units to optimize embedding, storage, and retrieval. Effective strategies balance context preservation with retrieval precision, with common chunk sizes ranging from 256 to 2,048 tokens depending on the embedding model and use case. 1)

Fixed-Size Chunking

Fixed-size chunking splits text by predefined limits, either token-based (e.g., 512 or 1,024 tokens, aligning with model context windows) or character-based (raw character counts). 2)
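A minimal sketch of token-style fixed-size splitting, approximating tokens with whitespace-separated words (a real pipeline would use the embedding model's own tokenizer; the function name and defaults are illustrative):

```python
def fixed_size_chunks(text, chunk_size=512):
    """Split text into chunks of at most chunk_size "tokens".
    Tokens are approximated by whitespace-separated words here; swap in
    the embedding model's tokenizer for accurate counts."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

# 1,200 words at chunk_size=512 yields chunks of 512, 512, and 176 words
chunks = fixed_size_chunks("token " * 1200, chunk_size=512)
```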

Overlapping Chunks

Adds 10-20% overlap between adjacent chunks (e.g., the last tokens of one chunk repeat at the start of the next) to preserve context that spans boundaries. 3)
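A sliding-window sketch over a pre-tokenized list; the 64-token overlap on a 512-token chunk is about 12%, inside the 10-20% guideline (names and defaults are illustrative):

```python
def overlapping_chunks(tokens, chunk_size=512, overlap=64):
    """Yield chunks where each one repeats the last `overlap` tokens
    of its predecessor, so content spanning a boundary appears whole
    in at least one chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks
```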

Semantic Chunking

Groups sentences by embedding similarity, where chunks form around semantically coherent content. Uses embedding models to compute vector distances, enabling meaning-preserving splits. 4)
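The idea can be sketched with a toy bag-of-words "embedding" standing in for a real embedding model: compare consecutive sentences and start a new chunk wherever similarity drops below a threshold (the embedding, threshold, and breakpoint rule are all illustrative assumptions):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(sentence, vocab):
    # Toy bag-of-words vector; a real system would call an embedding model.
    words = sentence.lower().split()
    return [words.count(w) for w in vocab]

def semantic_chunks(sentences, threshold=0.2):
    if not sentences:
        return []
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vecs = [embed(s, vocab) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(vecs, vecs[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:  # topic shift: close the chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```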

Advanced variants include:

Recursive Character Text Splitting

LangChain's recursive splitter applies hierarchical separators in priority order: 5)

  1. Paragraphs (double newlines)
  2. Lines (single newlines)
  3. Sentences (period + space)
  4. Words (spaces)

The splitter recurses on oversized chunks until the target size is reached, keeping natural text units intact as long as possible before falling back to smaller separators.
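The priority-ordered fallback can be sketched as follows; unlike LangChain's actual RecursiveCharacterTextSplitter, this simplification drops the separators from the output and does not merge small pieces back up toward the target size:

```python
def recursive_split(text, chunk_size=200,
                    separators=("\n\n", "\n", ". ", " ")):
    """Try each separator in priority order, recursing only on pieces
    that still exceed chunk_size (measured in characters here)."""
    if len(text) <= chunk_size or not separators:
        return [text]  # no separator left: return the oversized piece whole
    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:  # separator absent: fall through to the next one
        return recursive_split(text, chunk_size, rest)
    chunks = []
    for piece in pieces:
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, chunk_size, rest))
    return chunks
```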

Document-Aware Chunking

Splits along the document's own structure, such as Markdown headers, HTML tags, or section boundaries, so chunks align with authored units rather than arbitrary lengths. 6)
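For Markdown input, structure-aware splitting can be as simple as cutting at header lines so each heading stays with its body (a sketch; real splitters typically also capture the header text as chunk metadata):

```python
import re

def split_by_headers(markdown_text):
    """Split a Markdown document at each header line, keeping the
    header together with the body that follows it."""
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [p.strip() for p in parts if p.strip()]
```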

Sentence-Based Chunking

Splits at sentence boundaries (e.g., via period detection or NLP sentence segmentation), grouping sentences into chunks up to a target size. Combines well with semantic methods for conversational AI applications. 7)
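A sketch that segments with a simple regex and greedily packs whole sentences up to a word budget; production systems would use a proper NLP segmenter (e.g., spaCy), and the token count here is word-approximated:

```python
import re

def sentence_chunks(text, max_tokens=100):
    """Split at sentence boundaries, then group whole sentences into
    chunks of at most max_tokens words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```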

Agentic Chunking

An AI agent analyzes document type, density, and structure to dynamically select or mix strategies (e.g., header-based for Markdown, proposition-based for dense text), adding metadata tags. This tailors the chunking approach to content for precise, context-aware retrieval. 8)
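One way to picture the dispatch step, with entirely illustrative rules and thresholds (no published agent policy is assumed; a real agent would use an LLM rather than hand-written heuristics):

```python
def choose_strategy(doc):
    """Inspect coarse document traits and return a chunking strategy
    plus metadata tags. The rules below are hypothetical examples."""
    text, doc_type = doc["text"], doc["type"]
    if doc_type == "markdown" and "#" in text:
        return {"strategy": "header_based", "tags": ["markdown", "structured"]}
    words = text.split()
    # Crude lexical-density proxy: ratio of unique words to total words.
    density = len(set(words)) / max(len(words), 1)
    if density > 0.7:
        return {"strategy": "proposition_based", "tags": ["dense"]}
    return {"strategy": "recursive", "tags": ["default"]}
```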

Optimal Chunk Sizes

Use Case                   | Typical Range (Tokens) | Rationale
Fine-grained details (Q&A) | 256-512                | Captures specifics without noise
Broad themes (summaries)   | 1,024-2,048            | Retains context; fits embedding windows
Code and legal documents   | 512-1,024 (+ overlap)  | Structure-aware with metadata
Conversational AI          | 256-1,024 (semantic)   | Preserves dialogue flow

Impact on Retrieval Quality

Late Chunking

Late chunking first runs the entire document through a long-context embedding model, then pools the resulting token embeddings into per-chunk vectors, so each chunk embedding reflects document-wide context. This contrasts with standard pre-chunking, where documents are split into isolated pieces before any embedding occurs. 10)
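The pooling step can be sketched as follows, assuming `token_embeddings` came from a single long-context encoder pass over the whole document (the encoder itself is out of scope here, and the function name is illustrative):

```python
def late_chunk_vectors(token_embeddings, spans):
    """Mean-pool token embeddings over each chunk's (start, end) span.
    Because the tokens were embedded in one full-document pass, each
    pooled chunk vector carries document-wide context."""
    chunk_vecs = []
    for start, end in spans:
        window = token_embeddings[start:end]
        dim = len(window[0])
        chunk_vecs.append([sum(vec[d] for vec in window) / len(window)
                           for d in range(dim)])
    return chunk_vecs
```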

Chunk Metadata and Contextualization

Tags chunks with headings, timestamps, document types, authors, and section titles for enriched retrieval. Metadata enables filtering (e.g., by section or date) and improves relevance when combined with overlap or semantic splitting. 11)
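A minimal picture of metadata-filtered retrieval; the field names (`section`, `doc_type`, `date`) and sample data are illustrative:

```python
chunks = [
    {"text": "Q2 revenue grew 12%.",
     "metadata": {"section": "Financials", "doc_type": "report", "date": "2024-05-01"}},
    {"text": "A new office opened in Lisbon.",
     "metadata": {"section": "Operations", "doc_type": "report", "date": "2023-11-15"}},
]

def filter_chunks(chunks, **criteria):
    """Keep only chunks whose metadata matches every given criterion,
    narrowing the candidate set before (or after) vector search."""
    return [c for c in chunks
            if all(c["metadata"].get(k) == v for k, v in criteria.items())]
```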

Contextual Retrieval Approach

Anthropic's contextual retrieval uses an LLM to generate a brief context summary for each chunk before indexing, preserving the chunk's relationship to its source document. This mitigates the fragmentation problem where individual chunks lose meaning without their surrounding context. 12)
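The indexing step can be sketched with a placeholder `llm` callable (prompt in, summary out); no real API, prompt wording, or model is assumed here:

```python
def contextualize_chunk(document_title, chunk_text, llm=None):
    """Prepend an LLM-written situating sentence to a chunk before
    indexing, so the stored text keeps its link to the source document."""
    prompt = (
        f"Document: {document_title}\n"
        f"Chunk: {chunk_text}\n"
        "Write one sentence situating this chunk within the document."
    )
    # Fall back to a static tag when no LLM callable is supplied.
    context = llm(prompt) if llm else f"[From '{document_title}']"
    return f"{context} {chunk_text}"
```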

Practical Recommendations

See Also

References

1) Weaviate, "Chunking Strategies for RAG": https://weaviate.io/blog/chunking-strategies-for-rag
12) arXiv, "Contextual Retrieval": https://arxiv.org/abs/2504.19754