Chunk size is the most underestimated hyperparameter in Retrieval-Augmented Generation (RAG). It silently determines what your LLM sees, how much retrieval costs, and how accurate the answers are. This guide synthesizes published benchmark data and practical strategies for choosing an optimal chunk size.1)
In NVIDIA's 2024 benchmark across 7 strategies and 5 datasets, choosing the wrong chunking strategy reduced recall by up to 9%.2)3)
| Strategy | How It Works | Benchmark Accuracy | Avg Chunk Size | Speed | Best For |
|---|---|---|---|---|---|
| Fixed-Size | Split by token/character count | 13% (clinical baseline) | Exact target | Fastest | Prototyping only |
| Recursive Character | Split by sections, then paragraphs, then sentences | 69% (Vecta 2026) | 400-512 tokens | Fast | General text (default) |
| Page-Level | Keep full pages intact | 64.8% (NVIDIA 2024) | Page-native | Fast | PDFs, legal, academic |
| Semantic | Group by embedding similarity | 54% (Vecta) / 87% (adaptive) | Variable (43 tokens avg) | Slow | Topic-heavy content |
| Sentence-Level | Split at sentence boundaries | Moderate | ~66 chars avg | Fast | Conversational data |
| Parent-Child | Small chunks for retrieval, large parents for context | +5-10% over flat | Dual (128/512) | Medium | Complex documents |
| Multi-Scale | Index at multiple sizes, fuse results | +7-13% over single | Multiple | Slowest | Maximum accuracy |
Sources: Vecta 2026 (50 papers), NVIDIA 2024 benchmark, MDPI 2025 clinical study, AI21 Labs 20264)
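The recursive strategy in the table above can be sketched without any library: try coarse separators first, pack the resulting parts up to the chunk size, and fall back to plain character slicing when nothing splits. A simplified illustration only, not the LangChain implementation (which also handles overlap and token-based length functions):

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Split text hierarchically: paragraphs, then lines, then sentences, then words."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Recurse on any packed chunk that is still too large
            return [c for chunk in chunks for c in recursive_split(chunk, chunk_size, separators)]
    # No separator matched: fall back to fixed-size character slices
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Because the fallback is fixed-size slicing, this sketch also shows why fixed-size is the baseline in the table: it is what remains when all structural boundaries are ignored.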
| Content Type | Recommended Size | Overlap | Strategy | Notes |
|---|---|---|---|---|
| Technical documentation | 512 tokens | 50-100 tokens (10-20%) | Recursive | Best general default |
| Code | Function-level (variable) | 0 (natural boundaries) | AST-aware / Recursive | Respect function and class boundaries |
| Legal documents | Page-level or 1024 tokens | 100-200 tokens | Page-level / Recursive | Preserve clause structure |
| Academic papers | 512-1024 tokens | 100 tokens | Recursive with section headers | Include metadata |
| Conversational logs | 256-512 tokens | 50 tokens | Semantic | Group by topic shifts |
| Product descriptions | 256 tokens | 25 tokens | Fixed or Recursive | Short, self-contained |
| FAQ/Q&A pairs | Per-question (natural) | 0 | Document-level | Each Q&A is one chunk |
Overlap prevents information loss at chunk boundaries. The industry standard is 10-20% overlap.
| Overlap Percent | Token Example (500 chunk) | Benefit | Cost |
|---|---|---|---|
| 0% | 0 tokens | Minimum storage | Breaks cross-boundary info |
| 10% | 50 tokens | Good balance | +10% storage |
| 20% | 100 tokens | Better boundary coverage | +20% storage |
| 50% | 250 tokens | Sliding window | +50% storage, diminishing returns |
Rule of thumb: Use 10% overlap for clean structured text, 20% for dense unstructured text, 0% for naturally bounded content (Q&A pairs, code functions).
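The storage figures above are easy to derive: with stride = chunk_size − overlap, each unique token is stored roughly chunk_size/stride times, so extra storage ≈ overlap/(chunk_size − overlap). A back-of-envelope helper (the function name is ours, not from any library):

```python
import math

def overlap_cost(doc_tokens, chunk_size, overlap):
    """Estimate chunk count and asymptotic storage overhead for a sliding window."""
    stride = chunk_size - overlap
    n_chunks = math.ceil(max(doc_tokens - overlap, 1) / stride)
    overhead = overlap / stride  # extra storage per unique token
    return n_chunks, overhead
```

For a 10,000-token document at 500-token chunks with 50-token overlap this gives 23 chunks and about 11% extra storage, in line with the table's "+10%" figure.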
Research from NVIDIA (2024) and AI21 Labs (2026) shows optimal chunk size depends on query type:
| Query Type | Optimal Size | Rationale |
|---|---|---|
| Factoid (“What is X?”) | 256-512 tokens | Precise, focused retrieval |
| Analytical (“Compare X and Y”) | 1024+ tokens | Needs broader context |
| Summarization (“Summarize this doc”) | Page-level / 2048 tokens | Needs comprehensive view |
| Code search (“How to implement X”) | Function-level | Natural semantic boundary |
AI21 Labs demonstrated that multi-scale indexing (100, 200, 500 tokens with Reciprocal Rank Fusion) outperforms any single chunk size because different queries need different granularity.5)
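The fusion step itself is simple: each document scores 1/(k + rank) in every result list it appears in, summed across lists, with k = 60 the conventional constant from the original RRF formulation. A minimal sketch of the fusion step only; the doc IDs are hypothetical, and a full multi-scale pipeline would also maintain the separate per-scale indexes:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Results from three hypothetical chunk-scale indexes
fused = reciprocal_rank_fusion([
    ["a", "b", "c"],   # small-chunk index
    ["b", "a", "d"],   # medium-chunk index
    ["b", "c", "a"],   # large-chunk index
])
```

Note that "b" wins despite never having the best score in any one list: RRF rewards consistent mid-to-high ranks across scales, which is exactly why it suits multi-granularity retrieval.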
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Default: recursive splitting at 512 tokens with 10% overlap
def create_splitter(content_type="general"):
    configs = {
        "general": {"chunk_size": 512, "chunk_overlap": 50},
        "code": {"chunk_size": 1000, "chunk_overlap": 0,
                 "separators": ["\nclass ", "\ndef ", "\n\n", "\n"]},
        "legal": {"chunk_size": 1024, "chunk_overlap": 200},
        "conversational": {"chunk_size": 256, "chunk_overlap": 50},
        "faq": {"chunk_size": 300, "chunk_overlap": 0, "separators": ["\n\n"]},
    }
    config = configs.get(content_type, configs["general"])
    return RecursiveCharacterTextSplitter(
        chunk_size=config["chunk_size"],
        chunk_overlap=config["chunk_overlap"],
        separators=config.get("separators", ["\n\n", "\n", ". ", " "]),
        length_function=len,
    )

# Multi-scale indexing for maximum accuracy
def multi_scale_index(documents, sizes=[100, 256, 512]):
    from collections import defaultdict
    all_chunks = defaultdict(list)
    for size in sizes:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=size, chunk_overlap=int(size * 0.1)
        )
        for doc in documents:
            all_chunks[size].extend(splitter.split_text(doc))
    return all_chunks  # index each scale separately, fuse at query time

# Parent-child chunking: small chunks for retrieval, large parents for context
def parent_child_chunks(documents, parent_size=1024, child_size=256):
    parent_splitter = RecursiveCharacterTextSplitter(chunk_size=parent_size)
    child_splitter = RecursiveCharacterTextSplitter(chunk_size=child_size)
    index = []
    for doc in documents:
        for parent in parent_splitter.split_text(doc):
            for child in child_splitter.split_text(parent):
                index.append({"child": child, "parent": parent})
    return index  # search on child, retrieve parent
```
Always benchmark chunk sizes on your own data. Key metrics: retrieval recall (does a chunk containing the answer appear in the top-k?) and end-to-end answer accuracy.
```python
# Simple chunk-size evaluation: recall@5 against known answer spans
def evaluate_chunk_sizes(documents, test_queries, ground_truth,
                         sizes=[256, 512, 1024]):
    results = {}
    for size in sizes:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=size, chunk_overlap=int(size * 0.1)
        )
        chunks = splitter.split_documents(documents)
        vectorstore = build_vectorstore(chunks)  # placeholder: your embedding + vector DB setup
        correct = 0
        for query, expected in zip(test_queries, ground_truth):
            retrieved = vectorstore.similarity_search(query, k=5)
            if any(expected in r.page_content for r in retrieved):
                correct += 1
        results[size] = correct / len(test_queries)
    return results  # e.g. {256: 0.72, 512: 0.81, 1024: 0.76}
```