How to Choose Chunk Size for RAG

Chunk size is the most underestimated hyperparameter in Retrieval-Augmented Generation. It silently determines what your LLM sees, how much retrieval costs, and how accurate answers are. This guide synthesizes published benchmark data and practical strategies for choosing optimal chunk sizes.1)

Why Chunk Size Matters

In NVIDIA's 2024 benchmark across 7 strategies and 5 datasets, choosing the wrong chunking strategy reduced recall by up to 9%.2)3)

Decision Tree

graph TD
    A[Start: What content type?] --> B{Structured with\nclear sections?}
    B -->|Yes| C{Pages or\nchapters?}
    B -->|No| D{Content type?}
    C -->|Pages| E[Page-Level Chunking]
    C -->|Sections| F[Recursive Splitting\n512 tokens]
    D -->|Code| G[AST or Function-Level\n+ Recursive fallback]
    D -->|Legal/Academic| E
    D -->|Technical Docs| F
    D -->|Conversational| H[Semantic Chunking]
    D -->|Mixed/Unknown| F
    F --> I{Need higher\naccuracy?}
    I -->|Yes| J[Multi-Scale Indexing\n100+200+500 tokens]
    I -->|No| K[Single scale\n512 tokens + 10-20pct overlap]
    style E fill:#4CAF50,color:#fff
    style F fill:#2196F3,color:#fff
    style G fill:#FF9800,color:#fff
    style H fill:#9C27B0,color:#fff
    style J fill:#E91E63,color:#fff
    style K fill:#2196F3,color:#fff

Chunking Strategies Compared

| Strategy | How It Works | Benchmark Accuracy | Avg Chunk Size | Speed | Best For |
|---|---|---|---|---|---|
| Fixed-Size | Split by token/character count | 13% (clinical baseline) | Exact target | Fastest | Prototyping only |
| Recursive Character | Split by sections, then paragraphs, then sentences | 69% (Vecta 2026) | 400-512 tokens | Fast | General text (default) |
| Page-Level | Keep full pages intact | 64.8% (NVIDIA 2024) | Page-native | Fast | PDFs, legal, academic |
| Semantic | Group by embedding similarity | 54% (Vecta) / 87% (adaptive) | Variable (43 tokens avg) | Slow | Topic-heavy content |
| Sentence-Level | Split at sentence boundaries | Moderate | ~66 chars avg | Fast | Conversational data |
| Parent-Child | Small chunks for retrieval, large parents for context | +5-10% over flat | Dual (128/512) | Medium | Complex documents |
| Multi-Scale | Index at multiple sizes, fuse results | +7-13% over single | Multiple | Slowest | Maximum accuracy |

Sources: Vecta 2026 (50 papers), NVIDIA 2024 benchmark, MDPI 2025 clinical study, AI21 Labs 20264)
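Semantic chunking from the table above can be sketched in a few lines. This is a minimal illustration, not a library API: `embed` is a placeholder for any sentence-embedding function you supply, and the one-pass neighbor-similarity rule is the simplest variant of the technique.

```python
import math
import re

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, embed, threshold=0.8):
    """Group consecutive sentences; start a new chunk whenever the
    similarity between neighboring sentences drops below `threshold`."""
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sents:
        return []
    chunks, current = [], [sents[0]]
    for prev, sent in zip(sents, sents[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Production variants typically compare each sentence to a rolling window of preceding sentences rather than only its immediate neighbor, which is less sensitive to a single off-topic sentence.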

Optimal Sizes by Content Type

| Content Type | Recommended Size | Overlap | Strategy | Notes |
|---|---|---|---|---|
| Technical documentation | 512 tokens | 50-100 tokens (10-20%) | Recursive | Best general default |
| Code | Function-level (variable) | 0 (natural boundaries) | AST-aware / Recursive | Respect function and class boundaries |
| Legal documents | Page-level or 1024 tokens | 100-200 tokens | Page-level / Recursive | Preserve clause structure |
| Academic papers | 512-1024 tokens | 100 tokens | Recursive with section headers | Include metadata |
| Conversational logs | 256-512 tokens | 50 tokens | Semantic | Group by topic shifts |
| Product descriptions | 256 tokens | 25 tokens | Fixed or Recursive | Short, self-contained |
| FAQ/Q&A pairs | Per-question (natural) | 0 | Document-level | Each Q&A is one chunk |

Overlap Strategies

Overlap prevents information loss at chunk boundaries. The industry standard is 10-20% overlap.

| Overlap | Example (500-token chunk) | Benefit | Cost |
|---|---|---|---|
| 0% | 0 tokens | Minimum storage | Breaks cross-boundary info |
| 10% | 50 tokens | Good balance | +10% storage |
| 20% | 100 tokens | Better boundary coverage | +20% storage |
| 50% | 250 tokens | Sliding window | +50% storage, diminishing returns |

Rule of thumb: Use 10% overlap for clean structured text, 20% for dense unstructured text, 0% for naturally bounded content (Q&A pairs, code functions).
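The mechanics behind these numbers are simple: each chunk steps forward by chunk size minus overlap, so consecutive chunks share their boundary tokens. A minimal sketch over a pre-tokenized list (any tokenizer's output works; a plain Python list stands in here):

```python
def chunk_with_overlap(tokens, chunk_size=500, overlap=50):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap,
    so each chunk repeats the last `overlap` tokens of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
```

With 1,000 tokens, a 500-token window, and 10% overlap, this yields three chunks: [0, 500), [450, 950), and [900, 1000), with 50 shared tokens at each boundary.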

Chunk Size vs Query Type

Research from NVIDIA (2024) and AI21 Labs (2026) shows that the optimal chunk size depends on the query type:

| Query Type | Optimal Size | Rationale |
|---|---|---|
| Factoid (“What is X?”) | 256-512 tokens | Precise, focused retrieval |
| Analytical (“Compare X and Y”) | 1024+ tokens | Needs broader context |
| Summarization (“Summarize this doc”) | Page-level / 2048 tokens | Needs comprehensive view |
| Code search (“How to implement X”) | Function-level | Natural semantic boundary |

AI21 Labs demonstrated that multi-scale indexing (100, 200, 500 tokens with Reciprocal Rank Fusion) outperforms any single chunk size because different queries need different granularity.5)
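Reciprocal Rank Fusion itself is only a few lines. The sketch below assumes each scale's index returns an ordered list of document IDs; `k=60` is the conventional smoothing constant from the original RRF paper.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists: each document scores
    sum(1 / (k + rank)) over the lists it appears in (rank starts at 1)."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works only on ranks, not raw similarity scores, it fuses results from differently sized chunk indexes without any score normalization.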

Implementation Example

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Default: recursive splitting at 512 units with ~10% overlap.
# Note: with length_function=len, chunk_size counts characters, not tokens.
# For true token-based sizing, use
# RecursiveCharacterTextSplitter.from_tiktoken_encoder(...) instead.
def create_splitter(content_type="general"):
    configs = {
        "general": {"chunk_size": 512, "chunk_overlap": 50},
        "code": {"chunk_size": 1000, "chunk_overlap": 0,
                 "separators": ["\nclass ", "\ndef ", "\n\n", "\n", " ", ""]},
        "legal": {"chunk_size": 1024, "chunk_overlap": 200},
        "conversational": {"chunk_size": 256, "chunk_overlap": 50},
        "faq": {"chunk_size": 300, "chunk_overlap": 0,
                "separators": ["\n\n", "\n", " ", ""]},
    }
    config = configs.get(content_type, configs["general"])
    return RecursiveCharacterTextSplitter(
        chunk_size=config["chunk_size"],
        chunk_overlap=config["chunk_overlap"],
        # The final "" separator lets the splitter fall back to character
        # splitting when no earlier separator fits within chunk_size.
        separators=config.get("separators", ["\n\n", "\n", ". ", " ", ""]),
        length_function=len,
    )
 
# Multi-scale indexing for maximum accuracy (AI21-style: 100/200/500 tokens)
def multi_scale_index(documents, sizes=(100, 200, 500)):
    from collections import defaultdict
    all_chunks = defaultdict(list)
    for size in sizes:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=size, chunk_overlap=int(size * 0.1)
        )
        for doc in documents:
            all_chunks[size].extend(splitter.split_text(doc))
    return all_chunks  # Index each scale separately, fuse at query time (e.g. RRF)
 
# Parent-child chunking
def parent_child_chunks(documents, parent_size=1024, child_size=256):
    # Set chunk_overlap explicitly: the class default (200) would be
    # nearly 80% of a 256-character child chunk.
    parent_splitter = RecursiveCharacterTextSplitter(
        chunk_size=parent_size, chunk_overlap=int(parent_size * 0.1)
    )
    child_splitter = RecursiveCharacterTextSplitter(
        chunk_size=child_size, chunk_overlap=0
    )
    index = []
    for doc in documents:
        for parent in parent_splitter.split_text(doc):
            for child in child_splitter.split_text(parent):
                index.append({"child": child, "parent": parent})
    return index  # Search on child, retrieve parent
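At query time, the child entries are scored and the winning child's parent is returned. A toy sketch of that lookup; the default lexical-overlap `score_fn` is a deliberately crude stand-in for embedding similarity, shown only to keep the example self-contained:

```python
def retrieve_parent(index, query, score_fn=None):
    """Score each child chunk against the query and return the parent
    of the best-matching child ("search small, retrieve big")."""
    if score_fn is None:
        # Toy stand-in for embedding similarity: count shared lowercase words.
        def score_fn(q, c):
            return len(set(q.lower().split()) & set(c.lower().split()))
    best = max(index, key=lambda entry: score_fn(query, entry["child"]))
    return best["parent"]
```

In practice you would embed the children, store them in a vector index, and deduplicate parents when several children of the same parent rank highly.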

Evaluation Methodology

Always benchmark chunk sizes on your own data. Key metrics:

# Simple chunk-size evaluation: retrieval hit rate per candidate size
def evaluate_chunk_sizes(documents, test_queries, ground_truth,
                         sizes=(256, 512, 1024)):
    results = {}
    for size in sizes:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=size, chunk_overlap=int(size * 0.1)
        )
        chunks = splitter.split_documents(documents)
        # Placeholder: supply your own embedding + vector store setup here.
        vectorstore = build_vectorstore(chunks)

        correct = 0
        for query, expected in zip(test_queries, ground_truth):
            retrieved = vectorstore.similarity_search(query, k=5)
            # Hit if any top-5 chunk contains the expected answer string
            if any(expected in r.page_content for r in retrieved):
                correct += 1
        results[size] = correct / len(test_queries)
    return results  # e.g. {256: 0.72, 512: 0.81, 1024: 0.76}

Key Takeaways

  1. Start with 512 tokens, recursive splitting, 10% overlap. This is the best default backed by benchmarks.
  2. Content type matters more than a magic number. Code, legal, and conversational content each need different strategies.
  3. Multi-scale indexing gives the best accuracy (+7-13%) but at higher storage and complexity cost.
  4. Parent-child chunking is the best balance of precision and context for complex documents.
  5. Always benchmark on your data. Published numbers are starting points, not guarantees.
  6. Query type affects optimal size: factoid queries want small chunks, analytical queries want large ones.

See Also

References