How to Choose Chunk Size for RAG

Chunk size is the most underestimated hyperparameter in Retrieval-Augmented Generation. It silently determines what your LLM sees, how much retrieval costs, and how accurate answers are. This guide synthesizes published benchmark data and practical strategies for choosing optimal chunk sizes.1)

Why Chunk Size Matters

In NVIDIA's 2024 benchmark across 7 strategies and 5 datasets, choosing the wrong chunking strategy reduced recall by up to 9%.2)3)

Decision Tree

graph TD
    A[Start: What content type?] --> B{Structured with\nclear sections?}
    B -->|Yes| C{Pages or\nchapters?}
    B -->|No| D{Content type?}
    C -->|Pages| E[Page-Level Chunking]
    C -->|Sections| F[Recursive Splitting\n512 tokens]
    D -->|Code| G[AST or Function-Level\n+ Recursive fallback]
    D -->|Legal/Academic| E
    D -->|Technical Docs| F
    D -->|Conversational| H[Semantic Chunking]
    D -->|Mixed/Unknown| F
    F --> I{Need higher\naccuracy?}
    I -->|Yes| J[Multi-Scale Indexing\n100+200+500 tokens]
    I -->|No| K[Single scale\n512 tokens + 10-20pct overlap]
    style E fill:#4CAF50,color:#fff
    style F fill:#2196F3,color:#fff
    style G fill:#FF9800,color:#fff
    style H fill:#9C27B0,color:#fff
    style J fill:#E91E63,color:#fff
    style K fill:#2196F3,color:#fff

Chunking Strategies Compared

| Strategy | How It Works | Benchmark Accuracy | Avg Chunk Size | Speed | Best For |
|---|---|---|---|---|---|
| Fixed-Size | Split by token/character count | 13% (clinical baseline) | Exact target | Fastest | Prototyping only |
| Recursive Character | Split by sections, then paragraphs, then sentences | 69% (Vecta 2026) | 400-512 tokens | Fast | General text (default) |
| Page-Level | Keep full pages intact | 64.8% (NVIDIA 2024) | Page-native | Fast | PDFs, legal, academic |
| Semantic | Group by embedding similarity | 54% (Vecta) / 87% (adaptive) | Variable (43 tokens avg) | Slow | Topic-heavy content |
| Sentence-Level | Split at sentence boundaries | Moderate | ~66 chars avg | Fast | Conversational data |
| Parent-Child | Small chunks for retrieval, large parents for context | +5-10% over flat | Dual (128/512) | Medium | Complex documents |
| Multi-Scale | Index at multiple sizes, fuse results | +7-13% over single | Multiple | Slowest | Maximum accuracy |

Sources: Vecta 2026 (50 papers), NVIDIA 2024 benchmark, MDPI 2025 clinical study, AI21 Labs 20264)
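Semantic chunking from the table above can be sketched in a few lines. This is a minimal illustration, not a library API: `embed` is a placeholder for any sentence-embedding function you supply, and the one-pass neighbor-similarity rule is the simplest variant of the technique.

```python
import math
import re

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, embed, threshold=0.8):
    """Group consecutive sentences; start a new chunk whenever the
    similarity between neighboring sentences drops below `threshold`."""
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sents:
        return []
    chunks, current = [], [sents[0]]
    for prev, sent in zip(sents, sents[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Production variants typically compare each sentence to a rolling window of preceding sentences rather than only its immediate neighbor, which is less sensitive to a single off-topic sentence.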

Optimal Sizes by Content Type

| Content Type | Recommended Size | Overlap | Strategy | Notes |
|---|---|---|---|---|
| Technical documentation | 512 tokens | 50-100 tokens (10-20%) | Recursive | Best general default |
| Code | Function-level (variable) | 0 (natural boundaries) | AST-aware / Recursive | Respect function and class boundaries |
| Legal documents | Page-level or 1024 tokens | 100-200 tokens | Page-level / Recursive | Preserve clause structure |
| Academic papers | 512-1024 tokens | 100 tokens | Recursive with section headers | Include metadata |
| Conversational logs | 256-512 tokens | 50 tokens | Semantic | Group by topic shifts |
| Product descriptions | 256 tokens | 25 tokens | Fixed or Recursive | Short, self-contained |
| FAQ/Q&A pairs | Per-question (natural) | 0 | Document-level | Each Q&A is one chunk |

Overlap Strategies

Overlap prevents information loss at chunk boundaries. The industry standard is 10-20% overlap.

| Overlap | Example (500-token chunk) | Benefit | Cost |
|---|---|---|---|
| 0% | 0 tokens | Minimum storage | Breaks cross-boundary info |
| 10% | 50 tokens | Good balance | +10% storage |
| 20% | 100 tokens | Better boundary coverage | +20% storage |
| 50% | 250 tokens | Sliding window | +50% storage, diminishing returns |

Rule of thumb: Use 10% overlap for clean structured text, 20% for dense unstructured text, 0% for naturally bounded content (Q&A pairs, code functions).
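The mechanics behind these numbers are simple: each chunk steps forward by chunk size minus overlap, so consecutive chunks share their boundary tokens. A minimal sketch over a pre-tokenized list (any tokenizer's output works; a plain Python list stands in here):

```python
def chunk_with_overlap(tokens, chunk_size=500, overlap=50):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap,
    so each chunk repeats the last `overlap` tokens of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
```

With 1,000 tokens, a 500-token window, and 10% overlap, this yields three chunks: [0, 500), [450, 950), and [900, 1000), with 50 shared tokens at each boundary.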

Chunk Size vs Query Type

Research from NVIDIA (2024) and AI21 Labs (2026) shows that the optimal chunk size depends on the query type:

| Query Type | Optimal Size | Rationale |
|---|---|---|
| Factoid (“What is X?”) | 256-512 tokens | Precise, focused retrieval |
| Analytical (“Compare X and Y”) | 1024+ tokens | Needs broader context |
| Summarization (“Summarize this doc”) | Page-level / 2048 tokens | Needs comprehensive view |
| Code search (“How to implement X”) | Function-level | Natural semantic boundary |

AI21 Labs demonstrated that multi-scale indexing (100, 200, 500 tokens with Reciprocal Rank Fusion) outperforms any single chunk size because different queries need different granularity.5)
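Reciprocal Rank Fusion itself is only a few lines. The sketch below assumes each scale's index returns an ordered list of document IDs; `k=60` is the conventional smoothing constant from the original RRF paper.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists: each document scores
    sum(1 / (k + rank)) over the lists it appears in (rank starts at 1)."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works only on ranks, not raw similarity scores, it fuses results from differently sized chunk indexes without any score normalization.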

Implementation Example

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Default: recursive splitting at 512 units with ~10% overlap.
# Note: with length_function=len, chunk_size counts characters, not tokens.
# For true token-based sizing, use
# RecursiveCharacterTextSplitter.from_tiktoken_encoder(...) instead.
def create_splitter(content_type="general"):
    configs = {
        "general": {"chunk_size": 512, "chunk_overlap": 50},
        "code": {"chunk_size": 1000, "chunk_overlap": 0,
                 "separators": ["\nclass ", "\ndef ", "\n\n", "\n", " ", ""]},
        "legal": {"chunk_size": 1024, "chunk_overlap": 200},
        "conversational": {"chunk_size": 256, "chunk_overlap": 50},
        "faq": {"chunk_size": 300, "chunk_overlap": 0,
                "separators": ["\n\n", "\n", " ", ""]},
    }
    config = configs.get(content_type, configs["general"])
    return RecursiveCharacterTextSplitter(
        chunk_size=config["chunk_size"],
        chunk_overlap=config["chunk_overlap"],
        # The final "" separator lets the splitter fall back to character
        # splitting when no earlier separator fits within chunk_size.
        separators=config.get("separators", ["\n\n", "\n", ". ", " ", ""]),
        length_function=len,
    )
 
# Multi-scale indexing for maximum accuracy (AI21-style: 100/200/500 tokens)
def multi_scale_index(documents, sizes=(100, 200, 500)):
    from collections import defaultdict
    all_chunks = defaultdict(list)
    for size in sizes:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=size, chunk_overlap=int(size * 0.1)
        )
        for doc in documents:
            all_chunks[size].extend(splitter.split_text(doc))
    return all_chunks  # Index each scale separately, fuse at query time (e.g. RRF)
 
# Parent-child chunking
def parent_child_chunks(documents, parent_size=1024, child_size=256):
    # Set chunk_overlap explicitly: the class default (200) would be
    # nearly 80% of a 256-character child chunk.
    parent_splitter = RecursiveCharacterTextSplitter(
        chunk_size=parent_size, chunk_overlap=int(parent_size * 0.1)
    )
    child_splitter = RecursiveCharacterTextSplitter(
        chunk_size=child_size, chunk_overlap=0
    )
    index = []
    for doc in documents:
        for parent in parent_splitter.split_text(doc):
            for child in child_splitter.split_text(parent):
                index.append({"child": child, "parent": parent})
    return index  # Search on child, retrieve parent
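At query time, the child entries are scored and the winning child's parent is returned. A toy sketch of that lookup; the default lexical-overlap `score_fn` is a deliberately crude stand-in for embedding similarity, shown only to keep the example self-contained:

```python
def retrieve_parent(index, query, score_fn=None):
    """Score each child chunk against the query and return the parent
    of the best-matching child ("search small, retrieve big")."""
    if score_fn is None:
        # Toy stand-in for embedding similarity: count shared lowercase words.
        def score_fn(q, c):
            return len(set(q.lower().split()) & set(c.lower().split()))
    best = max(index, key=lambda entry: score_fn(query, entry["child"]))
    return best["parent"]
```

In practice you would embed the children, store them in a vector index, and deduplicate parents when several children of the same parent rank highly.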

Evaluation Methodology

Always benchmark chunk sizes on your own data. Key metrics:

# Simple chunk-size evaluation: retrieval hit rate per candidate size
def evaluate_chunk_sizes(documents, test_queries, ground_truth,
                         sizes=(256, 512, 1024)):
    results = {}
    for size in sizes:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=size, chunk_overlap=int(size * 0.1)
        )
        chunks = splitter.split_documents(documents)
        # Placeholder: supply your own embedding + vector store setup here.
        vectorstore = build_vectorstore(chunks)

        correct = 0
        for query, expected in zip(test_queries, ground_truth):
            retrieved = vectorstore.similarity_search(query, k=5)
            # Hit if any top-5 chunk contains the expected answer string
            if any(expected in r.page_content for r in retrieved):
                correct += 1
        results[size] = correct / len(test_queries)
    return results  # e.g. {256: 0.72, 512: 0.81, 1024: 0.76}

Key Takeaways

  1. Start with 512 tokens, recursive splitting, 10% overlap. This is the best default backed by benchmarks.
  2. Content type matters more than a magic number. Code, legal, and conversational content each need different strategies.
  3. Multi-scale indexing gives the best accuracy (+7-13%) but at higher storage and complexity cost.
  4. Parent-child chunking is the best balance of precision and context for complex documents.
  5. Always benchmark on your data. Published numbers are starting points, not guarantees.
  6. Query type affects optimal size: factoid queries want small chunks, analytical queries want large ones.

See Also

References