====== What Is an LLM Context Window ======

A **context window** is the maximum number of tokens a large language model (LLM) can process in a single request, encompassing all input and output combined. It functions as the model's working memory: everything the model can "see" and reason about at once. ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))

===== How It Works =====

Context windows are a direct consequence of the **transformer architecture** that underlies modern LLMs. The self-attention mechanism computes relationships between every pair of tokens, producing an O(n²) computational cost: doubling the sequence length quadruples the compute and memory required. ((Source: [[https://www.datacamp.com/blog/context-window|DataCamp - Context Window]]))

Each token passes through attention layers where queries, keys, and values are compared via dot products. The results are softmax-weighted and aggregated to produce contextual representations. A **KV cache** stores key-value pairs from prior tokens to avoid redundant computation during generation, but this cache grows linearly with sequence length, consuming substantial GPU memory. ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]])) Tokens beyond the window limit are simply truncated; the model has no access to them.

===== Evolution of Context Window Sizes =====

Early transformer models operated with small context windows (512 tokens for BERT, 1,024 for GPT-2), constrained by standard positional encodings and the quadratic scaling of attention.
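The quadratic growth of attention compute and the linear growth of the KV cache can be sketched numerically. This is an illustrative sketch only: the model dimension, layer count, head sizes, and fp16 precision below are assumptions, not any specific model's configuration.

```python
# Illustrative sketch of context-window scaling. All dimensions
# (d_model, layers, KV heads, head_dim, fp16) are assumptions.

def attention_flops(seq_len: int, d_model: int = 4096) -> int:
    """Rough FLOPs for one self-attention layer: the QK^T score matrix
    and the softmax-weighted aggregation each cost O(n^2 * d)."""
    return 2 * seq_len * seq_len * d_model

def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """The KV cache grows linearly: one key and one value vector per
    token, per layer, per KV head (fp16 = 2 bytes per element)."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

for n in (4_096, 8_192, 16_384):
    print(f"{n:>6} tokens: {attention_flops(n):.2e} FLOPs/layer, "
          f"{kv_cache_bytes(n) / 2**30:.1f} GiB KV cache")
```

Doubling the sequence length quadruples the per-layer attention FLOPs while the cache merely doubles, matching the quadratic-versus-linear growth described above.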
((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]])) Architectural advances, including rotary position embeddings (RoPE), grouped-query attention, flash attention, and speculative decoding, enabled rapid expansion:

^ Model ^ Context Window ^
| GPT-4 (original) | 8,192 tokens |
| GPT-4o | 128,000 tokens |
| Llama 3.1 | 128,000 tokens |
| Claude 3.5 Sonnet | 200,000 tokens |
| Claude Sonnet 4 | 1,000,000 tokens |
| Gemini 1.5 Flash | 1,000,000 tokens |
| Gemini 2.5 Pro | 2,000,000 tokens |

===== What Fills the Context Window =====

The context window is shared across several categories of content:

  * **[[instructional_context|Instructional Context]]** — System prompts, persona definitions, and behavioral rules
  * **[[background_context|Background Context]]** — Retrieved documents, knowledge base entries, and supplementary data
  * **[[operational_context|Operational Context]]** — The user's current query and immediate task inputs
  * **[[historical_context|Historical Context]]** — Prior conversation turns providing continuity
  * **Output tokens** — Space reserved for the model's generated response

All of these compete for the same fixed budget. Exceeding the limit forces truncation, typically dropping the earliest content. ((Source: [[https://nebius.com/blog/posts/context-window-in-ai|Nebius - Context Window in AI]]))

===== The Lost-in-the-Middle Problem =====

Research has shown that LLMs perform best on information placed at the **beginning** or **end** of the context window, with measurable degradation for content buried in the middle. This "lost-in-the-middle" effect means that simply having a large window does not guarantee the model will use all of its contents effectively. ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))

===== Why It Matters =====

The context window determines the fundamental capabilities of an LLM application. Small windows suit focused, low-cost tasks.
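The fixed-budget competition described above can be sketched as a simple truncation routine. This is a minimal sketch: the segment names and token counts are invented, and the drop-oldest-first policy mirrors the "typically dropping the earliest content" behavior. Production systems usually pin the system prompt rather than letting it be dropped.

```python
# Minimal sketch of fitting content into a fixed token budget.
# Segment names and token counts are invented; real systems usually
# pin the system prompt instead of dropping it first.

def fit_to_window(segments, window, reserved_output):
    """Drop the earliest segments until input + reserved output fits."""
    budget = window - reserved_output
    kept = list(segments)
    while kept and sum(tokens for _, tokens in kept) > budget:
        kept.pop(0)  # truncate the oldest content first
    return kept

segments = [
    ("system prompt", 400),    # instructional context
    ("retrieved docs", 3000),  # background context
    ("earlier turns", 2500),   # historical context
    ("current query", 300),    # operational context
]
print(fit_to_window(segments, window=4096, reserved_output=512))
# -> [('earlier turns', 2500), ('current query', 300)]
```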
Large windows enable processing entire codebases, legal documents, or book-length texts in a single pass, but at higher computational cost and with attention-dilution risks. Choosing the right window size is a core architectural decision for any AI system. ((Source: [[https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-a-context-window|McKinsey - What Is a Context Window]]))

===== See Also =====

  * [[operational_context|Operational Context]]
  * [[background_context|Background Context]]
  * [[instructional_context|Instructional Context]]
  * [[historical_context|Historical Context]]
  * [[million_token_context_window|Value of 1-Million-Token Context Windows]]

===== References =====