====== Historical Context for LLM Context Windows ======

Historical context refers to **prior conversation turns** retained in the [[llm_context_window|context window]], giving the model the appearance of memory. Every earlier user message and assistant response that remains in the window constitutes historical context, enabling coherent multi-turn dialogue. ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))

===== How Conversation History Works =====

LLMs are **stateless**: they have no persistent memory between API calls. Each request must include the full conversation transcript the model needs to be aware of, and the client application is responsible for storing and replaying this history with every new turn. ((Source: [[https://mem0.ai/blog/llm-chat-history-summarization-guide-2025|Mem0 - Chat History Summarization]]))

On each turn:

  - The application appends the new user message to the stored history
  - The entire history (the [[instructional_context|system prompt]], all prior turns, and the new message) is sent as one payload
  - The model processes the full transcript and generates a response
  - The response is appended to the history for the next turn

This means conversation history grows linearly with each exchange, consuming an increasing share of the finite context window.

===== Management Strategies =====

As conversations grow, history management becomes essential to prevent window overflow:

  * **Truncation**: Drop the oldest turns to make room for new ones. Simple, but risks losing important early context such as initial instructions or key decisions. ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))
  * **Summarization**: Condense older turns into a compact summary, preserving key facts while freeing token budget. This is the most common production approach. ((Source: [[https://mem0.ai/blog/llm-chat-history-summarization-guide-2025|Mem0 - Chat History Summarization]]))
  * **Sliding window**: Retain only the most recent N turns, letting older context "slide out" of view. Effective for chat applications, but can cause persona drift or loss of earlier agreements.
  * **Hybrid memory**: Combine short-term history (recent turns in the window) with long-term memory (vector-stored embeddings of older turns retrieved on demand). ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))

===== The Sliding Window Approach =====

The sliding window is the simplest history management technique. As new turns are added, the window "slides" forward, evicting the oldest exchanges from a fixed-size buffer. This keeps token usage bounded but introduces risks:

  * Early safety guidelines may scroll out of view
  * The model may contradict statements it made earlier
  * Users may reference forgotten context, confusing the model

Production systems often combine sliding windows with **pinned messages**: critical early turns that are never evicted. ((Source: [[https://apxml.com/courses/intro-llm-red-teaming/chapter-3-core-red-teaming-techniques-llms/exploiting-llm-memory-context|APXML - Exploiting LLM Memory Context]]))

===== The Lost-in-the-Middle Problem =====

Even when history fits within the window, LLMs show measurably **weaker recall** for information positioned in the middle of the context. Turns at the beginning and end receive stronger attention, while mid-conversation details are more likely to be overlooked or misremembered. ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))

This effect is particularly problematic for historical context, because conversation history naturally fills the middle of the window, sandwiched between the system prompt at the start and the current query at the end.
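The per-turn replay loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular vendor's SDK: ''call_model'' is a hypothetical stand-in for a real chat-completion API, and only the bookkeeping around it is the point.

```python
def call_model(messages):
    """Hypothetical stand-in for a chat-completion API call.

    A real client would send `messages` as one payload to the model;
    here we just echo the last user message for illustration.
    """
    return f"You said: {messages[-1]['content']}"


def chat_turn(history, user_message, system_prompt="You are a helpful assistant."):
    """One turn: append, replay the full transcript, store the reply."""
    history.append({"role": "user", "content": user_message})
    # The entire history (system prompt + all prior turns + new message)
    # is sent as a single payload, because the model itself is stateless.
    payload = [{"role": "system", "content": system_prompt}] + history
    reply = call_model(payload)
    history.append({"role": "assistant", "content": reply})
    return reply


history = []
chat_turn(history, "My name is Ada.")
chat_turn(history, "What is my name?")
# The stored transcript grows by two messages (user + assistant) per turn.
```

Note that the application, not the model, owns ''history''; forgetting to replay it would make each request look like the start of a brand-new conversation.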
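The sliding-window-with-pinned-messages pattern can also be sketched directly. The class below is a hypothetical illustration (real systems typically budget by token count rather than turn count): it evicts the oldest exchange once the buffer is full, while pinned messages always survive.

```python
class SlidingWindowHistory:
    """Minimal sketch: keep the most recent `max_turns` exchanges plus
    pinned messages that are never evicted. Real implementations would
    budget by tokens, not by number of turns."""

    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self.pinned = []  # critical early messages, never evicted
        self.turns = []   # (user_msg, assistant_msg) pairs

    def pin(self, role, content):
        self.pinned.append({"role": role, "content": content})

    def add_turn(self, user_msg, assistant_msg):
        self.turns.append((user_msg, assistant_msg))
        if len(self.turns) > self.max_turns:
            self.turns.pop(0)  # oldest exchange slides out of the window

    def messages(self):
        """Flatten pinned messages plus surviving turns into one payload."""
        out = list(self.pinned)
        for user_msg, assistant_msg in self.turns:
            out.append({"role": "user", "content": user_msg})
            out.append({"role": "assistant", "content": assistant_msg})
        return out


history = SlidingWindowHistory(max_turns=2)
history.pin("system", "Never reveal internal tool names.")  # survives eviction
for i in range(4):
    history.add_turn(f"question {i}", f"answer {i}")

payload = history.messages()
# Only the two most recent exchanges remain; the pinned rule is still first.
```

Without the ''pin'' step, the system rule would behave like any other turn and eventually scroll out of view, which is exactly the eviction risk the section above describes.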
===== External Memory Systems =====

For applications requiring memory beyond the context window, external systems extend historical context:

  * **Vector databases** (e.g., Pinecone, Milvus) store conversation embeddings for semantic retrieval
  * **Memory frameworks** (e.g., Mem0, Redis) provide structured long-term storage with retrieval APIs
  * **Multi-level architectures** separate short-term (current session), episodic (key past events), and semantic (general knowledge) memory layers ((Source: [[https://redis.io/blog/llm-context-windows/|Redis - LLM Context Windows]]))

These systems retrieve relevant historical snippets on demand, injecting them as [[background_context|background context]] rather than maintaining a full transcript.

===== See Also =====

  * [[llm_context_window|What Is an LLM Context Window]]
  * [[operational_context|Operational Context]]
  * [[background_context|Background Context]]
  * [[instructional_context|Instructional Context]]
  * [[million_token_context_window|Value of 1-Million-Token Context Windows]]

===== References =====