Latent reasoning refers to techniques that enable LLMs to reason in continuous latent space using hidden states as “continuous thoughts,” rather than generating explicit chain-of-thought (CoT) tokens. This approach allows models to encode multiple reasoning paths simultaneously for more efficient computation.
Explicit chain-of-thought reasoning has fundamental constraints:

- High token cost: every intermediate reasoning step must be decoded into discrete tokens.
- Early commitment: autoregressive generation locks the model into a single reasoning path.
- Language bottleneck: intermediate computation is restricted to what natural-language tokens can express.
Latent reasoning addresses these by performing computation in the model's continuous hidden state space, where vectors can represent superpositions of multiple reasoning paths.
Coconut (Hao et al., arXiv:2412.06769) is the foundational work on latent reasoning. The model switches between language mode and latent mode using special tokens:
- `<bot>` (beginning of thought): enter latent reasoning mode
- `<eot>` (end of thought): return to language generation

In latent mode, the model reuses the last hidden state as input for the next step without decoding to tokens. This is fully differentiable, enabling training via backpropagation.
```python
# Conceptual illustration of Coconut's latent reasoning (pseudocode)
class CoconutModel:
    def forward(self, hidden, mode="language"):
        for layer in self.layers:
            hidden = layer(hidden)
        if mode == "latent":
            # Reuse the hidden state as the next input (no token decoding)
            return hidden  # continuous thought vector
        # Standard token prediction
        return self.lm_head(hidden)

    def reason(self, prompt, n_latent_steps=3):
        hidden = self.embed(prompt)
        # Perform n steps of reasoning entirely in hidden-state space
        for _ in range(n_latent_steps):
            hidden = self.forward(hidden, mode="latent")
        # Decode the final answer from the enriched hidden state
        return self.decode(hidden)
```
Training curriculum: start with full CoT examples, then at each stage progressively replace k reasoning sentences with k × c latent thoughts (c = 1–2 thoughts per replaced step). This teaches the model to perform vector-based reasoning without direct supervision of the latent states.
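The staging can be sketched as a data-preparation helper (a hypothetical illustration, not the authors' code; the token names and `curriculum_example` function are assumptions):

```python
# Sketch of Coconut's multi-stage curriculum: at stage k, the first k
# chain-of-thought steps are replaced by k * c latent-thought placeholders.
BOT, EOT, LATENT = "<bot>", "<eot>", "<latent>"

def curriculum_example(question, cot_steps, answer, stage, c=2):
    """Build one training sequence for the given curriculum stage."""
    k = min(stage, len(cot_steps))
    latent_slots = [LATENT] * (k * c)   # k replaced steps -> k*c latent thoughts
    remaining = cot_steps[k:]           # CoT steps not yet internalized
    return [question, BOT, *latent_slots, EOT, *remaining, answer]

steps = ["step1", "step2", "step3"]
print(curriculum_example("Q", steps, "A", stage=0))  # full CoT, no latent slots
print(curriculum_example("Q", steps, "A", stage=2))  # first 2 steps -> 4 slots
```

During training, the `<latent>` positions are where the model feeds its own hidden states forward instead of token embeddings.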
Geiping et al. introduce a recurrent-depth architecture for latent reasoning that iterates a recurrent block to arbitrary depth at test time.
This creates a natural mechanism for test-time compute scaling: allocate more recurrent iterations for harder problems, fewer for easy ones.
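The scaling mechanism can be sketched in miniature (an illustrative toy, not Geiping et al.'s architecture; the block here is just an affine mix followed by tanh):

```python
# Test-time compute scaling via recurrent depth: the SAME block is applied
# r times, and r is chosen per problem at inference time.
import math

def recurrent_block(state, weights):
    """One iteration: weighted mix of state entries followed by tanh."""
    n = len(state)
    return [math.tanh(sum(weights[i][j] * state[j] for j in range(n)))
            for i in range(n)]

def reason(state, weights, n_iters):
    """Iterate the shared block n_iters times: more iterations, more compute."""
    for _ in range(n_iters):
        state = recurrent_block(state, weights)
    return state

# Harder inputs simply get a larger iteration budget:
w = [[0.5, -0.2], [0.1, 0.3]]
easy = reason([1.0, 0.0], w, n_iters=2)
hard = reason([1.0, 0.0], w, n_iters=32)
```

Because the block's weights are shared across iterations, the parameter count is fixed while effective depth (and thus compute) varies per query.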
| Aspect | Explicit CoT | Latent Reasoning |
|---|---|---|
| Representation | Discrete tokens | Continuous hidden states |
| Search style | Single path, autoregressive | Multi-path (BFS/tree-like) |
| Efficiency | High token cost, early commitment | Fewer tokens, parallel exploration |
| Interpretability | Human-readable | Opaque (requires probing) |
| Strong tasks | Math (GSM8K) | Logic, graph reachability |
Analysis of latent thoughts (arXiv:2505.12514) reveals that continuous thought vectors can encode superpositions of multiple reasoning states simultaneously. For example, in graph reachability tasks, a single latent vector can represent all nodes reachable at the current search frontier – effectively implementing BFS in vector space.
This is impossible with discrete tokens, which must commit to naming specific nodes. The superposition property explains why latent reasoning excels at search-like tasks.
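A toy demonstration of the idea (an assumed example, not code from the paper): encode the BFS frontier as a single 0/1 vector and advance it one step with the graph's adjacency matrix, so one vector represents all frontier nodes at once.

```python
# BFS in "vector space": one frontier vector holds every frontier node
# simultaneously -- the superposition a single discrete token cannot express.
def bfs_step(frontier, adj):
    """Next frontier: node i is reachable if any frontier node points to it."""
    n = len(adj)
    return [1 if any(frontier[j] and adj[j][i] for j in range(n)) else 0
            for i in range(n)]

# Graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
frontier = [1, 0, 0, 0]             # start at node 0
frontier = bfs_step(frontier, adj)  # -> [0, 1, 1, 0]: nodes 1 AND 2 at once
frontier = bfs_step(frontier, adj)  # -> [0, 0, 0, 1]: node 3
```

A continuous thought vector plays the role of `frontier` here: a single hidden state tracks the entire search frontier, where explicit CoT would have to name each node in sequence.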
The STILL (Slow Thinking with LLMs) models explore the boundary between explicit and latent reasoning.
These models demonstrate that a spectrum exists between fully explicit CoT and fully latent reasoning, with hybrid approaches often achieving the best efficiency-accuracy tradeoff.
Latent reasoning offers concrete efficiency advantages. On graph reachability tasks, Coconut outperforms CoT while using significantly fewer generation steps; on GSM8K math, CoT still edges ahead, suggesting latent reasoning is strongest for search and logic tasks.
Recent work (2025-2026) explores dual-architecture latent reasoning where a fluent base model exchanges latent messages with a specialized coprocessor: