Typical hybrid ratios range from 1 attention layer per 3–7 SSM layers, with the optimal ratio remaining an active research question.
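
To make the ratio concrete, here is a minimal, hypothetical Python sketch of how a hybrid stack's layer schedule could be generated from such a ratio. The function ''layer_types'' and its parameters are illustrative names for this page, not drawn from any published hybrid model's code.

<code python>
# Hypothetical sketch: derive a layer-type schedule for a hybrid stack
# that places 1 attention layer in every group of `ssm_per_attn` SSM layers.
def layer_types(n_layers: int, ssm_per_attn: int) -> list[str]:
    """Return a schedule like ['ssm', 'ssm', 'ssm', 'attention', 'ssm', ...]."""
    period = ssm_per_attn + 1  # each group: ssm_per_attn SSM layers + 1 attention layer
    return [
        "attention" if i % period == ssm_per_attn else "ssm"
        for i in range(n_layers)
    ]

# Example: a 32-layer stack at a 1:7 ratio contains 4 attention layers.
schedule = layer_types(32, 7)
print(schedule.count("attention"), schedule.count("ssm"))  # -> 4 28
</code>

At a 1:3 ratio the same 32-layer stack would instead carry 8 attention layers, which shows how directly the chosen ratio sets the share of attention layers in the stack.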
| + | |||
| + | ===== References ===== | ||
| + | |||
| + | - Albert Gu, Karan Goel, Christopher Ré. " | ||
| + | - Albert Gu, Tri Dao. " | ||
| + | - Tri Dao, Albert Gu. " | ||
| + | - Soham De et al. " | ||
| + | - Bo Peng et al. " | ||
| + | - Team Jamba. " | ||
| + | - Samy Jelassi et al. " | ||
| + | - AI21 Labs. "The Rise of Hybrid LLMs". [[https:// | ||

===== See Also =====
  * [[inference_optimization|Inference Optimization]]
  * [[on_device_agents|On-Device Agents]]
| + | |||