====== Algorithm Distillation ======

Algorithm Distillation (AD) is a technique that applies knowledge distillation to reinforcement learning by training a transformer model on the complete learning histories of RL agents, enabling it to perform in-context reinforcement learning at inference time. Introduced by [[https://arxiv.org/abs/2206.11848|Laskin et al., 2022]] at DeepMind in "In-Context Reinforcement Learning with Algorithm Distillation," AD encodes the improvement process of a source RL algorithm into the weights of a causal transformer, so the model learns not just a policy but the algorithm for improving a policy.(([[https://arxiv.org/abs/2206.11848|Laskin et al., 2022 - In-Context Reinforcement Learning with Algorithm Distillation]])) This allows the model to explore, exploit, and improve its policy across episodes purely through in-context learning, without any explicit weight updates during deployment.

===== In-Context Reinforcement Learning =====

The central idea of Algorithm [[distillation|Distillation]] is **in-context [[reinforcement_learning|reinforcement learning]] (ICRL)**: a causal transformer processes sequences of observations, actions, and rewards from RL episodes as context and autoregressively predicts the next action conditioned on the full interaction history up to that point. Formally, the model maximizes the likelihood of the source agent's next action given the full history:

$$\mathcal{L}(\theta) = \mathbb{E}_{\tau}\left[\sum_{t=1}^{T} \log p_\theta(a_t \mid \tau_{<t})\right]$$

where $\tau_{<t} = (o_1, a_1, r_1, \ldots, o_t)$ denotes the interaction history up to and including the current observation $o_t$.
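To make the objective concrete, here is a minimal PyTorch sketch of this next-action prediction setup. The architecture, sizes, and the tokenization of each timestep as one (observation, previous action, previous reward) embedding are illustrative assumptions for a discrete-action setting, not the paper's exact implementation:

<code python>
import torch
import torch.nn as nn
import torch.nn.functional as F

class ADTransformer(nn.Module):
    """Causal transformer for Algorithm Distillation (illustrative sketch).

    Each timestep is embedded as one token combining the current observation
    with the previous action and previous reward, so the model conditions on
    the full interaction history tau_{<t}.
    """

    def __init__(self, obs_dim, n_actions, d_model=128, n_heads=4,
                 n_layers=4, max_len=1024):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        # n_actions + 1: index n_actions serves as a "no previous action"
        # placeholder for the very first timestep in the context.
        self.act_emb = nn.Embedding(n_actions + 1, d_model)
        self.rew_proj = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs, prev_act, prev_rew):
        # obs: (B, T, obs_dim), prev_act: (B, T) long, prev_rew: (B, T) float
        B, T, _ = obs.shape
        x = (self.obs_proj(obs)
             + self.act_emb(prev_act)
             + self.rew_proj(prev_rew.unsqueeze(-1))
             + self.pos_emb(torch.arange(T, device=obs.device)))
        # Causal mask: token t may only attend to tokens <= t.
        mask = torch.triu(torch.full((T, T), float("-inf"),
                                     device=obs.device), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.head(h)  # (B, T, n_actions) logits for the next action

def ad_loss(model, obs, prev_act, prev_rew, target_act):
    """Negative log-likelihood of the source agent's actions given history,
    i.e. the objective L(theta) above with the sign flipped for minimization."""
    logits = model(obs, prev_act, prev_rew)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           target_act.reshape(-1))
</code>

At deployment the weights stay frozen: the environment's observations, the model's own sampled actions, and the resulting rewards are appended to the context, and any improvement in behavior comes entirely from conditioning on that growing history.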