Meta Llama 4 Scout is a multimodal Mixture-of-Experts (MoE) language model released by Meta on April 5, 2025, featuring a 10-million-token context window, the longest of any production model at launch. With 109 billion total parameters but only 17 billion active per token, Scout delivers frontier-class performance while running on a single NVIDIA H100 GPU. 1)
Scout uses a sparse MoE design optimized for efficiency: each token is routed through only a small subset of the model's experts, so just 17 billion of its 109 billion parameters are active at a time. Its interleaved attention layers enable generalization to extremely long sequences, far beyond the training context length. 2)
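The efficiency argument above can be made concrete with a toy routing sketch. This is a minimal, hypothetical top-1 MoE layer in NumPy with made-up sizes (64-dim tokens, 16 single-matrix "experts"); the real model's routing, expert count, and dimensions differ, but the principle is the same: only the chosen expert's weights run for each token, so active parameters are a fraction of the total.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts = 64, 16  # illustrative sizes, not Scout's real dimensions

# Router and expert weights (each "expert" here is a single linear layer for brevity)
router_w = rng.normal(scale=0.02, size=(d_model, n_experts))
experts = rng.normal(scale=0.02, size=(n_experts, d_model, d_model))

def moe_layer(x):
    """Route each token to its top-scoring expert; only that expert's weights run."""
    logits = x @ router_w                       # (tokens, n_experts) router scores
    chosen = np.argmax(logits, axis=-1)         # top-1 expert index per token
    gate = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax gates
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = gate[i, e] * (x[i] @ experts[e])  # only expert e's matmul executes
    return out, chosen

tokens = rng.normal(size=(8, d_model))
out, chosen = moe_layer(tokens)
# With top-1 routing over 16 experts, per-token active parameters are ~1/16 of the total.
```

The per-token compute is independent of how many experts exist, which is why a 109B-parameter model can run with 17B-parameter inference cost.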
Scout was pre-trained and post-trained on 40 trillion tokens spanning both text and images. 3) The model uses early fusion for multimodal processing, integrating image understanding directly into the core architecture rather than bolting on separate vision modules.
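Early fusion can be illustrated with a short sketch. The shapes and dimensions below are hypothetical: the point is that image patch embeddings and text token embeddings are projected to the same width and concatenated into one sequence that a single transformer backbone consumes, rather than a separate vision tower whose output is merged late.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64  # illustrative width; both modalities share it

# Hypothetical embeddings: 10 text tokens and a 4x4 grid of image patches,
# both already projected into the model's embedding space.
text_emb = rng.normal(size=(10, d_model))
patch_emb = rng.normal(size=(16, d_model))

# Early fusion: one combined sequence feeds the same transformer layers,
# so attention mixes image and text information from the first layer on.
sequence = np.concatenate([patch_emb, text_emb], axis=0)
```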
The 10-million-token context window is Scout's most distinctive capability. Trained at a 256K-token context length, the model uses iRoPE position embeddings to extrapolate to 10M tokens at inference time, enabling tasks such as multi-document summarization, parsing extensive user activity for personalization, and reasoning over vast codebases.
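A rough sketch of the idea behind iRoPE, under simplifying assumptions: most attention layers apply standard rotary position embeddings (RoPE), while interleaved layers use no positional encoding at all, and those position-free layers are what extrapolate gracefully past the training length. The 8-layer stack and every-fourth-layer pattern below are illustrative only, not Scout's actual configuration.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq, d), d even."""
    seq, d = x.shape
    pos = np.arange(seq)[:, None]                 # (seq, 1) positions
    freqs = base ** (-np.arange(0, d, 2) / d)     # (d/2,) per-pair frequencies
    ang = pos * freqs                             # (seq, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]               # rotate each even/odd pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Hypothetical interleaving: every 4th layer skips RoPE entirely
# (position-free attention), the kind of pattern iRoPE interleaves
# to generalize beyond the trained context length.
layer_uses_rope = [(i + 1) % 4 != 0 for i in range(8)]

q = np.random.default_rng(2).normal(size=(5, 16))
q_per_layer = [rope(q) if use else q for use in layer_uses_rope]
```

Because a no-position layer treats a token at position 9,000,000 the same as one at position 9,000, its attention behavior does not degrade at lengths never seen in training.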
Scout maintains low negative log-likelihood over 10 million code tokens, demonstrating robust long-context comprehension. 4)
Scout is deployable on a single H100 GPU using Int4 quantization (approximately 54.5 GB plus KV cache), making it one of the most accessible frontier models. Full-precision weights require 4x H100s. Inference cost is approximately $0.09 per million tokens. 6)
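The memory figures above follow from simple arithmetic, shown here as a back-of-envelope check (weights only, ignoring KV cache and activations):

```python
# 109B parameters at 4 bits (0.5 bytes) per parameter under Int4 quantization.
total_params = 109e9
int4_gb = total_params * 0.5 / 1e9   # 54.5 GB -> fits one 80 GB H100 with room for KV cache
bf16_gb = total_params * 2.0 / 1e9   # 218 GB at 2 bytes/param -> multiple 80 GB H100s
print(int4_gb, bf16_gb)              # 54.5 218.0
```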
The model is released under the Llama 4 Community License, which permits most uses but imposes restrictions on organizations exceeding 700 million monthly active users. The model officially supports fine-tuning in 12 languages.
Scout's sibling model, Llama 4 Maverick, uses 400 billion total parameters (17B active) with higher benchmark scores (MMMU 73.4%, MathVista 73.7%) but requires 3x H100s and has a 1M default context window. Scout prioritizes efficiency and accessibility for single-GPU deployment. 7)