====== Meta MTIA Chip ======

The **Meta Training and Inference Accelerator (MTIA)** is a family of custom AI chips developed by Meta in partnership with Broadcom and manufactured by TSMC. Originally designed for ranking and recommendation workloads, MTIA has rapidly evolved into a multi-generational inference platform serving billions of users across Meta's platforms. ((Source: [[https://aisystemcodesign.github.io/papers/MTIA-ISCA25.pdf|Meta — MTIA 2nd Generation ISCA Paper]]))

===== Why Custom Silicon =====

Meta's AI workload landscape spans four major domains: training and inference, each crossed with recommendation models and generative AI. At planetary scale (more than 3 billion daily users), general-purpose GPUs proved economically unsustainable for inference workloads. MTIA was created to:

  * Reduce total cost of ownership: MTIA 2i achieves a 44% TCO reduction compared to GPUs ((Source: [[https://aisystemcodesign.github.io/papers/MTIA-ISCA25.pdf|Meta — MTIA 2nd Generation ISCA Paper]]))
  * Mitigate supply risk from dependence on GPU vendors
  * Co-design hardware with Meta's specific model architectures

===== Chip Generations =====

Meta has committed to one of the fastest custom chip iteration cycles in the industry: four generations in under two years.

^ Generation ^ Status ^ Primary Workload ^ Key Specs ^
| MTIA 300 | In production | Ranking and recommendation training | Current production chip, deployed at scale |
| MTIA 400 | Testing complete | GenAI inference | 5x compute over MTIA 300; 50% more HBM bandwidth; 400% higher FP8 FLOPS |
| MTIA 450 | In development | GenAI inference | Doubles HBM bandwidth to 18.4 TB/s (from 9.2 TB/s on MTIA 400) |
| MTIA 500 | Roadmap | GenAI inference at scale | 4.5x HBM bandwidth and 25x compute FLOPS vs MTIA 300 |

All chips are built on the **RISC-V architecture** and manufactured by TSMC, with Broadcom as design partner.
((Source: [[https://www.abhs.in/blog/meta-mtia-chip-roadmap-four-generations-inference-2026|Abhishek Gautam — Meta MTIA Roadmap]]))

===== Architecture =====

A key architectural differentiator is MTIA's memory hierarchy: instead of relying on costly HBM alone, MTIA pairs **large on-chip SRAM with LPDDR**, optimizing for inference workloads whose memory access patterns differ from those of training. This model-chip co-design approach lets Meta tailor the hardware to its specific neural network architectures. ((Source: [[https://aisystemcodesign.github.io/papers/MTIA-ISCA25.pdf|Meta — MTIA 2nd Generation ISCA Paper]]))

===== Deployment =====

  * **MTIA 2i** (the second-generation inference chip, predating the MTIA 300 naming scheme) is deployed at scale, serving billions of users for ranking and recommendation
  * Hundreds of thousands of MTIA chips are already in production across Meta's data centers
  * MTIA handles Meta's highest-volume AI workload by query count: deciding which content appears in Instagram and Facebook feeds ((Source: [[https://medium.com/@santhosraj14/six-ai-chips-in-two-years-inside-metas-blazing-fast-mtia-architecture-89ad165d80a4|Santhosraj — Six AI Chips in Two Years]]))

===== Industry Context =====

Meta's MTIA program is part of a broader hyperscaler trend of building custom inference silicon to reduce dependence on NVIDIA GPUs. Similar efforts include Google's TPU, Amazon's Trainium and Inferentia, and Microsoft's Maia. Meta's approach is distinguished by its aggressive iteration speed and its choice of the RISC-V architecture. ((Source: [[https://www.tomshardware.com/tech-industry/semiconductors/metas-mtia-chip-lineup-joins-hyperscaler-push-to-replace-nvidia-at-inference|Tom's Hardware — Meta MTIA Hyperscaler Push]]))

===== See Also =====

  * [[neural_processing_unit|Neural Processing Unit (NPU)]]
  * [[sram_centric_chips|SRAM-Centric Chips]]
  * [[nvidia_vera_rubin|Nvidia Vera Rubin]]

===== References =====