====== Meta MTIA Chip ======

The **Meta Training and Inference Accelerator (MTIA)** is a family of custom AI chips developed by Meta in partnership with Broadcom and manufactured by TSMC. Originally designed for ranking and recommendation workloads, MTIA has rapidly evolved into a multi-generational inference platform serving billions of users across Meta's platforms. ((Source: [[https://aisystemcodesign.github.io/papers/MTIA-ISCA25.pdf|Meta — MTIA 2nd Generation ISCA Paper]]))

===== Why Custom Silicon =====

Meta's AI workload landscape spans four major domains: training and inference, each crossed with recommendation models and generative AI. At planetary scale (more than 3 billion daily users), general-purpose GPUs proved economically unsustainable for inference workloads. MTIA was created to:

  * Reduce total cost of ownership: MTIA 2i achieves a 44% TCO reduction compared to GPUs ((Source: [[https://aisystemcodesign.github.io/papers/MTIA-ISCA25.pdf|Meta — MTIA 2nd Generation ISCA Paper]]))
  * Mitigate supply risk from dependence on GPU vendors
  * Co-design hardware with Meta's specific model architectures

===== Chip Generations =====

Meta has committed to one of the fastest custom chip iteration cycles in the industry: four generations in under two years.

^ Generation ^ Status ^ Primary Workload ^ Key Specs ^
| MTIA 300 | In production | Ranking and recommendation training | Current production chip, deployed at scale |
| MTIA 400 | Testing complete | GenAI inference | 5x compute over MTIA 300; 50% more HBM bandwidth; 400% higher FP8 FLOPS |
| MTIA 450 | In development | GenAI inference | Doubles HBM bandwidth to 18.4 TB/s (from 9.2 TB/s on MTIA 400) |
| MTIA 500 | Roadmap | GenAI inference at scale | 4.5x HBM bandwidth and 25x compute FLOPS vs MTIA 300 |

All chips are built on the **RISC-V architecture** and manufactured by TSMC, with Broadcom as design partner.
((Source: [[https://www.abhs.in/blog/meta-mtia-chip-roadmap-four-generations-inference-2026|Abhishek Gautam — Meta MTIA Roadmap]]))

===== Architecture =====

A key architectural differentiator is MTIA's memory hierarchy: instead of relying on costly HBM alone, MTIA pairs **large on-chip SRAM with LPDDR**, optimizing for inference workloads whose memory access patterns differ from those of training. This model-chip co-design approach lets Meta tailor the hardware to its specific neural network architectures. ((Source: [[https://aisystemcodesign.github.io/papers/MTIA-ISCA25.pdf|Meta — MTIA 2nd Generation ISCA Paper]]))

===== Deployment =====

  * **MTIA 2i** (the second-generation inference chip, predating the MTIA 300 naming scheme) is deployed at scale, serving billions of users for ranking and recommendation
  * Hundreds of thousands of MTIA chips are already in production across Meta's data centers
  * MTIA handles Meta's highest-volume AI workload by query count: deciding which content appears in Instagram and Facebook feeds ((Source: [[https://medium.com/@santhosraj14/six-ai-chips-in-two-years-inside-metas-blazing-fast-mtia-architecture-89ad165d80a4|Santhosraj — Six AI Chips in Two Years]]))

===== Industry Context =====

Meta's MTIA program is part of a broader hyperscaler trend of building custom inference silicon to reduce dependence on NVIDIA GPUs. Similar efforts include Google's TPU, Amazon's Trainium and Inferentia, and Microsoft's Maia. Meta's approach is distinguished by its aggressive iteration speed and its choice of the RISC-V architecture. ((Source: [[https://www.tomshardware.com/tech-industry/semiconductors/metas-mtia-chip-lineup-joins-hyperscaler-push-to-replace-nvidia-at-inference|Tom's Hardware — Meta MTIA Hyperscaler Push]]))

===== See Also =====

  * [[neural_processing_unit|Neural Processing Unit (NPU)]]
  * [[sram_centric_chips|SRAM-Centric Chips]]
  * [[nvidia_vera_rubin|Nvidia Vera Rubin]]

===== References =====