====== Neural Processing Unit (NPU) ======

A **Neural Processing Unit (NPU)** is a specialized hardware accelerator designed to efficiently execute artificial intelligence and machine learning workloads, particularly neural network operations such as matrix multiplications, convolutions, and tensor math. NPUs mimic brain-like parallel processing at ultra-low power, enabling real-time AI inference directly on edge devices such as smartphones, laptops, and IoT sensors. ((Source: [[https://www.ibm.com/think/topics/neural-processing-unit|IBM — Neural Processing Unit]]))

===== Architecture =====

NPU architectures are optimized for AI inference through several key design principles:

  * **Systolic arrays** of Multiply-Accumulate (MAC) units arranged in grids, enabling trillions of parallel computations per second ((Source: [[https://builtin.com/articles/npu-neural-processing-unit|BuiltIn — NPU Neural Processing Unit]]))
  * **Low-precision arithmetic** using 8-bit or 16-bit integers to maximize energy efficiency and throughput ((Source: [[https://www.ibm.com/think/topics/neural-processing-unit|IBM — Neural Processing Unit]]))
  * **High-bandwidth on-chip memory** with dedicated buffers, DMA engines, and Streaming Hybrid Architecture Vector Engines (SHAVE) to minimize data-movement latency ((Source: [[https://intel.github.io/intel-npu-acceleration-library/npu.html|Intel NPU Acceleration Library]]))
  * **Scalable multi-tile designs** with Neural Compute Engines for matrix multiplication, convolution, and vector operations

===== NPU vs GPU vs TPU =====

^ Aspect ^ NPU ^ GPU ^ TPU ^
| Primary Focus | AI inference on edge devices with ultra-low power | Parallel graphics and general compute; excels at training | Google-specific tensor ops for large-scale training/inference |
| Power Efficiency | Highest for always-on AI tasks | Higher power draw; suited for bursty workloads | Optimized for cloud but power-hungry vs NPUs |
| Architecture | Systolic arrays for inference | Thousands of shader cores for general parallelism | Tensor cores in Google's ecosystem |
| Typical Deployment | On-device (phones, laptops, IoT) | Data centers and workstations | Google Cloud TPU pods |

NPUs outperform GPUs in energy efficiency for on-device AI inference but offer lower raw compute power for training. TPUs are cloud-focused, with narrower applicability outside Google's ecosystem. ((Source: [[https://builtin.com/articles/npu-neural-processing-unit|BuiltIn — NPU Neural Processing Unit]]))

===== Manufacturers and Products =====

Major semiconductor companies integrate NPUs into their System-on-Chip (SoC) designs:

  * **Intel** — Core Ultra processors (Meteor Lake, Arrow Lake) feature scalable Neural Compute Engines delivering 40+ TOPS ((Source: [[https://intel.github.io/intel-npu-acceleration-library/npu.html|Intel NPU Acceleration Library]]))
  * **Qualcomm** — Hexagon NPU in the Snapdragon 8 Gen series, optimized for low-power generative AI inference ((Source: [[https://www.qualcomm.com/news/onq/2024/02/what-is-an-npu-and-why-is-it-key-to-unlocking-on-device-generative-ai|Qualcomm — What is an NPU]]))
  * **Apple** — Neural Engine in A-series (iPhone) and M-series (Mac) chips, with the M4 delivering 38 TOPS ((Source: [[https://builtin.com/articles/npu-neural-processing-unit|BuiltIn — NPU Neural Processing Unit]]))
  * **AMD** — XDNA architecture in Ryzen AI processors (Ryzen AI 400 series) delivering up to 50 TOPS ((Source: [[https://www.lenovo.com/us/en/glossary/what-is-an-npu/|Lenovo — What is an NPU]]))
  * **Samsung** — NPUs integrated into Exynos SoCs for mobile AI workloads ((Source: [[https://semiconductor.samsung.com/support/tools-resources/dictionary/the-neural-processing-unit-npu-a-brainy-next-generation-semiconductor/|Samsung — NPU Dictionary]]))
  * **Arm** — Ethos-N series targeting 8/16-bit quantized neural networks for licensees ((Source: [[https://developer.arm.com/documentation/102420/0200/Neural-processing-unit-introduction/Description-of-the-neural-processing-unit|Arm — NPU Introduction]]))

===== Use Cases =====

In 2025-2026, NPUs drive on-device generative AI and real-time edge computing across multiple domains:

  * **Generative AI inference** — local LLMs, on-device chatbots, and image generation without cloud latency
  * **Image and video processing** — real-time object detection, background blur for video calls, computational photography
  * **Speech and NLP** — voice assistants, transcription, natural language understanding
  * **Computer vision** — facial unlock, AR/VR rendering, ADAS in vehicles
  * **Always-on sensing** — pattern detection in wearables and IoT with minimal power draw

NPUs enable the "AI PC" category with 40-100+ TOPS of dedicated AI compute, offloading the CPU and GPU for smoother multitasking. ((Source: [[https://builtin.com/articles/npu-neural-processing-unit|BuiltIn — NPU Neural Processing Unit]]))

===== See Also =====

  * [[amd_ryzen_ai_pro|AMD Ryzen AI Pro 400 Series]]
  * [[analog_ai_chip|Analog AI Chips]]
  * [[meta_mtia_chip|Meta MTIA Chip]]

===== References =====
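The low-precision arithmetic described under Architecture can be illustrated with a minimal sketch: quantize float values to 8-bit integers, multiply-accumulate in a wide integer accumulator (as NPU MAC arrays do to avoid overflow), then dequantize. The function names and the simple symmetric scale here are illustrative assumptions, not any vendor's actual kernel or API:

```python
# Minimal sketch of INT8 quantized multiply-accumulate (MAC).
# All names and the fixed scale are illustrative, not a real NPU API.

def quantize(values, scale):
    """Map float values to int8 range [-128, 127] with a symmetric scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_dot(a_q, b_q):
    """Dot product of two int8 vectors, accumulated in a wide integer
    accumulator (NPUs typically use int32 for this)."""
    acc = 0
    for a, b in zip(a_q, b_q):
        acc += a * b  # each int8*int8 product fits in 16 bits
    return acc

# Worked example at scale 0.1:
a = [0.5, -1.2, 0.3]
b = [1.0, 0.4, -0.7]
scale = 0.1
a_q = quantize(a, scale)      # [5, -12, 3]
b_q = quantize(b, scale)      # [10, 4, -7]
acc = int8_dot(a_q, b_q)      # integer accumulate: -19
result = acc * scale * scale  # dequantize: ~ -0.19, matching the float dot
```

Keeping the multiplies in 8-bit and only the accumulator wide is what lets MAC arrays pack so many units per watt, which is the efficiency advantage the Architecture section attributes to low-precision arithmetic.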