---
id: 20260616-T0-06
title: "移动端NPU加速扩散LLM推理"
title_en: "Mobile NPU Accelerates Diffusion LLM Inference"
url: https://ai.daily.yangsir.net/daily/20260616-T0-06
issue_date: 2026-06-16
publish_date: 2026-06-15T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2606.13740
---

# Mobile NPU Accelerates Diffusion LLM Inference

Researchers propose an inference optimization method for diffusion LLMs (dLLMs) on mobile devices that uses the NPU for parallel denoising. In tests on a Snapdragon 8 Gen 3, it delivers a 3.2x inference speedup and 40% lower energy consumption. Dynamic batching and quantization bring dLLMs to real-time response on smartphones for the first time. The model and code have been open-sourced, which should help AI applications spread on mobile devices.
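To put the two headline figures on a common footing, the short Python sketch below converts them into ratios against the baseline. It is only an illustration: the function name `relative_metrics` is hypothetical, and it assumes (the summary does not say) that the speedup and the energy reduction were measured on the same workload against the same baseline, with energy counted per inference.

```python
def relative_metrics(speedup: float, energy_reduction: float) -> dict:
    """Convert a reported speedup and fractional energy reduction into
    ratios relative to the baseline (baseline = 1.0).

    Assumption: both numbers refer to the same workload and baseline.
    """
    latency_ratio = 1.0 / speedup            # time per inference vs. baseline
    energy_ratio = 1.0 - energy_reduction    # energy per inference vs. baseline
    # Average power = energy / time, so the ratio of ratios gives the
    # implied change in average power draw while inference is running.
    power_ratio = energy_ratio / latency_ratio
    return {
        "latency_ratio": latency_ratio,
        "energy_ratio": energy_ratio,
        "power_ratio": power_ratio,
    }


# Figures quoted in the summary: 3.2x speedup, 40% lower energy.
m = relative_metrics(speedup=3.2, energy_reduction=0.40)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, each inference takes roughly 31% of the baseline time and 60% of the baseline energy, which would imply an average power draw of about 1.9 times the baseline while the model is running. If the two figures come from different workloads, the power estimate does not hold.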

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2606.13740)

**Details page**: https://ai.daily.yangsir.net/daily/20260616-T0-06

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*