---
id: 20260501-T0-05
title: "RaMP提升MoE推理效率10-70%"
title_en: "RaMP Boosts MoE Inference by 10-70%"
url: https://ai.daily.yangsir.net/daily/20260501-T0-05
issue_date: 2026-05-01
publish_date: 2026-04-30T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2604.26039
---

# RaMP Boosts MoE Inference by 10-70%

RaMP is a runtime-aware megakernel polymorphism technique designed for Mixture-of-Experts (MoE) models. Conventional MoE systems dispatch kernels based on batch size alone, whereas RaMP accounts for both batch size and the expert routing distribution, improving kernel throughput by 10-70%. The technique targets a key efficiency bottleneck in MoE inference.
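
To make the contrast concrete, here is a minimal Python sketch of a dispatcher that looks at routing skew as well as batch size. It is an illustration only and not RaMP's actual algorithm: the function names, the variant labels, the imbalance metric, and both thresholds are hypothetical and chosen purely for readability.

```python
from collections import Counter
from typing import Sequence


def routing_imbalance(expert_ids: Sequence[int], num_experts: int) -> float:
    """Busiest expert's load divided by the mean load per expert.

    1.0 means perfectly balanced routing; larger values mean more skew.
    (Hypothetical metric, assumed for illustration.)
    """
    if not expert_ids:
        return 1.0
    counts = Counter(expert_ids)
    mean_load = len(expert_ids) / num_experts
    return max(counts.values()) / mean_load


def select_kernel_variant(
    batch_size: int,
    expert_ids: Sequence[int],
    num_experts: int,
    small_batch: int = 16,        # assumed cutoff, not from the paper
    skew_threshold: float = 2.0,  # assumed cutoff, not from the paper
) -> str:
    """Pick a kernel variant label from batch size AND routing distribution."""
    size = "small" if batch_size <= small_batch else "large"
    skewed = routing_imbalance(expert_ids, num_experts) >= skew_threshold
    shape = "skewed" if skewed else "balanced"
    return f"{size}_{shape}"


# Same batch size, different routing: a batch-size-only dispatcher could not
# tell these two cases apart, while this sketch selects different variants.
balanced = [i % 4 for i in range(64)]
skewed = [0] * 48 + [1] * 8 + [2] * 4 + [3] * 4
print(select_kernel_variant(64, balanced, num_experts=4))
print(select_kernel_variant(64, skewed, num_experts=4))
```

In a real system the returned label would map to a precompiled kernel variant. Here it is only a string, so the selection logic can be read in isolation.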

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2604.26039)

**Details page**: https://ai.daily.yangsir.net/daily/20260501-T0-05

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*