---
id: 20260603-T0-12
title: "BudgetDraft: A New LLM Inference Acceleration Method That Improves Sparse-KV-Cache Efficiency"
title_en: "BudgetDraft Boosts Sparse-KV Speculative Decoding"
url: https://ai.daily.yangsir.net/daily/20260603-T0-12
issue_date: 2026-06-03
publish_date: 2026-06-02T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2606.00144
---

# BudgetDraft: A New LLM Inference Acceleration Method That Improves Sparse-KV-Cache Efficiency

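Stanford researchers propose BudgetDraft, which improves sparse-KV speculative decoding through acceptance-aware multi-view training. The method lowers peak GPU memory while maintaining high throughput, which makes it suitable for resource-constrained settings. In the reported experiments, it speeds up inference by more than 15% at equal compute. Developers can integrate it quickly through a PyPI library, and it is aimed at optimizing large-model inference.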
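To illustrate why capping the number of cached tokens relieves memory pressure, here is a back-of-the-envelope sketch of KV cache size. It assumes a standard transformer KV layout (one key and one value vector per layer, KV head, and token) and a hypothetical fixed token budget; the layer and head sizes are a hypothetical 7B-class configuration, and `budgeted_tokens` is an illustrative policy, not BudgetDraft's actual sparsification rule, which the summary does not describe. How much of such a saving shows up as lower peak memory depends on the full system design.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes for keys plus values across all layers (factor 2 = K and V)."""
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def budgeted_tokens(context_len: int, budget: int) -> int:
    """Hypothetical sparse-KV policy: keep at most `budget` cached tokens."""
    return min(context_len, budget)


# Hypothetical 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)
context = 32_768  # tokens in the full context
budget = 2_048    # hypothetical sparse-KV token budget

dense = kv_cache_bytes(**cfg, num_tokens=context)
sparse = kv_cache_bytes(**cfg, num_tokens=budgeted_tokens(context, budget))
print(f"dense:  {dense / 2**30:.1f} GiB")   # 16.0 GiB
print(f"sparse: {sparse / 2**30:.2f} GiB")  # 1.00 GiB
```

Because the cache grows linearly in the number of stored tokens, a fixed budget turns a cost that scales with context length into a constant, here a 16x reduction for this one hypothetical configuration.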

## English Version

**BudgetDraft Boosts Sparse-KV Speculative Decoding**

Stanford researchers propose BudgetDraft, an acceptance-aware multi-view training method for sparse-KV speculative decoding. It lowers peak GPU memory while maintaining high throughput, and experiments show inference speedups of more than 15% at equal compute, making it suited to resource-constrained environments. It is available as a PyPI library for LLM inference optimization.
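Acceptance rate is the quantity an "acceptance-aware" drafter is presumably trained to raise. As general background, not BudgetDraft's own analysis, the standard speculative-decoding estimate assumes each of `gamma` drafted tokens is accepted independently with probability `alpha`; the expected number of tokens emitted per target-model verification step is then (1 - alpha^(gamma+1)) / (1 - alpha). A minimal sketch under that i.i.d. assumption:

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes i.i.d. acceptance with probability `alpha` for each of `gamma`
    drafted tokens; the +1 in the exponent counts the token the target model
    always emits (a correction on rejection, or a bonus token if all
    drafts are accepted).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Limit of the geometric sum: every draft accepted plus one bonus token.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


for alpha in (0.6, 0.7, 0.8):
    print(alpha, round(expected_tokens_per_step(alpha, gamma=4), 3))
```

With these illustrative values (not figures from the paper), raising `alpha` from 0.6 to 0.8 at `gamma = 4` lifts the expectation from about 2.31 to about 3.36 tokens per verification step. Wall-clock speedup also depends on how cheap each draft pass is, which is where a sparse KV budget would come in.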

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2606.00144)

**Details page**: https://ai.daily.yangsir.net/daily/20260603-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*