---
id: 20260409-T0-12
title: "Cactus通过约束接受推测采样加速自回归解码"
title_en: "Cactus Accelerates Decoding via Constrained Acceptance Sampling"
url: https://ai.daily.yangsir.net/daily/20260409-T0-12
issue_date: 2026-04-09
publish_date: 2026-04-08T04:00:00.000Z
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2604.04987
---

# Cactus Accelerates Autoregressive Decoding via Constrained Acceptance Speculative Sampling

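Researchers propose Cactus, a method that accelerates autoregressive decoding in large language models through constrained acceptance speculative sampling. Conventional speculative sampling (SpS) uses a small draft model to speed up decoding, but it forces the generated distribution to match that of the verifier LLM exactly. Cactus introduces a constrained acceptance mechanism that raises throughput, speeding up decoding by about 20% while preserving model accuracy, which suits latency-sensitive settings such as real-time dialogue systems.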
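To make the "exact match" property concrete, here is a short derivation for the standard speculative sampling acceptance rule, which is textbook material rather than anything specific to Cactus. The symbols are assumptions introduced for illustration: $p$ is the verifier's next-token distribution and $q$ is the draft model's. A draft token $x \sim q$ is accepted with probability $\min(1, p(x)/q(x))$; on rejection, a replacement is drawn from the normalized residual $\max(0, p - q)$. The Cactus acceptance rule itself is not given in this summary and is not reproduced here.

$$
\begin{aligned}
\Pr[\text{output} = x]
&= q(x)\,\min\!\left(1, \frac{p(x)}{q(x)}\right)
 + \Bigl(1 - \sum_{y} \min\bigl(p(y), q(y)\bigr)\Bigr)\,
   \frac{\max\bigl(0,\, p(x) - q(x)\bigr)}{\sum_{y} \max\bigl(0,\, p(y) - q(y)\bigr)} \\
&= \min\bigl(p(x), q(x)\bigr) + \max\bigl(0,\, p(x) - q(x)\bigr) \\
&= p(x)
\end{aligned}
$$

The second line uses $\sum_{y} \max(0, p(y) - q(y)) = 1 - \sum_{y} \min(p(y), q(y))$, so the rejection probability cancels the residual's normalizer. The output is therefore distributed exactly as $p$, and the per-token acceptance probability is $\sum_{y} \min(p(y), q(y))$, which falls as the draft model drifts from the verifier.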

## English Version

**Cactus Accelerates Decoding via Constrained Acceptance Sampling**

Researchers present Cactus, which accelerates autoregressive decoding via constrained acceptance speculative sampling. Traditional speculative sampling (SpS) speeds up decoding with a small draft model but enforces a strict match between the generated distribution and that of the verifier LLM. Cactus introduces a constrained acceptance mechanism to boost throughput while maintaining accuracy. It improves decoding speed by about 20% while preserving model performance, making it suitable for real-time applications such as dialogue systems.
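For readers unfamiliar with the baseline, the following is a minimal sketch of one draft-then-verify step of standard speculative sampling, the strict scheme that Cactus is described as relaxing. It is not the Cactus algorithm, whose acceptance rule this summary does not specify. The function names `speculative_step` and `simulate` and the toy distributions `p` (verifier) and `q` (draft) are hypothetical, chosen only for illustration.

```python
import random


def speculative_step(p, q, rng):
    """One step of standard speculative sampling over a toy vocabulary.

    p: verifier distribution (list of probabilities), hypothetical values.
    q: draft distribution over the same vocabulary, hypothetical values.
    Returns (token_index, accepted_flag).
    """
    vocab = list(range(len(p)))
    # The draft model proposes a token.
    x = rng.choices(vocab, weights=q, k=1)[0]
    # The verifier accepts it with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q);
    # random.choices normalizes the weights internally.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual, k=1)[0], False


def simulate(p, q, n=100_000, seed=0):
    """Estimate the output distribution and the acceptance rate empirically."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    accepted = 0
    for _ in range(n):
        token, ok = speculative_step(p, q, rng)
        counts[token] += 1
        accepted += ok
    return [c / n for c in counts], accepted / n


p = [0.6, 0.3, 0.1]  # toy verifier distribution (assumption)
q = [0.3, 0.5, 0.2]  # toy draft distribution (assumption)
freqs, accept_rate = simulate(p, q)
print(freqs, accept_rate)
```

With these toy values the empirical frequencies should land close to `p`, and the acceptance rate close to the sum of `min(p, q)`, which is 0.7 here. That gap between 0.7 and 1.0 is the cost of exact matching: every rejected draft token is wasted work, which is the inefficiency a relaxed acceptance rule aims to reduce.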

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2604.04987)

**Details page**: https://ai.daily.yangsir.net/daily/20260409-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*