---
id: 20260512-T0-07
title: "RateQuant：基于率失真理论优化KV缓存量化"
title_en: "RateQuant Optimizes KV Cache Quantization Using Rate-Distortion Theory"
url: https://ai.daily.yangsir.net/daily/20260512-T0-07
issue_date: 2026-05-12
publish_date: 2026-05-11T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2605.06675
---

# RateQuant: Optimizing KV Cache Quantization with Rate-Distortion Theory

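A University of California team has proposed RateQuant, a new method that uses rate-distortion theory to optimize the KV cache quantization strategy. The method computes the optimal number of quantization bits, cutting memory usage by 30% while retaining 98% of performance. In experiments on large models such as GPT-3, inference speed improved by 40% and memory costs dropped significantly, making the method suitable for efficient deployment of large models.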

## English Version

**RateQuant Optimizes KV Cache Quantization Using Rate-Distortion Theory**

Researchers at the University of California (UC) developed RateQuant, a KV cache quantization method grounded in rate-distortion theory. It optimizes how quantization bits are allocated, reducing memory use by 30% while retaining 98% of model performance. In tests on GPT-3, it improved inference speed by 40% and significantly cut memory costs, making it well suited to efficient LLM deployment.
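The summary does not describe RateQuant's actual algorithm. To illustrate what "rate-distortion bit allocation" means in general, here is a minimal sketch of the classical greedy marginal bit allocation. It is not the paper's method. It assumes that each group of KV cache entries (for example, one attention head; the grouping is a hypothetical choice) has a known variance, and that quantization error follows the textbook high-resolution model D(b) ≈ σ²·2^(-2b). All function names and example variances are hypothetical.

```python
import heapq
from typing import List


def model_distortion(variance: float, bits: int) -> float:
    """Assumed high-resolution MSE model for a uniform scalar quantizer:
    D(b) ~ variance * 2^(-2b). A constant factor is omitted because it
    does not change which allocation is optimal."""
    return variance * 2.0 ** (-2 * bits)


def marginal_gain(variance: float, bits: int) -> float:
    """Distortion reduction from spending one more bit on this group."""
    return model_distortion(variance, bits) - model_distortion(variance, bits + 1)


def allocate_bits(variances: List[float], avg_bits: float,
                  min_bits: int = 1, max_bits: int = 8) -> List[int]:
    """Greedy marginal allocation under a total budget of avg_bits * n bits.

    Each step gives one bit to the group whose modeled distortion drops the
    most. Because every group's distortion curve is convex and decreasing,
    this greedy rule reaches a minimum-total-distortion integer allocation.
    """
    n = len(variances)
    remaining = int(round(avg_bits * n)) - min_bits * n
    if remaining < 0:
        raise ValueError("budget cannot cover min_bits for every group")
    bits = [min_bits] * n

    # heapq is a min-heap, so gains are negated to pop the largest gain first.
    # Ties are broken in favor of the lower group index.
    heap = [(-marginal_gain(v, min_bits), i)
            for i, v in enumerate(variances) if min_bits < max_bits]
    heapq.heapify(heap)

    # Stop when the budget is spent or every group has reached max_bits.
    while remaining > 0 and heap:
        _, i = heapq.heappop(heap)
        bits[i] += 1
        remaining -= 1
        if bits[i] < max_bits:
            heapq.heappush(heap, (-marginal_gain(variances[i], bits[i]), i))
    return bits


def total_distortion(variances: List[float], bits: List[int]) -> float:
    """Sum of modeled distortions across all groups."""
    return sum(model_distortion(v, b) for v, b in zip(variances, bits))


if __name__ == "__main__":
    # Hypothetical per-head variances of cached keys (illustrative only).
    head_var = [4.0, 1.0, 0.25, 0.0625]
    alloc = allocate_bits(head_var, avg_bits=3)
    print(alloc)  # [5, 4, 2, 1]
    # Greedy allocation vs. a uniform 3-bit allocation with the same budget.
    print(total_distortion(head_var, alloc), total_distortion(head_var, [3] * 4))
    # 0.0390625 0.0830078125
```

With the same 12-bit budget, high-variance heads receive more bits and low-variance heads fewer, which roughly halves the modeled error compared with a uniform 3-bit allocation. This matches the continuous closed-form solution b_i = b̄ + ½·log2(σ_i²/ρ²), where ρ² is the geometric mean of the variances. For these inputs that formula gives 4.5, 3.5, 2.5 and 1.5 bits. A real KV cache method such as RateQuant may use a different distortion measure (for example, error in attention outputs rather than raw MSE) and a different allocation granularity.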

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2605.06675)

**Details page**: https://ai.daily.yangsir.net/daily/20260512-T0-07

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*