---
id: 20260429-T0-08
title: "研究提出KV缓存自适应共享机制"
title_en: "Stochastic KV Routing Enables Adaptive Cache Sharing"
url: https://ai.daily.yangsir.net/daily/20260429-T0-08
issue_date: 2026-04-29
publish_date: 2026-04-28T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2604.22782
---

# Study Proposes an Adaptive KV Cache Sharing Mechanism

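The paper "Stochastic KV Routing" proposes an adaptive key-value (KV) cache sharing scheme for Transformer models. The method uses stochastic routing to share caches at the depth level, cutting memory usage by 30% and raising inference throughput. It is aimed at efficient serving of large language models.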

## English Version

**Stochastic KV Routing Enables Adaptive Cache Sharing**

The paper "Stochastic KV Routing" proposes adaptive key-value (KV) cache sharing for Transformer models. By applying stochastic routing at the depth level, it reduces the memory footprint by 30% and boosts inference throughput. The approach targets efficient LLM serving deployments.
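
To give a rough sense of scale for the 30% figure, the sketch below estimates KV cache size for a model and for the same model when some layers read another layer's cache slot. This is only back-of-the-envelope arithmetic, not the paper's method: the model dimensions, the `layer_to_slot` mapping, and both function names are hypothetical, and the mapping is fixed by hand rather than learned or routed stochastically.

```python
def kv_cache_bytes(num_slots, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to hold keys and values for `num_slots` cached layers.

    The leading factor 2 counts one key tensor and one value tensor per slot;
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_slots * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def shared_kv_cache_bytes(layer_to_slot, num_kv_heads, head_dim, seq_len,
                          batch_size=1, bytes_per_elem=2):
    """Same estimate when several layers read from one cache slot.

    layer_to_slot[i] is the slot that layer i reads, so only the number of
    distinct slots contributes to memory.
    """
    distinct_slots = len(set(layer_to_slot))
    return kv_cache_bytes(distinct_slots, num_kv_heads, head_dim, seq_len,
                          batch_size, bytes_per_elem)


# Hypothetical 40-layer model with 8 KV heads of dimension 128, 4096 tokens.
NUM_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN = 40, 8, 128, 4096

# Hypothetical hand-written mapping: 3 of every 10 layers reuse the slot of
# the layer directly below them; all other layers keep their own slot.
layer_to_slot = [i - 1 if i % 10 in (3, 6, 9) else i for i in range(NUM_LAYERS)]

baseline = kv_cache_bytes(NUM_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
shared = shared_kv_cache_bytes(layer_to_slot, KV_HEADS, HEAD_DIM, SEQ_LEN)
saving = 1 - shared / baseline

print(f"baseline: {baseline / 2**20:.0f} MiB")
print(f"shared:   {shared / 2**20:.0f} MiB")
print(f"saving:   {saving:.0%}")
```

Under these assumed dimensions, 12 of the 40 layers reuse a neighbour's slot, which leaves 28 distinct slots and therefore 30% less cache memory per sequence. How the paper actually decides which layers share, and whether its saving comes from this kind of layer-count reduction, is not stated in this summary.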

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2604.22782)

**Details page**: https://ai.daily.yangsir.net/daily/20260429-T0-08

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*