---
id: 20260414-T0-07
title: "CSAttention: Accelerating LLM Inference with Centroid Scoring, Cutting Compute Cost by 30%"
title_en: "CSAttention Accelerates LLM Inference with 30% Lower Compute Cost"
url: https://ai.daily.yangsir.net/daily/20260414-T0-07
issue_date: 2026-04-14
publish_date: 2026-04-13T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2604.08584
---

# CSAttention: Accelerating LLM Inference with Centroid Scoring, Cutting Compute Cost by 30%

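Cornell University researchers propose CSAttention, a new method that uses a centroid-scoring mechanism to optimize attention computation for long-context LLMs, cutting compute and transfer costs in the prefill stage by 30%. The method is aimed specifically at the long prompts found in agent and domain QA scenarios, where it can relieve the KV-cache bottleneck. In the new paper, the researchers validate its effectiveness on Transformer architectures, offering a new direction for efficient large-model inference.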

## English Version

**CSAttention Accelerates LLM Inference with 30% Lower Compute Cost**

Cornell researchers propose CSAttention, a new method reported to cut compute and transfer costs by 30% for long-context LLMs. Its centroid-scoring mechanism optimizes attention computation during prefill, targeting the KV-cache bottleneck caused by long prompts in agent and domain QA scenarios. The paper reports that the approach is effective on Transformer architectures.
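
The summary above does not spell out how centroid scoring works, so the following is only a minimal sketch of one common way such a scheme can be built, not the paper's algorithm. It assumes keys are grouped into fixed-size blocks, each block is summarized by its mean key (the centroid), the query is scored against the centroids, and exact attention runs only over the highest-scoring blocks. The function names and the `block_size` and `top_blocks` parameters are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_centroids(keys, block_size):
    """Split key indices into consecutive blocks and average each block.

    Assumption: fixed-size contiguous blocks; the paper may group keys differently.
    """
    dim = len(keys[0])
    blocks = [list(range(start, min(start + block_size, len(keys))))
              for start in range(0, len(keys), block_size)]
    centroids = [[sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
                 for block in blocks]
    return blocks, centroids


def centroid_scored_attention(query, keys, values, block_size=4, top_blocks=2):
    """Attend only over keys in the blocks whose centroids score highest.

    Returns the attention output and the sorted list of selected key indices.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and the same length")

    blocks, centroids = build_centroids(keys, block_size)

    # Cheap pass: one dot product per block instead of one per key.
    block_scores = [dot(query, c) for c in centroids]
    chosen = sorted(range(len(blocks)),
                    key=lambda b: block_scores[b], reverse=True)[:top_blocks]
    selected = sorted(i for b in chosen for i in blocks[b])

    # Exact scaled dot-product attention restricted to the selected keys.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)

    out_dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) / total
              for d in range(out_dim)]
    return output, selected
```

Under these assumptions, a query over `n` keys costs roughly `n / block_size` centroid scores plus `top_blocks * block_size` exact scores, rather than `n` exact scores. Setting `top_blocks` to the total number of blocks selects every key and recovers ordinary full attention.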

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2604.08584)

**Details page**: https://ai.daily.yangsir.net/daily/20260414-T0-07

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*