---
id: 20260605-T0-12
title: "LazyAttention：延迟位置编码提升RAG推理效率"
title_en: "LazyAttention: Efficient RAG with Deferred Positioning"
url: https://ai.daily.yangsir.net/daily/20260605-T0-12
issue_date: 2026-06-05
publish_date: 2026-06-04T04:00:00.000Z
category: research
source_name: "arXiv cs.CL (NLP)"
source_url: https://arxiv.org/abs/2606.04302
---

# LazyAttention: Deferred Positional Encoding Improves RAG Inference Efficiency

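Researchers propose LazyAttention, a method that makes retrieval-augmented generation (RAG) more efficient by deferring positional encoding. Conventional KV caching is computationally inefficient on long-context RAG tasks. LazyAttention postpones positional encoding until it is actually needed, cutting computation by 60% while maintaining the same performance. The optimization is particularly well suited to scenarios such as long-document retrieval and conversation-history storage, and will significantly lower the inference cost of large models.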

## English Version

**LazyAttention: Efficient RAG with Deferred Positioning**

Researchers propose LazyAttention, which improves RAG efficiency by deferring positional encoding. Traditional KV caching is computationally inefficient in long-context RAG tasks. LazyAttention delays positional encoding until it is needed, reducing computation by 60% while maintaining the same performance. The optimization is particularly suited to long-document retrieval and conversation-history storage, and significantly reduces LLM inference costs.
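
To illustrate the general idea of deferring positional encoding, here is a minimal sketch, not the paper's implementation. It assumes rotary position embeddings (RoPE), which the summary does not specify, and `PositionFreeCache`, `rotate`, and `dot` are hypothetical names chosen for this example. Keys for each retrieved chunk are cached with no position applied, and the rotation happens only at scoring time, once the chunk's place in the prompt is known.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a RoPE-style rotation to one even-length vector at one position."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair of dimensions rotates at its own frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class PositionFreeCache:
    """Hypothetical cache that stores chunk keys WITHOUT positional encoding."""

    def __init__(self):
        self.chunks = {}

    def put(self, chunk_id, keys):
        # Keys are stored unrotated, so they do not depend on prompt layout.
        self.chunks[chunk_id] = keys

    def scores(self, query, query_pos, chunk_ids):
        """Attention logits for one query against chunks laid out in the given order."""
        q = rotate(query, query_pos)
        logits = []
        pos = 0
        for cid in chunk_ids:
            for key in self.chunks[cid]:
                # Deferred step: the position is applied only now.
                logits.append(dot(q, rotate(key, pos)))
                pos += 1
        return logits


cache = PositionFreeCache()
cache.put("doc_a", [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, -0.5, 0.25]])
cache.put("doc_b", [[0.3, -0.7, 0.2, 0.9]])

query = [0.4, 0.1, -0.2, 0.8]
# The same cached chunks can be scored in either order without recomputing keys.
logits_ab = cache.scores(query, 3, ["doc_a", "doc_b"])
logits_ba = cache.scores(query, 3, ["doc_b", "doc_a"])
```

Because the cached keys carry no position, a chunk can be reused at any offset or in any order; only the cheap rotation is repeated per request. Whether LazyAttention works this way in detail is an assumption of this sketch.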

---

**Source**: [arXiv cs.CL (NLP)](https://arxiv.org/abs/2606.04302)

**Details page**: https://ai.daily.yangsir.net/daily/20260605-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*