---
id: 20260526-T0-02
title: "Tensor Cache：滑动窗口外的Token不再丢失，Transformer记忆更持久"
title_en: "Tensor Cache: Retains Evicted Tokens for Longer Transformer Memory"
url: https://ai.daily.yangsir.net/daily/20260526-T0-02
issue_date: 2026-05-26
publish_date: 2026-05-25T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2605.22884
---

# Tensor Cache: Tokens Outside the Sliding Window Are No Longer Lost, Giving Transformers More Durable Memory

The KV cache of an autoregressive Transformer grows linearly with context length. A sliding-window cache bounds that growth but discards every token that falls outside the window, so the model can no longer access key information from early in the context. Tensor Cache proposes a two-level caching mechanism: when a token meets the eviction condition and leaves the window, it is stored in compressed form in an associative memory module instead of being deleted. This keeps GPU memory usage under control while preserving the ability to retrieve key evidence from long texts, which directly helps RAG systems that process long documents and long-conversation scenarios.
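
To make the two-level idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the summary does not say how evicted tokens are compressed, so the outer-product update used here (a common linear associative memory) is an assumption, and the names `TwoLevelCache`, `append`, `_absorb`, and `read_memory` are hypothetical.

```python
from collections import deque


class TwoLevelCache:
    """Toy two-level KV cache: an exact sliding window plus a fixed-size
    associative memory that absorbs evicted tokens (illustrative only)."""

    def __init__(self, window: int, dim: int):
        self.window = window
        self.dim = dim
        # Level 1: exact (key, value) pairs for the most recent tokens.
        self.recent = deque()
        # Level 2: dim x dim matrix holding the sum of outer(key, value).
        self.memory = [[0.0] * dim for _ in range(dim)]

    def append(self, key, value):
        """Add a token; if the window overflows, fold the oldest into memory."""
        self.recent.append((key, value))
        if len(self.recent) > self.window:
            old_key, old_value = self.recent.popleft()
            self._absorb(old_key, old_value)

    def _absorb(self, key, value):
        # ASSUMPTION: compression rule is memory += outer(key, value).
        for i in range(self.dim):
            for j in range(self.dim):
                self.memory[i][j] += key[i] * value[j]

    def read_memory(self, query):
        # query @ memory = sum over evicted tokens of (query . key) * value.
        return [
            sum(query[i] * self.memory[i][j] for i in range(self.dim))
            for j in range(self.dim)
        ]


cache = TwoLevelCache(window=2, dim=3)
cache.append([1.0, 0.0, 0.0], [5.0, 6.0, 7.0])  # token 0
cache.append([0.0, 1.0, 0.0], [1.0, 1.0, 1.0])  # token 1
cache.append([0.0, 0.0, 1.0], [2.0, 2.0, 2.0])  # token 2 evicts token 0

# Token 0 has left the window, but its value is still readable from memory.
recovered = cache.read_memory([1.0, 0.0, 0.0])
print(recovered)  # [5.0, 6.0, 7.0]
```

In this sketch the second level stays at `dim x dim` entries no matter how many tokens are evicted, which is what keeps memory bounded. Recall is exact here only because the toy keys are orthogonal; with realistic, overlapping keys a read would return an approximate blend of the stored values.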

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2605.22884)

**Details page**: https://ai.daily.yangsir.net/daily/20260526-T0-02

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*