---
id: 20260530-T0-07
title: "研究发现强化学习比监督微调更防模型遗忘"
title_en: "RL Prevents Forgetting Better Than SFT in LLMs"
url: https://ai.daily.yangsir.net/daily/20260530-T0-07
issue_date: 2026-05-30
publish_date: 2026-05-29T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2605.28860
---

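# Study Finds Reinforcement Learning Guards Against Model Forgetting Better Than Supervised Fine-Tuning

A new study examines the mechanism by which fine-tuning causes catastrophic forgetting in large models. Comparing the two methods, it finds that reinforcement learning (RL) retains a model's original capabilities better than supervised fine-tuning (SFT) because RL preserves the structure of specific neural circuits, which points to a new approach for preventing forgetting.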

## English Version

**RL Prevents Forgetting Better Than SFT in LLMs**

New research finds that RL preserves an LLM's existing capabilities better than SFT because it keeps specific neural circuits intact, offering insight into how to prevent catastrophic forgetting during fine-tuning.

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2605.28860)

**Details page**: https://ai.daily.yangsir.net/daily/20260530-T0-07

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*