---
id: 20260314-T0-21
title: "DeReason论文：难度感知课程提升解耦SFT-RL训练"
title_en: "DeReason Paper: Difficulty-Aware Curriculum Improves Decoupled SFT-RL Training"
url: https://ai.daily.yangsir.net/daily/20260314-T0-21
issue_date: 2026-03-14
publish_date: 2026-03-13T04:00:00.000Z
source_name: "arXiv cs.CL (NLP)"
source_url: https://arxiv.org/abs/2603.11193
---

# DeReason Paper: Difficulty-Aware Curriculum Improves Decoupled SFT-RL Training

The DeReason paper proposes a difficulty-aware curriculum learning framework for the decoupled SFT-then-RL training pipeline. Targeting math and coding tasks, its progressive difficulty design is reported to substantially improve large language models' performance under reinforcement learning with verifiable rewards (RLVR). Experiments reportedly show that the method mitigates the knowledge forgetting seen in conventional RLVR training and raises accuracy on complex reasoning tasks by 18%. The paper is available as arXiv:2603.11193v1 and is presented as a new paradigm for training general reasoning ability.
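
The summary does not say how DeReason measures difficulty or how it divides data between the two stages. As a purely illustrative sketch, assume difficulty is approximated by the pass rate of a base model over several sampled attempts, that easier problems go to the SFT stage, and that harder problems are kept for the RL stage and ordered from easiest to hardest. The `Problem` class, the `split_by_difficulty` function, the `pass_rate` field, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    prompt: str
    # Hypothetical difficulty proxy: fraction of sampled attempts that a
    # base model gets verifiably right (1.0 = trivial, 0.0 = never solved).
    pass_rate: float


def split_by_difficulty(problems, threshold=0.5):
    """Route easier problems to an SFT pool and harder ones to an RL pool.

    Assumption for illustration only: the threshold and the routing rule
    are not specified by the source summary.
    """
    for p in problems:
        if not 0.0 <= p.pass_rate <= 1.0:
            raise ValueError(f"pass_rate out of range for {p.prompt!r}")

    sft_pool = [p for p in problems if p.pass_rate >= threshold]
    rl_pool = [p for p in problems if p.pass_rate < threshold]

    # Curriculum order for RL: highest pass rate (easiest) first.
    rl_pool.sort(key=lambda p: p.pass_rate, reverse=True)
    return sft_pool, rl_pool


problems = [
    Problem("Compute 2 + 2", 0.95),
    Problem("Prove an olympiad inequality", 0.10),
    Problem("Implement binary search", 0.60),
    Problem("Count lattice paths with obstacles", 0.30),
]

sft_pool, rl_pool = split_by_difficulty(problems)
print("SFT:", [p.prompt for p in sft_pool])
print("RL: ", [p.prompt for p in rl_pool])
```

A single threshold is the simplest possible split. A real pipeline would likely need to calibrate it per domain, since pass rates on math and coding problems are not directly comparable.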

---

**Source**: [arXiv cs.CL (NLP)](https://arxiv.org/abs/2603.11193)

**Details page**: https://ai.daily.yangsir.net/daily/20260314-T0-21

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*