---
id: 20260505-T0-07
title: "多轮智能体训练总崩溃？自适应熵调制提升RL训练稳定性"
title_en: "AEM: Adaptive Entropy Modulation Stabilizes Multi-Turn Agentic RL Training"
url: https://ai.daily.yangsir.net/daily/20260505-T0-07
issue_date: 2026-05-05
publish_date: 2026-05-04T04:00:00.000Z
category: research
source_name: "arXiv cs.AI"
source_url: https://arxiv.org/abs/2605.00425
---

# Multi-Turn Agent Training Keeps Collapsing? Adaptive Entropy Modulation Improves RL Training Stability

Reinforcement learning (RL) has greatly improved LLM agents' ability to interact in multi-turn environments, but sparse reward signals make effective training difficult. To address this pain point, researchers propose Adaptive Entropy Modulation (AEM). By dynamically adjusting the policy's entropy over the course of multi-turn interactions, the method counteracts both premature convergence to suboptimal behaviors and insufficient exploration, significantly improving training stability and final task performance. Developers can use this approach to train AI agents for complex, long-horizon tasks more efficiently.
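The paper's exact modulation rule is not reproduced here; as an illustrative sketch only, one common way to realize adaptive entropy control is to tune an entropy-bonus coefficient toward a target entropy: raise the coefficient when the policy's entropy drops too low (forcing exploration), lower it when entropy is high (allowing exploitation). All names and constants below (`adapt_entropy_coef`, `target_entropy`, the multiplicative update rate) are hypothetical, not taken from the paper.

```python
import math

def policy_entropy(probs):
    """Shannon entropy (nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def adapt_entropy_coef(coef, probs, target_entropy, rate=0.1,
                       min_coef=1e-4, max_coef=1.0):
    """Hypothetical adaptive rule: multiplicatively increase the
    entropy-bonus coefficient when measured entropy is below target
    (the policy is collapsing), decrease it when above target.
    The adjusted coefficient would then scale the entropy bonus
    added to the RL objective at the next update step."""
    h = policy_entropy(probs)
    if h < target_entropy:
        coef = min(max_coef, coef * (1.0 + rate))
    else:
        coef = max(min_coef, coef * (1.0 - rate))
    return coef
```

For example, a near-deterministic policy such as `[0.97, 0.01, 0.01, 0.01]` has entropy well below a target of 1.0 nat, so the coefficient grows and the entropy bonus pushes the agent back toward exploration; a uniform policy over four actions (entropy ln 4 ≈ 1.39) sits above the target, so the coefficient shrinks.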

## English Version

**AEM: Adaptive Entropy Modulation Stabilizes Multi-Turn Agentic RL Training**

While reinforcement learning advances LLM agents in multi-turn tasks, sparse rewards make training challenging. Researchers introduced Adaptive Entropy Modulation (AEM) for multi-turn agentic RL. By dynamically adjusting policy entropy during interactions, AEM prevents agents from converging prematurely on suboptimal behaviors, improving training stability and final task performance.

---

**Source**: [arXiv cs.AI](https://arxiv.org/abs/2605.00425)

**Details**: https://ai.daily.yangsir.net/daily/20260505-T0-07

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*