---
id: 20260616-T0-20
title: "New Problem in Diffusion Policy Optimization: Double-Drift Phenomenon Causes Training Instability"
title_en: "Diffusion Policy Training Unstable Due to Double-Drift Phenomenon"
url: https://ai.daily.yangsir.net/daily/20260616-T0-20
issue_date: 2026-06-16
publish_date: 2026-06-15T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2606.13795
---

# New Problem in Diffusion Policy Optimization: Double-Drift Phenomenon Causes Training Instability

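A recent study reports that current diffusion policy-gradient training methods suffer from a double-drift problem, which makes policy optimization unstable. The study analyzes the cause of the phenomenon and proposes an improvement. Diffusion policies play a key role in reinforcement learning post-training, and this finding offers a new direction for improving policy reliability.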

## English Version

**Diffusion Policy Training Unstable Due to Double-Drift Phenomenon**

A new study identifies the double-drift phenomenon as the cause of instability in diffusion policy-gradient methods, which are crucial for RL post-training. The research proposes solutions to improve policy reliability and performance.

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2606.13795)

**Details page**: https://ai.daily.yangsir.net/daily/20260616-T0-20

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*