---
id: 20260519-T0-12
title: "新方法降低大模型安全对齐的性能损失"
title_en: "New Method Reduces Performance Loss in LLM Safety Alignment"
url: https://ai.daily.yangsir.net/daily/20260519-T0-12
issue_date: 2026-05-19
publish_date: 2026-05-18T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2605.15239
---

# New Method Reduces Performance Loss in LLM Safety Alignment

A recent study proposes a method for reducing the performance loss that large language models incur during safety alignment. The method uses on-policy self-distillation to address the decline in reasoning ability that supervised fine-tuning causes through distribution mismatch. According to the reported experiments, it improves reasoning performance by more than 15% while maintaining safety, offering a new option for low-risk AI applications.
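The summary does not describe the algorithm itself, so the following is only a generic toy sketch of what on-policy self-distillation can look like, not the paper's method. Everything in it is an assumption made for illustration: the bigram "model", the `UNSAFE` token, the `SAFETY_BIAS` penalty standing in for a safety-conditioned teacher, and all function names and hyperparameters. The point it tries to show is that the student is trained on prefixes it samples itself, with targets derived from a frozen copy of the same model, rather than on externally written responses whose distribution it may not match.

```python
import math
import random

VOCAB = 4            # toy vocabulary: token ids 0..3
UNSAFE = 3           # hypothetical token standing in for unsafe content
SAFETY_BIAS = -4.0   # hypothetical logit penalty the teacher applies to UNSAFE


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_model(rng):
    """Toy bigram model: logits[prev][next], drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(VOCAB)] for _ in range(VOCAB)]


def teacher_probs(base, prev):
    """Teacher = frozen copy of the original model with UNSAFE penalized.

    This stands in for "the same model, conditioned to behave safely".
    """
    logits = list(base[prev])
    logits[UNSAFE] += SAFETY_BIAS
    return softmax(logits)


def train(student, base, rng, steps=2000, seq_len=5, lr=0.5):
    """Distill the teacher into the student on student-sampled prefixes."""
    for _ in range(steps):
        prev = 0  # every toy sequence starts from token 0
        for _ in range(seq_len):
            p_student = softmax(student[prev])
            p_teacher = teacher_probs(base, prev)
            # Gradient of cross-entropy(teacher, student) w.r.t. the
            # student's logits at this prefix is (p_student - p_teacher).
            for tok in range(VOCAB):
                student[prev][tok] -= lr * (p_student[tok] - p_teacher[tok])
            # On-policy step: the next prefix comes from the student itself.
            prev = rng.choices(range(VOCAB), weights=softmax(student[prev]), k=1)[0]
    return student


if __name__ == "__main__":
    rng = random.Random(0)
    base = make_model(rng)
    student = [row[:] for row in base]  # student starts as a copy of the model
    train(student, base, rng)
    print("P(unsafe | prev=0) before:", softmax(base[0])[UNSAFE])
    print("P(unsafe | prev=0) after: ", softmax(student[0])[UNSAFE])
```

Because the teacher only shifts the logit of the unsafe token, the relative preferences among the remaining tokens are left as the original model had them, which is a rough analogue of keeping existing capabilities intact while changing safety behavior.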

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2605.15239)

**Details page**: https://ai.daily.yangsir.net/daily/20260519-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*