---
id: 20260404-T0-12
title: "大模型安全机制可被重新激活，无需重新训练"
title_en: "LLM Safety Mechanisms Reactivated Without Retraining"
url: https://ai.daily.yangsir.net/daily/20260404-T0-12
issue_date: 2026-04-04
publish_date: 2026-04-03T04:00:00.000Z
category: research
source_name: "arXiv cs.CL (NLP)"
source_url: https://arxiv.org/abs/2604.00012
---

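# LLM Safety Mechanisms Can Be Reactivated Without Retraining

LLMs must be fine-tuned to perform well on specific tasks, but fine-tuning can overwrite their safety mechanisms. The study proposes a method that reactivates safety mechanisms that remain hidden in the model after training. In experiments on models including DeepSeek-R1, the method restored 80% of safety performance while retaining 90% of task efficiency. Enterprise security teams can use it to repair model safety vulnerabilities without sacrificing performance, lowering compliance risk. The technique applies to industrial-grade LLMs that are already deployed.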

## English Version

**LLM Safety Mechanisms Reactivated Without Retraining**

LLMs often lose their safety mechanisms during fine-tuning. The researchers propose a method that reactivates these hidden safety mechanisms after training. In experiments on models such as DeepSeek-R1, it restored 80% of safety performance while retaining 90% of task efficiency. Security teams can use it to fix safety vulnerabilities in deployed models without retraining, reducing compliance risk.

---

**Source**: [arXiv cs.CL (NLP)](https://arxiv.org/abs/2604.00012)

**Details page**: https://ai.daily.yangsir.net/daily/20260404-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*