---
id: 20260610-T0-05
title: "指令层级失效：推理模型如何服从冲突指令"
title_en: "Instruction hierarchy failure in reasoning models"
url: https://ai.daily.yangsir.net/daily/20260610-T0-05
issue_date: 2026-06-10
publish_date: 2026-06-09T04:00:00.000Z
category: research
source_name: "arXiv cs.AI"
source_url: https://arxiv.org/abs/2606.07808
---

# 指令层级失效：推理模型如何服从冲突指令

arXiv新研究指出，现有推理模型在处理冲突指令时存在系统性缺陷。当不同来源的指令冲突时，模型应优先服从高权限指令，但现有基准测试无法有效评估这一能力。研究通过分析多代理工作流中的指令处理问题，揭示了模型在层级服从机制上的失败点。此发现对构建可靠的AI代理系统至关重要，特别是在需要严格指令遵循的场景。

## English Version

**Instruction hierarchy failure in reasoning models**

arXiv research reveals systematic failures in reasoning models when handling conflicting instructions. Models should obey high-privilege instructions from different sources, but existing benchmarks can't evaluate this. The study analyzes instruction processing in multi-agent workflows, identifying failure points in hierarchy obedience. This is critical for building reliable AI agents requiring strict instruction following.

---

**来源**：[arXiv cs.AI](https://arxiv.org/abs/2606.07808)

**详情页**：https://ai.daily.yangsir.net/daily/20260610-T0-05

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*