---
id: 20260605-T0-11
title: "RUBAS：基于评分标准的强化学习提升AI代理安全性"
title_en: "RUBAS: Reinforcement Learning for Safer AI Agents"
url: https://ai.daily.yangsir.net/daily/20260605-T0-11
issue_date: 2026-06-05
publish_date: 2026-06-04T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2606.04051
---

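# RUBAS: Rubric-Based Reinforcement Learning for Improving AI Agent Safety

New research proposes RUBAS, a method that improves the safety of AI agents through rubric-based reinforcement learning. Existing alignment methods usually depend on coarse refusal signals or static supervision and struggle with the complex safety risks of tool use. RUBAS introduces fine-grained rubrics so that agents can assess risk autonomously in multi-step tasks. It performs strongly in high-risk scenarios such as code execution and physical interaction and sharply reduces the incidence of harmful behavior.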

## English Version

**RUBAS: Reinforcement Learning for Safer AI Agents**

New research proposes RUBAS, a rubric-based reinforcement learning method for improving AI agent safety. Existing alignment methods typically rely on coarse refusal signals or static supervision, which leaves them poorly equipped for the complex safety risks that arise in tool use. RUBAS introduces fine-grained rubrics that let agents assess risk on their own across multi-step tasks. It is reported to perform well in high-risk scenarios such as code execution and physical interaction, substantially reducing the rate of harmful behavior.
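
To make the contrast with a coarse refusal signal concrete, here is a minimal sketch of how a rubric could be turned into a scalar reward over a multi-step trajectory. It is an illustration only and not the RUBAS implementation: the `Criterion` class, the `rubric_reward` function, the step schema, and the two example criteria are all hypothetical, and the weighted-average aggregation is an assumption rather than something the summary describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# One agent action, described as a flat dict (hypothetical schema).
Step = Dict[str, str]


@dataclass
class Criterion:
    """A single rubric item: a named, weighted check on one step."""
    name: str
    weight: float
    check: Callable[[Step], bool]  # True if the step satisfies the criterion


def rubric_reward(trajectory: List[Step], rubric: List[Criterion]) -> float:
    """Weighted fraction of criteria satisfied, averaged over steps, in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    if not trajectory or total_weight == 0:
        return 0.0
    step_scores = [
        sum(c.weight for c in rubric if c.check(step)) / total_weight
        for step in trajectory
    ]
    return sum(step_scores) / len(step_scores)


# Hypothetical rubric for a code-execution agent.
RUBRIC = [
    Criterion("no_destructive_command", 2.0,
              lambda s: "rm -rf" not in s["command"]),
    Criterion("confirmed_before_write", 1.0,
              lambda s: s["tool"] != "write_file" or s["confirmed"] == "yes"),
]

trajectory = [
    {"tool": "shell", "command": "ls", "confirmed": "no"},
    {"tool": "shell", "command": "rm -rf /tmp/x", "confirmed": "no"},
]

# The first step satisfies both criteria; the second fails the heavier one.
reward = rubric_reward(trajectory, RUBRIC)
```

A binary refusal signal would score this trajectory as simply unsafe, whereas the per-criterion score shows which step and which rule caused the penalty, which is the kind of fine-grained feedback the summary attributes to rubric-based training.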

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2606.04051)

**Details page**: https://ai.daily.yangsir.net/daily/20260605-T0-11

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*