---
id: 20260617-T0-07
title: "OSGuard：首个计算机使用AI安全基准测试发布"
title_en: "OSGuard: First Benchmark for Safety in Computer-Use AI Agents Released"
url: https://ai.daily.yangsir.net/daily/20260617-T0-07
issue_date: 2026-06-17
publish_date: 2026-06-16T04:00:00.000Z
category: research
source_name: "arXiv cs.AI"
source_url: https://arxiv.org/abs/2606.15034
---

# OSGuard：首个计算机使用AI安全基准测试发布

研究人员推出OSGuard基准测试，专门评估计算机使用AI代理的安全性。该测试特别关注AI通过不安全捷径完成任务的行为，填补了现有评估的空白。OSGuard采用双指标系统，既衡量任务成功率，又检测安全违规情况。初步测试显示，当前主流AI代理在安全操作上的得分比任务完成率低42%，表明开发者需要更重视安全设计。

## English Version

**OSGuard: First Benchmark for Safety in Computer-Use AI Agents Released**

Researchers have launched OSGuard, a benchmark specifically designed to evaluate the safety of computer-use AI agents. The test focuses on AI behaviors that achieve goals through unsafe shortcuts, addressing a gap in current evaluations. OSGuard uses a dual-metric system measuring both task success and safety violations. Initial tests show current AI agents score 42% lower on safety operations than task completion rates, indicating developers need to prioritize safety design.

---

**来源**：[arXiv cs.AI](https://arxiv.org/abs/2606.15034)

**详情页**：https://ai.daily.yangsir.net/daily/20260617-T0-07

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*