---
id: 20260521-T0-12
title: "POLAR-Bench：首个评估LLM代理隐私-效用权衡的基准测试"
title_en: "POLAR-Bench: First Benchmark for Privacy-Utility Trade-offs in LLM Agents"
url: https://ai.daily.yangsir.net/daily/20260521-T0-12
issue_date: 2026-05-21
publish_date: 2026-05-20T04:00:00.000Z
category: research
source_name: "arXiv cs.AI"
source_url: https://arxiv.org/abs/2605.19127
---

# POLAR-Bench: First Benchmark for Privacy-Utility Trade-offs in LLM Agents

Researchers have released POLAR-Bench, the first benchmark designed specifically to evaluate how LLM agents handle private user data. It simulates an agent interacting with third-party systems and checks whether the agent strictly follows the user's data-sharing rules, even when the system pressures it to do otherwise. The benchmark includes multiple representative cases covering sensitive domains such as healthcare and finance, and it is intended to support the development of more reliable and more secure AI agents. The paper is available on arXiv.

---

**Source**: [arXiv cs.AI](https://arxiv.org/abs/2605.19127)

**Details page**: https://ai.daily.yangsir.net/daily/20260521-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*