---
id: 20260521-T0-10
title: "DecisionBench评估代理任务委派能力"
title_en: "DecisionBench Benchmarks Agent Delegation Skills"
url: https://ai.daily.yangsir.net/daily/20260521-T0-10
issue_date: 2026-05-21
publish_date: 2026-05-20T04:00:00.000Z
category: research
source_name: "arXiv cs.AI"
source_url: https://arxiv.org/abs/2605.19099
---

# DecisionBench Benchmarks Agent Delegation Skills

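An arXiv paper proposes DecisionBench, a benchmark for evaluating emergent delegation in long-horizon agent workflows. The benchmark includes task suites such as GAIA, measures the delegation performance of 11 models across 7 vendor families, and provides a standardized evaluation interface.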

## English Version

**DecisionBench Benchmarks Agent Delegation Skills**

An arXiv paper introduces DecisionBench, a benchmark for evaluating emergent delegation in long-horizon agent workflows. It includes task suites such as GAIA and tests 11 models across 7 vendor families, offering standardized evaluation interfaces for agent collaboration scenarios.

---

**Source**: [arXiv cs.AI](https://arxiv.org/abs/2605.19099)

**Details page**: https://ai.daily.yangsir.net/daily/20260521-T0-10

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*