---
id: 20260528-T0-02
title: "AI Agents Fail the Enterprise IT Test: Artificial Analysis and IBM Release the ITBench-AA Benchmark, with All Frontier Models Scoring Below 50%"
title_en: "Frontier AI Models Score Below 50% on ITBench-AA, First Benchmark for Agentic Enterprise IT"
url: https://ai.daily.yangsir.net/daily/20260528-T0-02
issue_date: 2026-05-28
publish_date: 2026-05-27T17:20:29.000Z
source_name: "Hugging Face Blog"
source_url: https://huggingface.co/blog/ibm-research/itbench-aa
---

# AI Agents Fail the Enterprise IT Test: Artificial Analysis and IBM Release the ITBench-AA Benchmark, with All Frontier Models Scoring Below 50%

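Artificial Analysis and IBM have released ITBench-AA, a benchmark that evaluates the agentic performance of large language models on real enterprise IT operations tasks. Every frontier model tested scored below 50%. This shows that LLMs still fall clearly short at autonomously completing enterprise-grade IT tasks such as incident troubleshooting, configuration management, and security compliance. The benchmark gives the industry a quantitative yardstick for how ready AI agents are for enterprise deployment. Developers and enterprise IT teams can use it to gauge how usable different models are in real operational scenarios, rather than relying on general-purpose benchmark scores alone.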

## English Version

**Frontier AI Models Score Below 50% on ITBench-AA, First Benchmark for Agentic Enterprise IT**

Artificial Analysis and IBM released ITBench-AA, the first benchmark designed to evaluate AI agents on real enterprise IT tasks such as incident troubleshooting, configuration management, and security compliance. All frontier models scored below 50%, revealing significant gaps in agents' ability to autonomously handle enterprise IT operations. The benchmark gives developers and enterprise IT teams a concrete tool to measure how different models perform in real-world operational scenarios, moving beyond generic benchmark scores.

---

**Source**: [Hugging Face Blog](https://huggingface.co/blog/ibm-research/itbench-aa)

**Details page**: https://ai.daily.yangsir.net/daily/20260528-T0-02

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*