---
id: 20260406-T0-01
title: "Anthropic Quantifies Infrastructure Noise in Coding Evals"
title_en: "Anthropic Quantifies Noise in Coding Evals"
url: https://ai.daily.yangsir.net/daily/20260406-T0-01
issue_date: 2026-04-06
publish_date: 2026-04-06T00:44:47.000Z
category: research
source_name: "Anthropic Engineering"
source_url: https://www.anthropic.com/engineering/infrastructure-noise
---

# Anthropic Quantifies Infrastructure Noise in Coding Evals

Anthropic has published new research that, for the first time, quantifies infrastructure noise in agentic coding evaluations. The study finds that fluctuations in the underlying system cause identical code-generation tasks to produce inconsistent results, with error rates as high as 15%. The findings give AI coding tools a more accurate evaluation framework: developers can use them to tune their test environments and reduce the chance that external interference is mistaken for a difference in model performance.

---

**Source**: [Anthropic Engineering](https://www.anthropic.com/engineering/infrastructure-noise)

**Details page**: https://ai.daily.yangsir.net/daily/20260406-T0-01

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*