---
id: 20260305-T0-10
title: "ERI基准测试工程模型推理能力"
title_en: "ERI Benchmarks Engineering Model Capabilities"
url: https://ai.daily.yangsir.net/daily/20260305-T0-10
issue_date: 2026-03-05
publish_date: 2026-03-04T05:00:00.000Z
source_name: "arXiv cs.AI"
source_url: https://arxiv.org/abs/2603.02239
---

# ERI Benchmarks the Engineering Reasoning of Models

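An arXiv paper, "Engineering Reasoning and Instruction Benchmark" (ERI), releases what it presents as the first categorized instruction dataset for engineering. The dataset covers 9 engineering disciplines, including civil engineering, and is intended for training and evaluating LLMs and agents with engineering capabilities. The benchmark contains 5,000 complex instructions.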

## English Version

**ERI Benchmarks the Engineering Reasoning of Models**

An arXiv paper introducing the Engineering Reasoning and Instruction (ERI) benchmark releases what it presents as the first categorized instruction dataset for engineering. Covering 9 disciplines, including civil engineering, it contains 5,000 complex instructions and is intended for training and evaluating LLMs and agents with engineering capabilities.
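Because the dataset spans several disciplines, results on a benchmark like this are naturally reported per discipline. The following is a minimal sketch of per-discipline scoring, assuming a hypothetical record layout (`Item` with `discipline`, `instruction` and `reference` fields), a caller-supplied `predict` function and a caller-supplied `is_correct` comparator. None of these names come from the ERI paper; its actual data format and metrics are not described in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Item:
    # Hypothetical record layout for illustration; not ERI's published schema.
    discipline: str   # e.g. "civil"
    instruction: str  # the task prompt given to the model
    reference: str    # hypothetical reference answer used for scoring


def per_discipline_accuracy(
    items: Iterable[Item],
    predict: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Fraction of items judged correct, grouped by discipline."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for item in items:
        totals[item.discipline] += 1
        if is_correct(predict(item.instruction), item.reference):
            hits[item.discipline] += 1
    return {d: hits[d] / totals[d] for d in totals}


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean over disciplines, so each discipline counts equally."""
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data and a toy "model" that always answers "42"; these are not ERI items.
    toy_items = [
        Item("civil", "toy question 1", "42"),
        Item("civil", "toy question 2", "7"),
        Item("electrical", "toy question 3", "42"),
    ]
    scores = per_discipline_accuracy(
        toy_items,
        predict=lambda prompt: "42",
        is_correct=lambda pred, ref: pred.strip() == ref.strip(),
    )
    print(scores)                 # {'civil': 0.5, 'electrical': 1.0}
    print(macro_average(scores))  # 0.75
```

Reporting a macro average alongside the per-discipline scores keeps a discipline with few items from being outweighed by larger ones. A micro average (total correct over total items, 2/3 in the toy run) would instead weight each discipline by its size.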

---

**Source**: [arXiv cs.AI](https://arxiv.org/abs/2603.02239)

**Details page**: https://ai.daily.yangsir.net/daily/20260305-T0-10

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*