---
id: 20260602-T0-08
title: "NumLeak发现基础模型训练数据泄露问题"
title_en: "NumLeak Exposes Data Leakage in Foundation Models"
url: https://ai.daily.yangsir.net/daily/20260602-T0-08
issue_date: 2026-06-02
publish_date: 2026-06-01T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2605.30393
---

# NumLeak Exposes Data Leakage in Foundation Models

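Researchers found that public numeric benchmarks already appear in pretraining data, so date-based evaluations may be measuring memorized recall rather than genuine generalization. The NumLeak framework detects this kind of data leakage.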

## English Version

**NumLeak Exposes Data Leakage in Foundation Models**

Researchers found that public numeric benchmarks appear during pretraining, risking evaluations that measure memorized recall rather than true skill. NumLeak provides a framework to detect this data leakage via API probes.
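
The summary does not say how NumLeak's API probes work, so the following is only a minimal sketch of one generic memorization probe, not NumLeak's actual procedure. It assumes a hypothetical `complete(prompt) -> str` callable standing in for a model API, shows the model the first part of a benchmark series, and counts how many held-out values come back exactly. The series values, the two stub models, and every function name here are made up for illustration.

```python
import re
from typing import Callable, Sequence


def format_series(values: Sequence[float], decimals: int = 2) -> str:
    """Render numbers as a comma-separated string with fixed decimals."""
    return ", ".join(f"{v:.{decimals}f}" for v in values)


def parse_numbers(text: str) -> list[float]:
    """Extract every decimal number that appears in a model completion."""
    return [float(tok) for tok in re.findall(r"-?\d+(?:\.\d+)?", text)]


def exact_match_rate(
    complete: Callable[[str], str],  # hypothetical model API wrapper
    series: Sequence[float],
    prefix_len: int,
    decimals: int = 2,
) -> float:
    """Fraction of held-out values the model reproduces exactly.

    Exact reproduction of many real-valued entries is unlikely to come
    from generalization alone, so a high rate hints at memorization.
    """
    if not 0 < prefix_len < len(series):
        raise ValueError("prefix_len must leave at least one held-out value")
    prefix, held_out = series[:prefix_len], series[prefix_len:]
    prompt = "Continue this sequence: " + format_series(prefix, decimals) + ", "
    predicted = parse_numbers(complete(prompt))[: len(held_out)]
    hits = sum(
        1
        for p, t in zip(predicted, held_out)
        if round(p, decimals) == round(t, decimals)
    )
    return hits / len(held_out)


# Made-up "benchmark" series, for illustration only.
SERIES = [12.31, 12.48, 11.97, 13.02, 12.75, 12.10, 13.44, 12.89]


def memorizing_model(prompt: str) -> str:
    """Stub that has 'seen' SERIES and replays its tail verbatim."""
    return format_series(SERIES[4:])


def naive_model(prompt: str) -> str:
    """Stub that just repeats the last value it was shown."""
    last = parse_numbers(prompt)[-1]
    return format_series([last] * 4)


leaked_rate = exact_match_rate(memorizing_model, SERIES, prefix_len=4)
clean_rate = exact_match_rate(naive_model, SERIES, prefix_len=4)
print(f"memorizing stub: {leaked_rate:.2f}, naive stub: {clean_rate:.2f}")
```

In practice such a score would need a baseline, for example the same probe run on series published after the model's training cutoff, before any conclusion about leakage could be drawn.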

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2605.30393)

**Details page**: https://ai.daily.yangsir.net/daily/20260602-T0-08

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*