---
id: 20260515-T0-12
title: "DocAtlas：支持80+语言的文档理解模型"
title_en: "DocAtlas: Multilingual Doc Understanding in 80+ Languages"
url: https://ai.daily.yangsir.net/daily/20260515-T0-12
issue_date: 2026-05-15
publish_date: 2026-05-14T04:00:00.000Z
category: research
source_name: "arXiv cs.CL (NLP)"
source_url: https://arxiv.org/abs/2605.12623
---

# DocAtlas: Multilingual Document Understanding in 80+ Languages

An arXiv paper introduces DocAtlas, a framework that builds high-fidelity OCR datasets to address document understanding for low-resource languages. It handles documents in more than 80 languages, mitigating the bias that traditional models show when training data for smaller languages is scarce, and offers a new approach for multilingual AI applications.

---

**Source**: [arXiv cs.CL (NLP)](https://arxiv.org/abs/2605.12623)

**Details page**: https://ai.daily.yangsir.net/daily/20260515-T0-12

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*