---
id: 20260616-T0-04
title: "GitHub发布多语言开发者数据集，助力AI研究"
title_en: "GitHub Launches Multilingual Dataset for AI Research"
url: https://ai.daily.yangsir.net/daily/20260616-T0-04
issue_date: 2026-06-16
publish_date: 2026-06-15T19:17:30.000Z
category: research
source_name: "GitHub Blog"
source_url: https://github.blog/ai-and-ml/llms/accelerating-researchers-and-developers-building-multilingual-ai-with-a-new-open-dataset/
---

# GitHub Launches Multilingual Dataset for AI Research

GitHub has released a multilingual developer dataset under the Apache 2.0 license, covering cross-language content from READMEs, issues, and PRs. The dataset includes 50+ programming and natural languages and is intended to help developers train more versatile AI models and to promote global technical collaboration.

---

**Source**: [GitHub Blog](https://github.blog/ai-and-ml/llms/accelerating-researchers-and-developers-building-multilingual-ai-with-a-new-open-dataset/)

**Details page**: https://ai.daily.yangsir.net/daily/20260616-T0-04

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*