---
id: 20260424-T0-15
title: "OThink-SRR1: Optimizing LLM Multi-hop Retrieval with Reinforcement Learning"
title_en: "OThink-SRR1: Boosts LLM Multi-hop Retrieval with Reinforced Learning"
url: https://ai.daily.yangsir.net/daily/20260424-T0-15
issue_date: 2026-04-24
publish_date: 2026-04-23T04:00:00.000Z
category: research
source_name: "arXiv cs.CL (NLP)"
source_url: https://arxiv.org/abs/2604.19766
---

# OThink-SRR1: Optimizing LLM Multi-hop Retrieval with Reinforcement Learning

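A Tsinghua University team proposes OThink-SRR1, a method that uses reinforcement learning to dynamically optimize the retrieval strategy, addressing knowledge-retrieval bias in large models on complex multi-hop questions. In experiments, it improves accuracy on the HotpotQA dataset by 18% over static retrieval and cuts inference time by 30%. The approach offers a new direction for RAG systems and is especially suited to scenarios that require reasoning across multiple steps.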
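To make the contrast between static and dynamically tuned retrieval concrete, here is a minimal toy sketch. It is not the paper's algorithm: the corpus, the word-overlap retriever, the reward (1 for surfacing the answer, minus 0.1 per retrieval call), and every function name are assumptions made up for illustration, and the learner is a simple greedy bandit with optimistic initial values standing in for whatever reinforcement learning method OThink-SRR1 actually uses.

```python
# Toy corpus (hypothetical): the question needs two hops, film -> director -> birthplace.
CORPUS = {
    "d1": "The director of Inception is Christopher Nolan",
    "d2": "Christopher Nolan was born in London",
    "d3": "Paris is the capital of France",
}

def retrieve(query, seen):
    """Return the id of the unseen document with the largest word overlap, or None."""
    words = set(query.lower().split())
    best, best_score = None, 0
    for doc_id, text in CORPUS.items():
        if doc_id in seen:
            continue
        score = len(words & set(text.lower().split()))
        if score > best_score:
            best, best_score = doc_id, score
    return best

def run_episode(question, answer, hops):
    """Retrieve up to `hops` times, expanding the query with each retrieved document."""
    query, seen = question, []
    for _ in range(hops):
        doc_id = retrieve(query, seen)
        if doc_id is None:
            break
        seen.append(doc_id)
        query = query + " " + CORPUS[doc_id]
    found = any(answer in CORPUS[d] for d in seen)
    # Assumed reward: 1 if the answer was surfaced, minus a cost per retrieval call.
    return (1.0 if found else 0.0) - 0.1 * len(seen)

def learn_hop_budget(question, answer, max_hops=3, episodes=10):
    """Greedy bandit over the hop budget, with optimistic initial value estimates."""
    values = {h: 1.0 for h in range(1, max_hops + 1)}  # optimistic start forces one try per budget
    counts = {h: 0 for h in values}
    for _ in range(episodes):
        hops = max(values, key=values.get)  # pick the budget with the highest estimate
        reward = run_episode(question, answer, hops)
        counts[hops] += 1
        values[hops] += (reward - values[hops]) / counts[hops]  # running mean of rewards
    return max(values, key=values.get), values

QUESTION = "where was the director of Inception born"
best_hops, values = learn_hop_budget(QUESTION, "London")
print(best_hops)  # 2
```

In this toy setup a single static retrieval call never reaches the document containing the answer, while a third call only adds cost, so the learned budget settles on two hops. The real method presumably learns a far richer policy than a fixed hop count.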

## English Version

**OThink-SRR1: Boosts LLM Multi-hop Retrieval with Reinforced Learning**

Tsinghua researchers developed OThink-SRR1, a reinforcement learning approach that dynamically optimizes retrieval for LLMs on multi-hop problems. Compared with static retrieval, it achieves 18% higher accuracy on HotpotQA and reduces inference time by 30%. This offers a new direction for RAG systems that require complex, multi-step reasoning.

---

**Source**: [arXiv cs.CL (NLP)](https://arxiv.org/abs/2604.19766)

**Details page**: https://ai.daily.yangsir.net/daily/20260424-T0-15

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*