---
id: 20260302-T0-14
title: "截断步级采样与过程奖励用于检索增强推理"
title_en: "Truncated Step Sampling for RAG with Process Rewards"
url: https://ai.daily.yangsir.net/daily/20260302-T0-14
issue_date: 2026-03-02
publish_date: 2026-03-02T05:00:00.000Z
source_name: "arXiv cs.CL (NLP)"
source_url: https://arxiv.org/abs/2602.23440
---

# Truncated Step Sampling for RAG with Process Rewards

This research proposes a retrieval-augmented reasoning method built on truncated step-level sampling. By attaching process rewards to the individual steps of a multi-step trajectory, it targets the credit assignment problem in conventional reinforcement learning, which is hard to resolve when reward arrives only at the end of a trajectory. In the reported experiments, the method cuts reasoning latency by 40% while keeping accuracy comparable to Search-R1. It is aimed at complex reasoning tasks that need real-time feedback, such as interactive question answering in search engines.
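To make the idea of step-level credit concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the summary does not specify how steps are sampled or scored, so the group-normalized advantage, the function names (`step_advantages`, `sample_truncated_steps`), and the toy proposer and reward are all hypothetical. The sketch samples `k` candidate next steps from a shared trajectory prefix, stops there instead of rolling each one out to a final answer, scores each candidate with a process reward, and normalizes the scores within the group.

```python
import random
import statistics
from typing import Callable, List, Tuple

Step = str
Prefix = List[Step]


def step_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of sibling steps: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; 0.0 when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]


def sample_truncated_steps(
    prefix: Prefix,
    propose_step: Callable[[Prefix, random.Random], Step],
    process_reward: Callable[[Prefix, Step], float],
    k: int,
    rng: random.Random,
) -> List[Tuple[Step, float]]:
    """Sample k candidate next steps from the same prefix and score each one.

    Sampling is truncated at a single step: no candidate is rolled out to a
    final answer, so credit comes from the per-step process reward alone.
    """
    candidates = [propose_step(prefix, rng) for _ in range(k)]
    rewards = [process_reward(prefix, step) for step in candidates]
    return list(zip(candidates, step_advantages(rewards)))


# Hypothetical stand-ins for a policy model and a process reward model.
def toy_propose_step(prefix: Prefix, rng: random.Random) -> Step:
    return rng.choice(
        ["search: capital of France", "answer: Paris", "search: weather today"]
    )


def toy_process_reward(prefix: Prefix, step: Step) -> float:
    # Assumption: a step is "useful" if it mentions the entity in the question.
    return 1.0 if ("France" in step or "Paris" in step) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    question_prefix = ["question: What is the capital of France?"]
    scored = sample_truncated_steps(
        question_prefix, toy_propose_step, toy_process_reward, k=4, rng=rng
    )
    for step, advantage in scored:
        print(f"{advantage:+.3f}  {step}")
```

Because every candidate shares the same prefix, a difference in advantage can be attributed to that one step, which is the intuition behind using process rewards for credit assignment. In a real system the toy functions would be replaced by a language-model policy with retrieval and a learned or rule-based step scorer.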

---

**Source**: [arXiv cs.CL (NLP)](https://arxiv.org/abs/2602.23440)

**Details page**: https://ai.daily.yangsir.net/daily/20260302-T0-14

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*