Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of achieving robust semantic matching in complex corpora—where query rewriting alone proves insufficient—this paper proposes a bidirectional retrieval-augmented framework grounded in large language models (LLMs). The method jointly optimizes query rewriting and document understanding within a reinforcement learning paradigm, enabling cooperative enhancement of both queries and documents. It introduces a novel decoupled reward sampling strategy and a dedicated optimization algorithm to significantly mitigate reliance on manual priors and reduce human-induced bias. Experimental results demonstrate substantial performance gains across both sparse and dense retrieval settings, with particularly pronounced improvements on hard retrieval tasks. Moreover, the framework exhibits strong cross-benchmark generalization capability. The implementation is publicly available.

📝 Abstract
Recent studies have proposed leveraging Large Language Models (LLMs) as information retrievers through query rewriting. However, for challenging corpora, we argue that enhancing queries alone is insufficient for robust semantic matching; the LLM should also have sufficient understanding of the corpus by directly handling and augmenting the documents themselves. To this end, we present an LLM-based retriever empowered to augment both user queries and corpus documents, with its policy fully explored via reinforcement learning (RL) and minimal human inductive bias. Notably, we find that simply allowing the LLM to modify documents yields little benefit unless paired with our carefully designed bidirectional RL framework, which enables the LLM to simultaneously learn and collaborate on both query and document augmentation policies. A key technical challenge in realizing such a framework lies in jointly updating both policies during training, where the rewards for the two directions depend on each other, making their entangled reward intractable. Our approach addresses this by introducing a reward sampling strategy and a specifically designed RL algorithm that enables effective training with these sampled rewards. Experimental results demonstrate that our approach significantly enhances LLM-based retrieval performance in both sparse and dense settings, particularly in difficult retrieval domains, and achieves strong cross-benchmark generalization. Our code is released at https://github.com/liujm2001/CoAugRetriever.
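The abstract's key technical point is that the query and document augmentation rewards are entangled: the reward for an augmented query depends on how the documents were augmented, and vice versa. The sketch below is a hypothetical toy illustration of the decoupled-reward-sampling idea, not the authors' implementation; the retriever, the augmentation "policies", and the reward function are all simplified stand-ins. Each side's reward is estimated by sampling augmentations from the *other* side's current policy and averaging, so the two rewards can be approximated independently during joint training.

```python
# Hypothetical sketch of decoupled reward sampling for query-document
# co-augmentation. All components here are toy stand-ins, not the
# paper's actual algorithm: in the real framework the augmenters are
# LLM policies and the reward comes from retrieval performance.
import random

def retrieval_reward(query_aug, doc_aug):
    # Toy retrieval reward: Jaccard word overlap between the augmented
    # query and the augmented document.
    q, d = set(query_aug.split()), set(doc_aug.split())
    return len(q & d) / max(len(q | d), 1)

def sample_query_augs(query, n):
    # Stand-in for sampling from the query-augmentation policy.
    return [query + suffix for suffix in random.choices(
        ["", " retrieval", " semantic matching"], k=n)]

def sample_doc_augs(doc, n):
    # Stand-in for sampling from the document-augmentation policy.
    return [doc + suffix for suffix in random.choices(
        ["", " summary: retrieval", " keywords: semantic matching"], k=n)]

def decoupled_reward_for_query(query_aug, doc, n_samples=4):
    # Reward estimate for one query augmentation, marginalizing over
    # document augmentations sampled from the current document policy.
    doc_augs = sample_doc_augs(doc, n_samples)
    return sum(retrieval_reward(query_aug, d) for d in doc_augs) / n_samples

def decoupled_reward_for_doc(doc_aug, query, n_samples=4):
    # Symmetric estimate for one document augmentation, marginalizing
    # over query augmentations from the current query policy.
    query_augs = sample_query_augs(query, n_samples)
    return sum(retrieval_reward(q, doc_aug) for q in query_augs) / n_samples
```

The design point this illustrates: rather than computing a single joint reward over one (query, document) augmentation pair, each policy is scored against a sample of counterpart augmentations, yielding a tractable per-direction reward signal for the RL update.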
Problem

Research questions and friction points this paper is trying to address.

Enhancing query-document semantic matching via bidirectional augmentation
Training LLM retriever with reinforcement learning for joint policy optimization
Overcoming entangled rewards in query-document co-augmentation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Co-augmenting queries and documents via LLM
Bidirectional RL framework for policy learning
Reward sampling strategy for entangled rewards
Jingming Liu — State Key Laboratory of CAD&CG, Zhejiang University
Yumeng Li — State Key Laboratory of CAD&CG, Zhejiang University
Wei Shi — Meituan Inc.
Yao-Xiang Ding — Assistant Professor, Zhejiang University (machine learning)
Hui Su — Meituan Inc.
Kun Zhou — State Key Laboratory of CAD&CG, Zhejiang University