🤖 AI Summary
To achieve robust semantic matching in complex corpora, where query rewriting alone proves insufficient, this paper proposes a bidirectional retrieval-augmented framework built on large language models (LLMs). The method jointly optimizes query rewriting and document understanding within a reinforcement learning paradigm, so that query and document augmentation reinforce each other. It introduces a decoupled reward sampling strategy and a dedicated optimization algorithm that reduce reliance on manual priors and human-induced bias. Experiments show substantial gains in both sparse and dense retrieval settings, with particularly pronounced improvements on hard retrieval tasks, as well as strong cross-benchmark generalization. The implementation is publicly available.
📝 Abstract
Recent studies have proposed leveraging Large Language Models (LLMs) as information retrievers through query rewriting. However, for challenging corpora, we argue that enhancing queries alone is insufficient for robust semantic matching; the LLM should also gain sufficient understanding of the corpus by directly handling and augmenting the documents themselves. To this end, we present an LLM-based retriever empowered to augment both user queries and corpus documents, with its policy fully explored via reinforcement learning (RL) with minimal human inductive bias. Notably, we find that simply allowing the LLM to modify documents yields little benefit unless paired with our carefully designed bidirectional RL framework, which enables the LLM to simultaneously learn and collaborate on both query and document augmentation policies. A key technical challenge in realizing such a framework lies in jointly updating both policies during training: the rewards for the two directions depend on each other, making their entangled reward intractable. Our approach addresses this by introducing a reward sampling strategy and a specifically designed RL algorithm that enables effective training with these sampled rewards. Experimental results demonstrate that our approach significantly enhances LLM-based retrieval performance in both sparse and dense settings, particularly in difficult retrieval domains, and achieves strong cross-benchmark generalization. Our code is released at https://github.com/liujm2001/CoAugRetriever.
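The entangled-reward problem described above can be illustrated with a toy sketch. This is *not* the paper's algorithm; all names, the reward table, and the bandit-style update below are hypothetical, chosen only to show the shape of the idea: each side's reward depends on both augmentation choices, so one side's reward is estimated by sampling the other side's temporarily frozen policy, and the two policies are updated alternately.

```python
# Toy sketch (hypothetical, not the paper's method): two discrete "policies"
# choose an augmentation for the query and for the document. The matching
# reward depends on BOTH choices, so when updating one policy we estimate its
# reward by sampling actions from the other, frozen policy.
import random

random.seed(0)

QUERY_AUGS = ["plain", "expand_synonyms", "add_keywords"]
DOC_AUGS = ["plain", "summarize", "extract_entities"]

# Hypothetical reward for each (query_aug, doc_aug) pair; stands in for an
# actual retrieval metric over augmented queries and documents.
MATCH = {
    ("plain", "plain"): 0.2,          ("plain", "summarize"): 0.3,
    ("plain", "extract_entities"): 0.1,
    ("expand_synonyms", "plain"): 0.3, ("expand_synonyms", "summarize"): 0.6,
    ("expand_synonyms", "extract_entities"): 0.4,
    ("add_keywords", "plain"): 0.1,    ("add_keywords", "summarize"): 0.5,
    ("add_keywords", "extract_entities"): 0.9,
}

def sample(policy):
    """Sample an action from a categorical policy stored as weights."""
    actions, weights = zip(*policy.items())
    return random.choices(actions, weights=weights)[0]

def decoupled_reward(action, frozen_other, pick_reward, n=64):
    """Estimate one side's reward by averaging over samples from the
    other side's frozen policy, decoupling the entangled reward."""
    return sum(pick_reward(action, sample(frozen_other)) for _ in range(n)) / n

def train(steps=300, lr=0.5):
    q_policy = {a: 1.0 for a in QUERY_AUGS}
    d_policy = {a: 1.0 for a in DOC_AUGS}
    for _ in range(steps):
        # Update the query policy against the frozen document policy...
        qa = sample(q_policy)
        q_policy[qa] += lr * decoupled_reward(
            qa, d_policy, lambda q, d: MATCH[(q, d)])
        # ...then the document policy against the frozen query policy.
        da = sample(d_policy)
        d_policy[da] += lr * decoupled_reward(
            da, q_policy, lambda d, q: MATCH[(q, d)])
    return q_policy, d_policy

q_policy, d_policy = train()
best_q = max(q_policy, key=q_policy.get)
best_d = max(d_policy, key=d_policy.get)
print("learned pair:", best_q, best_d)
```

Even in this simplified form, the alternating updates let the two sides co-adapt toward a cooperative pair of augmentations, which is the intuition behind training query and document policies jointly rather than in isolation.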