ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based rerankers underperform in complex ranking scenarios due to insufficient reasoning-intensive training data. Method: ReasonRank is a framework that integrates automated reasoning-intensive data synthesis, self-consistency data filtering, and reinforcement learning with a multi-view ranking reward. Specifically, DeepSeek-R1 is used to generate high-quality reasoning-aware ranking labels; the model is then trained in two stages: supervised fine-tuning followed by reinforcement learning with a multi-view ranking reward that jointly optimizes ranking quality and logical coherence. Contribution/Results: ReasonRank achieves a new state-of-the-art score of 40.6 on the BRIGHT leaderboard, significantly outperforming prior methods while maintaining lower inference latency than the pointwise reranker Rank1. Its core contribution is the first reasoning-enhanced training paradigm designed specifically for passage re-ranking, improving both reasoning capability and ranking accuracy.
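
The summary does not spell out how the self-consistency filter works; a minimal sketch, assuming it samples several DeepSeek-R1 rankings per query and keeps only queries whose sampled rankings agree. The Kendall-tau agreement measure and the `0.7` threshold here are hypothetical choices, not details from the paper:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau correlation between two rankings of the same passage ids,
    each ordered from most to least relevant."""
    pos_a = {p: i for i, p in enumerate(rank_a)}
    pos_b = {p: i for i, p in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total

def self_consistent(rankings, threshold=0.7):
    """Keep a training query only if every pair of sampled rankings agrees."""
    return all(kendall_tau(a, b) >= threshold
               for a, b in combinations(rankings, 2))

# Three sampled rankings for one query: two roughly agree, one is reversed.
samples = [["p1", "p2", "p3", "p4"],
           ["p1", "p3", "p2", "p4"],
           ["p4", "p3", "p2", "p1"]]
print(self_consistent(samples))  # False: the reversed sample breaks agreement
```

A query passing this check would have its (majority) ranking kept as a training label; disagreeing queries would be discarded as noisy.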


📝 Abstract
Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models, many studies have demonstrated that step-by-step reasoning during test-time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated reasoning-intensive training data synthesis framework, which sources training queries and passages from diverse domains and applies DeepSeek-R1 to generate high-quality training labels. A self-consistency data filtering mechanism is designed to ensure the data quality. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage post-training approach, which includes a cold-start supervised fine-tuning (SFT) stage for reasoning pattern learning and a reinforcement learning (RL) stage for further ranking ability enhancement. During the RL stage, based on the nature of listwise ranking, we design a multi-view ranking reward, which is more effective than a ranking metric-based reward. Extensive experiments demonstrate that our trained reasoning-intensive reranker **ReasonRank** outperforms existing baselines significantly and also achieves much lower latency than pointwise reranker Rank1. **Through further experiments, our ReasonRank has achieved state-of-the-art (SOTA) performance 40.6 on the BRIGHT leaderboard (https://brightbenchmark.github.io/).** Our codes are available at https://github.com/8421BCD/ReasonRank.
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of reasoning-intensive training data for rerankers
Improves reasoning ability in complex passage ranking scenarios
Reduces latency while enhancing ranking performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated reasoning-intensive training data synthesis
Two-stage post-training with SFT and RL
Multi-view ranking reward for RL enhancement
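
The abstract says only that the multi-view reward is "more effective than a ranking metric-based reward"; a minimal sketch of what such a reward could look like, assuming one view is an NDCG-style metric and a second view scores pairwise ordering. The two views, the `alpha` mixing weight, and all function names here are hypothetical illustrations, not the paper's actual reward:

```python
import math

def ndcg_reward(predicted, relevance, k=10):
    """Metric view: NDCG@k of the predicted ranking against graded labels."""
    dcg = sum(relevance.get(p, 0) / math.log2(i + 2)
              for i, p in enumerate(predicted[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def pairwise_reward(predicted, relevance):
    """Order view: fraction of passage pairs placed in the correct relative order."""
    correct = total = 0
    for i in range(len(predicted)):
        for j in range(i + 1, len(predicted)):
            ri = relevance.get(predicted[i], 0)
            rj = relevance.get(predicted[j], 0)
            if ri != rj:
                total += 1
                correct += ri > rj
    return correct / total if total else 0.0

def multi_view_reward(predicted, relevance, alpha=0.5):
    """Blend the two views; alpha is a hypothetical mixing weight."""
    return (alpha * ndcg_reward(predicted, relevance)
            + (1 - alpha) * pairwise_reward(predicted, relevance))

labels = {"p1": 2, "p2": 1, "p3": 0}
print(multi_view_reward(["p1", "p2", "p3"], labels))  # perfect order -> 1.0
```

The intuition for blending views: a single metric like NDCG@k gives a sparse, top-heavy signal during RL, while a pairwise view rewards partial ordering progress lower in the list, giving the policy denser feedback.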