AI Summary
To address the challenge of simultaneously achieving fine-grained scoring and global cross-document reasoning in document re-ranking, this paper proposes DeAR, a two-stage re-ranking framework based on large language model (LLM) distillation. In the first stage, multi-objective knowledge distillation, combining cross-entropy loss, RankNet loss, and KL divergence, transfers the LLaMA teacher model's token-level relevance judgment capability to a lightweight student model. In the second stage, LoRA adapters are introduced to enable chain-of-thought reasoning at the list level, generating interpretable natural-language rationales. Evaluated on standard benchmarks including TREC-DL, BEIR, and NovelEval, DeAR consistently outperforms state-of-the-art open-source methods: it achieves a 5.1-point improvement in nDCG@5 on DL20, attains 90.97 nDCG@10 on NovelEval, and reaches 54.29% Top-1 accuracy on Natural Questions, demonstrating substantial gains in both effectiveness and interpretability.
Abstract
Large Language Models (LLMs) have transformed listwise document reranking by enabling global reasoning over candidate sets, yet single models often struggle to balance fine-grained relevance scoring with holistic cross-document analysis. We propose Deep Agent Rank (DeAR), an open-source framework that decouples these tasks through a dual-stage approach, achieving superior accuracy and interpretability. In Stage 1, we distill token-level relevance signals from a frozen 13B LLaMA teacher into a compact 3B or 8B student model using a hybrid of cross-entropy, RankNet, and KL-divergence losses, ensuring robust pointwise scoring. In Stage 2, we attach a second LoRA adapter and fine-tune on 20K GPT-4o-generated chain-of-thought permutations, enabling listwise reasoning with natural-language justifications. Evaluated on TREC-DL19/20, eight BEIR datasets, and NovelEval-2306, DeAR surpasses open-source baselines by +5.1 nDCG@5 on DL20 and achieves 90.97 nDCG@10 on NovelEval, outperforming GPT-4 by +3.09. Without fine-tuning on Wikipedia, DeAR also excels in open-domain QA, achieving 54.29 Top-1 accuracy on Natural Questions and surpassing baselines such as MonoT5, UPR, and RankGPT. Ablations confirm that dual-loss distillation ensures stable calibration, making DeAR a highly effective and interpretable solution for modern reranking systems. Dataset and code are available at https://github.com/DataScienceUIBK/DeAR-Reranking.
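The Stage-1 objective described above combines three losses over the student's relevance scores. The following is a minimal NumPy sketch of such a hybrid loss; the function names, the equal default weighting, and the use of raw score lists are illustrative assumptions, not the paper's exact implementation:

```python
# Illustrative sketch of a hybrid distillation loss (cross-entropy +
# RankNet + KL divergence), as in DeAR's Stage 1. All names and the
# weighting scheme are assumptions for exposition, not the paper's code.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(student_scores, labels):
    # Pointwise binary cross-entropy against gold relevance labels.
    p = 1.0 / (1.0 + np.exp(-np.asarray(student_scores, dtype=float)))
    eps = 1e-12
    labels = np.asarray(labels, dtype=float)
    return float(-np.mean(labels * np.log(p + eps)
                          + (1 - labels) * np.log(1 - p + eps)))

def ranknet(student_scores, teacher_scores):
    # Pairwise RankNet loss: penalize student pairs ordered
    # differently from the teacher's ordering.
    loss, n = 0.0, 0
    for i in range(len(student_scores)):
        for j in range(len(student_scores)):
            if teacher_scores[i] > teacher_scores[j]:
                diff = student_scores[i] - student_scores[j]
                loss += np.log1p(np.exp(-diff))
                n += 1
    return loss / max(n, 1)

def kl_div(student_scores, teacher_scores):
    # Listwise KL divergence between teacher and student
    # softmax distributions over the candidate list.
    p = softmax(np.asarray(teacher_scores, dtype=float))
    q = softmax(np.asarray(student_scores, dtype=float))
    return float(np.sum(p * np.log(p / q)))

def hybrid_loss(student_scores, teacher_scores, labels, w=(1.0, 1.0, 1.0)):
    # Weighted sum of the three objectives (weights are assumed).
    return (w[0] * cross_entropy(student_scores, labels)
            + w[1] * ranknet(student_scores, teacher_scores)
            + w[2] * kl_div(student_scores, teacher_scores))
```

A student whose scores match the teacher's ordering and the gold labels yields a lower hybrid loss than one that inverts the ranking, which is the calibration behavior the ablations attribute to the dual-loss design.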