Revisiting Query Variants: The Advantage of Retrieval Over Generation of Query Variants for Effective QPP

📅 2025-10-02
🤖 AI Summary
Existing query variant (QV)-based query performance prediction (QPP) methods generate QVs through query expansion or non-contextual embeddings, which can introduce topical drift and hallucinations. This paper instead retrieves QVs: it first retrieves queries with similar information needs from a training set (e.g., MS MARCO) as 1-hop QVs, then runs a second retrieval using the relevant documents of those queries to obtain 2-hop QVs and raise recall. In experiments on TREC DL'19 and DL'20, QPP methods using the retrieved QVs outperform the best-performing existing generated-QV-based QPP approaches by as much as around 20% on neural ranking models such as MonoT5.

📝 Abstract
Leveraging query variants (QVs), i.e., queries with potentially similar information needs to the target query, has been shown to improve the effectiveness of query performance prediction (QPP) approaches. Existing QV-based QPP methods generate QVs facilitated by either query expansion or non-contextual embeddings, which may introduce topical drifts and hallucinations. In this paper, we propose a method that retrieves QVs from a training set (e.g., MS MARCO) for a given target query of QPP. To achieve a high recall in retrieving queries with the most similar information needs as the target query from a training set, we extend the directly retrieved QVs (1-hop QVs) by a second retrieval using their denoted relevant documents (which yields 2-hop QVs). Our experiments, conducted on TREC DL'19 and DL'20, show that the QPP methods with QVs retrieved by our method outperform the best-performing existing generated-QV-based QPP approaches by as much as around 20%, on neural ranking models like MonoT5.
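
The 1-hop and 2-hop retrieval described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the token-overlap (Jaccard) scorer stands in for whatever retriever the paper uses, the 2-hop step is read here as "use each relevant document of a 1-hop QV as a probe to retrieve further training queries", and all names (`retrieve_query_variants`, `top_k`, `k1`, `k2`) and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a toy stand-in for a real analyzer."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (assumed scorer, not the paper's retriever)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_k(probe: str, queries: Dict[str, str], k: int, exclude: Set[str]) -> List[str]:
    """Return ids of the k training queries most similar to the probe text."""
    scored = [(jaccard(probe, text), qid)
              for qid, text in queries.items() if qid not in exclude]
    scored = [(s, qid) for s, qid in scored if s > 0.0]  # drop non-matches
    scored.sort(key=lambda pair: (-pair[0], pair[1]))    # best score first
    return [qid for _, qid in scored[:k]]


def retrieve_query_variants(
    target: str,
    train_queries: Dict[str, str],   # query id -> query text
    qrels: Dict[str, List[str]],     # query id -> ids of relevant documents
    docs: Dict[str, str],            # document id -> document text
    k1: int = 2,
    k2: int = 1,
) -> Tuple[List[str], List[str]]:
    # 1-hop: training queries retrieved directly with the target query.
    one_hop = top_k(target, train_queries, k1, exclude=set())
    seen = set(one_hop)
    two_hop: List[str] = []
    # 2-hop: probe the training queries again, this time with the
    # relevant documents of each 1-hop query variant.
    for qid in one_hop:
        for doc_id in qrels.get(qid, []):
            for cand in top_k(docs[doc_id], train_queries, k2, exclude=seen):
                seen.add(cand)
                two_hop.append(cand)
    return one_hop, two_hop


# Toy training set (hypothetical data, for illustration only).
train_queries = {
    "q1": "what causes high blood pressure",
    "q2": "symptoms of high blood pressure",
    "q3": "hypertension risk factors",
    "q4": "best pizza in new york",
}
qrels = {"q1": ["d1"], "q2": ["d2"], "q3": ["d1"], "q4": ["d3"]}
docs = {
    "d1": "hypertension risk factors include salt intake obesity and stress",
    "d2": "symptoms include headaches and dizziness",
    "d3": "new york pizza guide",
}

one_hop, two_hop = retrieve_query_variants(
    "high blood pressure causes", train_queries, qrels, docs
)
print(one_hop, two_hop)
```

In this toy run, "hypertension risk factors" shares no tokens with the target query, so direct retrieval misses it; it is reached only through the relevant document of a 1-hop variant, which is the recall gain the second hop is meant to provide.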
Problem

Research questions and friction points this paper is trying to address.

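Query variants improve query performance prediction, but existing methods generate them
Generation via query expansion or non-contextual embeddings can cause topical drift and hallucinations
Retrieving variants from a training set requires high recall of queries with similar information needs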
Innovation

Methods, ideas, or system contributions that make the work stand out.

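Retrieves query variants from a training set (e.g., MS MARCO) instead of generating them (1-hop QVs)
Extends retrieval with a second hop using the relevant documents of the 1-hop QVs (2-hop QVs)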
Outperforms the best generated-QV-based QPP approaches by as much as around 20% on neural rankers such as MonoT5 (TREC DL'19 and DL'20)