🤖 AI Summary
This work addresses the lack of systematic research and benchmarking in query performance prediction (QPP) for content-based video retrieval (CBVR). We introduce VQPP, the first benchmark for video QPP, comprising two text-to-video datasets, two CBVR systems, 56K queries, and 51K videos, with dedicated training, validation, and test splits to support evaluation of both pre-retrieval and post-retrieval QPP methods. For the first time in video retrieval, we validate the effectiveness of pre-retrieval predictors and leverage them as reward signals to train large language models for query rewriting via Direct Preference Optimization (DPO). Experimental results demonstrate that our pre-retrieval predictor achieves strong performance, and that using it to guide query rewriting significantly enhances retrieval effectiveness, thereby advancing reproducible research and optimization in video retrieval systems.
📝 Abstract
Query performance prediction (QPP) is an important and actively studied information retrieval task with various applications, such as query reformulation, query expansion, and retrieval system selection. The task has been primarily studied in the context of text and image retrieval, whereas QPP for content-based video retrieval (CBVR) remains largely underexplored. To this end, we propose the first benchmark for video query performance prediction (VQPP), comprising two text-to-video retrieval datasets and two CBVR systems. VQPP contains a total of 56K text queries and 51K videos, and comes with official training, validation, and test splits, fostering direct comparisons and reproducible results. We explore multiple pre-retrieval and post-retrieval performance predictors, creating a representative benchmark for future exploration of QPP in the video domain. Our results show that pre-retrieval predictors obtain competitive performance, enabling applications before the retrieval step is performed. We also demonstrate the applicability of VQPP by employing the best-performing pre-retrieval predictor as a reward model for training a large language model (LLM) on the query reformulation task via direct preference optimization (DPO). We release our benchmark and code at https://github.com/AdrianLutu/VQPP.