Toward Robust and Harmonious Adaptation for Cross-modal Retrieval

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
In real-world cross-modal retrieval, target-domain queries arrive dynamically, both online and from diverse sources. This violates the conventional assumption that the full target-domain data is available and induces query shift, which comprises an online shift and a diverse shift. This paper systematically models this dual-shift phenomenon and proposes a robust adaptation framework with three components: (1) a retrieval-result refinement mechanism that forms query predictions; (2) a shift-robust objective on these predictions that preserves the well-structured common space under online shift; and (3) a gradient-decoupling module that counters diverse shift by balancing retention of source-domain knowledge against target-domain customization. Across 20 benchmarks spanning three cross-modal retrieval tasks, the method significantly improves robustness and generalization while effectively alleviating catastrophic forgetting. It establishes a new paradigm for cross-modal retrieval under dynamic query scenarios.


📝 Abstract
Recently, the general-to-customized paradigm has emerged as the dominant approach for Cross-Modal Retrieval (CMR), as it reconciles the distribution shift between the source domain and the target domain. However, existing general-to-customized CMR methods typically assume that the entire target-domain data is available. This assumption is easily violated in real-world scenarios, so these methods inevitably suffer from the query shift (QS) problem. Specifically, query shift has the following two characteristics, which pose new challenges to CMR. i) Online Shift: real-world queries always arrive in an online manner, making it impractical for customization approaches to access the entire query set beforehand; ii) Diverse Shift: even with domain customization, CMR models struggle to satisfy queries from diverse users or scenarios, leaving an urgent need to accommodate diverse queries. In this paper, we observe that QS not only undermines the well-structured common space inherited from the source model but also steers the model toward forgetting the general knowledge indispensable for CMR. Inspired by these observations, we propose a novel method for achieving online and harmonious adaptation against QS, dubbed Robust adaptation with quEry ShifT (REST). To deal with online shift, REST first refines the retrieval results to formulate the query predictions and accordingly designs a QS-robust objective function on these predictions to preserve the well-established common space in an online manner. To tackle the more challenging diverse shift, REST employs a gradient decoupling module that dexterously manipulates the gradients during adaptation, thus preventing the CMR model from forgetting the general knowledge. Extensive experiments on 20 benchmarks across three CMR tasks verify the effectiveness of our method against QS.
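The abstract does not specify how REST refines retrieval results or what exactly its QS-robust objective is. As a rough illustration only, the sketch below shows one common way to turn retrieval results into per-query predictions on an online batch. It keeps each query's top-k gallery candidates, applies a softmax with a temperature to their similarities, and scores the batch by mean prediction entropy, which is a generic test-time-adaptation objective (entropy minimization). The function names, `k`, `temperature`, and the entropy objective are all assumptions for illustration, not REST's actual formulation.

```python
import math
from typing import List


def softmax(xs: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def query_predictions(
    sims: List[List[float]], k: int = 3, temperature: float = 0.05
) -> List[List[float]]:
    """Hypothetical refinement step: for each query (row of a query-by-gallery
    similarity matrix), keep only the top-k gallery candidates and turn their
    similarities into a probability distribution; other items get 0."""
    preds = []
    for row in sims:
        # Indices of the k most similar gallery items for this query.
        topk = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        probs = softmax([row[j] for j in topk], temperature)
        dist = [0.0] * len(row)
        for j, p in zip(topk, probs):
            dist[j] = p
        preds.append(dist)
    return preds


def mean_entropy(preds: List[List[float]]) -> float:
    """Average Shannon entropy of the query predictions. Minimizing it
    sharpens each query's match (assumed generic objective, not REST's)."""
    total = 0.0
    for dist in preds:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(preds)


# Toy batch: the first query has a clear match, the second is ambiguous.
batch_sims = [[0.9, 0.2, 0.1, 0.05], [0.3, 0.31, 0.29, 0.1]]
batch_preds = query_predictions(batch_sims, k=3, temperature=0.05)
```

In an actual online setting, the objective would be computed on each incoming batch and back-propagated through the similarity scores. This pure-Python version only shows the arithmetic. The ambiguous second query yields a near-uniform top-3 distribution and therefore a much higher entropy than the confident first query.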
Problem

Research questions and friction points this paper is trying to address.

Addresses query shift in cross-modal retrieval when queries arrive online
Tackles adaptation to queries from diverse users and scenarios
Prevents the model from forgetting general knowledge during domain customization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online adaptation with a query-shift-robust objective
Gradient decoupling to prevent general knowledge forgetting
Harmonious model updates for diverse user scenarios
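The page describes REST's gradient decoupling module only at a high level. To make the general idea of manipulating gradients to limit forgetting concrete, here is a sketch of a different, well-known technique with a similar goal: an A-GEM-style projection. It removes the component of the adaptation gradient that conflicts with a reference direction standing in for the source model's general knowledge. Where that reference gradient comes from (for example, a hypothetical source-preservation loss) is an assumption, and this is not REST's actual module.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def decouple_gradient(
    g_target: List[float], g_source: List[float], eps: float = 1e-12
) -> List[float]:
    """A-GEM-style projection (illustrative stand-in, not REST's module).

    g_target: gradient of the target-adaptation loss.
    g_source: hypothetical reference gradient representing general knowledge.
    If the two conflict (negative inner product), subtract g_target's
    component along g_source; otherwise return g_target unchanged.
    """
    d = dot(g_target, g_source)
    if d >= 0.0:
        return list(g_target)
    scale = d / (dot(g_source, g_source) + eps)
    return [gt - scale * gs for gt, gs in zip(g_target, g_source)]


# Conflicting case: the second coordinate pushes against the source direction.
projected = decouple_gradient([1.0, -2.0], [0.0, 1.0])
```

After projection, the gradient is (numerically) orthogonal to the reference direction. A descent step along it therefore leaves the reference loss unchanged to first order, which is the intuition behind using projection to curb forgetting. When there is no conflict, the adaptation gradient passes through untouched.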
Haobin Li
College of Computer Science, Sichuan University, China
Mouxing Yang
Sichuan University
Multi-modal, Multi-view, Noisy Correspondence
Xi Peng
College of Computer Science, Sichuan University, China; National Key Laboratory of Fundamental Algorithms and Models for Engineering Numerical Simulation, Sichuan University, China