Robust Relevance Feedback for Interactive Known-Item Video Search

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In known-item search (KIS) over video, conventional relevance feedback breaks down because there is only a single target, and embedding-based representations lack interpretability, which widens the gap between user and machine perception of similarity. To address this, we propose a robust interactive feedback mechanism. Our key contributions are: (1) a pairwise relative judgment feedback paradigm for KIS; (2) a learnable decomposition of user perception into multiple sub-perception embedding spaces, which identifies and suppresses the spaces that are misaligned with the user; and (3) a predictive user model that estimates the combination of sub-perceptions from each feedback instance and drives the ranking update. Evaluated on the large-scale V3C dataset, our approach moves over 60% of targets initially ranked between 10 and 50 to the top rank, and does the same for over 40% of targets initially ranked between 1,000 and 5,000. These results indicate that relevance feedback in KIS can remain reliable even when user judgments are inconsistent.

📝 Abstract

Known-item search (KIS) involves only a single search target, which makes relevance feedback inapplicable, even though it is typically a powerful technique for efficiently identifying multiple positive examples to infer user intent. PicHunter addresses this issue by asking users to select, from a displayed set, the top-k examples most similar to the unique search target. Under ideal conditions, when the user's perception of similarity aligns closely with the machine's, consistent and precise judgments can elevate the target to the top position within a few iterations. In practice, however, expecting users to provide consistent judgments is often unrealistic, especially when the underlying embedding features used for similarity measurement lack interpretability. To enhance robustness, we first introduce pairwise relative judgment feedback, which improves the stability of top-k selections by mitigating the impact of misaligned feedback. We then decompose user perception into multiple sub-perceptions, each represented as an independent embedding space. This approach assumes that users may not consistently align with a single representation but are more likely to align with one or several among multiple representations. We develop a predictive user model that estimates the combination of sub-perceptions for each user feedback instance and is trained to filter out the misaligned sub-perceptions. Experimental evaluations on the large-scale open-domain dataset V3C indicate that the proposed model can move over 60% of search targets to the top rank when their initial ranks lie between 10 and 50. Even for targets initially ranked between 1,000 and 5,000, the model achieves a success rate exceeding 40% in moving them to the top rank, demonstrating the enhanced robustness of relevance feedback in KIS despite inconsistent feedback.
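To make the pairwise relative judgment idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a PicHunter-style belief over candidate targets and a logistic (Bradley-Terry-like) likelihood for each judgment; the function name `pairwise_update`, the `temperature` parameter, and the use of dot-product similarity on unit-normalized embeddings are all illustrative assumptions.

```python
import numpy as np


def pairwise_update(log_post, emb, winner, loser, temperature=0.1):
    """Update a log-posterior over candidate targets from one pairwise judgment.

    The user states that item `winner` looks more like the target than item
    `loser`. For every candidate target t, the judgment is modeled (assumption)
    with likelihood sigmoid((sim(t, winner) - sim(t, loser)) / temperature).

    log_post : (N,) log-probabilities over the N candidates
    emb      : (N, d) unit-normalized embeddings (hypothetical feature space)
    """
    # Similarity of every candidate to the two displayed items.
    sims_winner = emb @ emb[winner]
    sims_loser = emb @ emb[loser]
    margin = (sims_winner - sims_loser) / temperature

    # Numerically stable log(sigmoid(margin)).
    log_lik = -np.logaddexp(0.0, -margin)

    # Bayes rule in log space, then renormalize so probabilities sum to 1.
    new_log_post = log_post + log_lik
    return new_log_post - np.logaddexp.reduce(new_log_post)
```

Starting from a uniform `log_post`, each consistent judgment shifts probability mass toward candidates that agree with it. A judgment that contradicts the embedding space moves mass away from the true target, which is the failure mode the sub-perception decomposition is meant to absorb.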

Problem

Research questions and friction points this paper is trying to address.

Enhancing relevance feedback robustness for known-item video search

Addressing inconsistent user judgments in similarity-based feedback

Decomposing user perception into multiple independent embedding spaces

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pairwise relative judgment feedback enhances stability

Decomposing user perception into multiple sub-perceptions captures partial alignment

Predictive user model filters misaligned sub-perceptions
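The interplay between sub-perceptions and filtering can be sketched as follows. This is a hedged illustration only: the paper trains a predictive user model, whereas this sketch substitutes a hand-written heuristic that scores each sub-perception space by how plausible the current judgment is under the current belief, and drops spaces below a threshold. The names `judgment_log_lik`, `subspace_weights`, `fused_update`, and the `gate` and `temperature` parameters are hypothetical.

```python
import numpy as np


def judgment_log_lik(emb, winner, loser, temperature=0.1):
    """Log-likelihood, per candidate target, of 'winner beats loser' in one space."""
    margin = (emb @ emb[winner] - emb @ emb[loser]) / temperature
    return -np.logaddexp(0.0, -margin)  # stable log(sigmoid(margin))


def subspace_weights(spaces, belief, winner, loser, temperature=0.1, gate=0.5):
    """Heuristic stand-in for the learned user model (assumption).

    Each space is scored by the expected likelihood of the judgment under the
    current belief; spaces scoring below `gate` are treated as misaligned.
    """
    agreement = np.array([
        belief @ np.exp(judgment_log_lik(e, winner, loser, temperature))
        for e in spaces
    ])
    kept = np.where(agreement >= gate, agreement, 0.0)
    if kept.sum() == 0.0:
        # No space explains the judgment well: fall back to equal weights.
        return np.full(len(spaces), 1.0 / len(spaces))
    return kept / kept.sum()


def fused_update(belief, spaces, winner, loser, temperature=0.1, gate=0.5):
    """Update the belief using only the sub-perceptions that pass the gate."""
    weights = subspace_weights(spaces, belief, winner, loser, temperature, gate)
    log_lik = sum(
        w * judgment_log_lik(e, winner, loser, temperature)
        for w, e in zip(weights, spaces)
    )
    log_post = np.log(belief) + log_lik
    log_post -= np.logaddexp.reduce(log_post)  # renormalize
    return np.exp(log_post), weights
```

Here `spaces` is a list of `(N, d_k)` arrays, one per sub-perception, and `belief` is a strictly positive probability vector over the `N` candidates. A learned model would replace `subspace_weights`; the gating heuristic is kept only to show where misaligned spaces get filtered out before the ranking update.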
