Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited robustness and adaptability of vision-language models (VLMs) in few-shot learning, long-tailed classification, and out-of-distribution generalization, this paper proposes **Dropout Prompt Learning**: a token-level dynamic dropout mechanism applied to both textual and visual branches, guided by cross-modal alignment; token importance is jointly estimated via intra-modal contextual modeling and cross-modal similarity assessment. Furthermore, we introduce **residual entropy regularization**, which enhances representation diversity while preserving semantic consistency. The method introduces no additional parameters and is architecture-agnostic, seamlessly integrating with mainstream VLMs. Extensive evaluation across 15 benchmarks demonstrates consistent and significant improvements across all three challenging scenarios: notably, novel-class recognition accuracy improves by 5.10% over KgCoOp and 2.13% over PromptSRC.

📝 Abstract
Dropout is a widely used regularization technique that improves a model's generalization ability by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which applies dropout to improve the robustness of vision-language models. Unlike vanilla dropout, we apply dropout to the tokens of the textual and visual branches, evaluating token significance from both intra-modal context and inter-modal alignment to enable a flexible dropout probability for each token. Moreover, to maintain semantic alignment for general knowledge transfer while encouraging the diverse representations that dropout introduces, we further propose residual entropy regularization. Experiments on 15 benchmarks show our method's effectiveness in challenging scenarios such as low-shot learning, long-tail classification, and out-of-distribution generalization. Notably, on base-to-novel generalization our method surpasses regularization-based methods including KgCoOp by 5.10% and PromptSRC by 2.13%.
Problem

Research questions and friction points this paper is trying to address.

Improves vision-language model robustness via token-level dropout.
Enhances generalization in low-shot and out-of-distribution scenarios.
Maintains semantic alignment while encouraging diverse representations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dropout applied to textual and visual tokens
Token significance evaluated via intra-modal and inter-modal context
Residual entropy regularization maintains alignment and diversity
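The token-level dropout described above can be illustrated with a minimal sketch: each token receives a dropout probability scaled inversely to an importance score (in the paper, derived from intra-modal context and cross-modal similarity; here the scores, the `base_rate` parameter, and the linear scaling are illustrative assumptions, not the authors' implementation).

```python
import numpy as np

def token_dropout(tokens, importance, base_rate=0.3, rng=None):
    """Illustrative token-level dropout with importance-scaled rates.

    tokens:     (n, d) array of token embeddings
    importance: (n,) non-negative scores; in the paper these would come
                from intra-modal context and cross-modal similarity
                (hypothetical inputs here), higher = more likely kept
    base_rate:  dropout probability assigned to the least important token
    """
    rng = rng or np.random.default_rng()
    # Normalize importance to [0, 1]; the most important token gets
    # dropout probability ~0, the least important gets ~base_rate.
    w = importance / (importance.max() + 1e-8)
    p_drop = base_rate * (1.0 - w)
    keep = rng.random(len(tokens)) >= p_drop
    out = tokens.copy()
    out[~keep] = 0.0  # zero out dropped tokens
    # Inverted-dropout rescaling keeps each token's expectation unchanged.
    out[keep] /= (1.0 - p_drop[keep, None])
    return out
```

Because the per-token rate depends on estimated importance, highly aligned tokens are almost never dropped, while low-importance tokens are regularized most aggressively.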
Authors
Biao Chen, Syracuse University
Lin Zuo, School of Information and Software Engineering, University of Electronic Science and Technology of China
Mengmeng Jing, University of Electronic Science and Technology of China
Kunbin He, School of Information and Software Engineering, University of Electronic Science and Technology of China
Yuchen Wang, School of Information and Software Engineering, University of Electronic Science and Technology of China

Topics: Machine Learning, Computer Vision, Multimedia