SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

📅 2026-06-05
🤖 AI Summary
This work addresses two problems: the limited robustness of current vision-language models under adversarial perturbations, and the high computational cost of existing test-time adaptation defenses, which typically rely on many augmented views. The authors propose SS-TPT, which introduces two criteria for dynamically evaluating and selecting high-quality augmented views: stability, defined as prediction invariance under weak augmentations, and suitability, measured by feature-space density. The selected views guide both prompt tuning and a weighted prediction, improving robustness and generalization while keeping inference overhead low. Extensive experiments show that SS-TPT achieves better robustness-throughput trade-offs than prior methods across multiple benchmarks.
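The summary does not say how the two scores are computed. As an illustration only, the sketch below assumes that stability is one minus the total-variation distance between a view's class probabilities and those of its weakly augmented copy, and that suitability is the mean cosine similarity of a view's feature to its k nearest other views. The function names (`stability_scores`, `suitability_scores`, `select_views`) and the product used to combine the two scores are hypothetical and are not taken from the SS-TPT code.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def stability_scores(logits_views: np.ndarray, logits_weak: np.ndarray) -> np.ndarray:
    """Stability in [0, 1] (assumed definition): 1 minus the total-variation
    distance between each view's class distribution and that of its weakly
    augmented copy. Inputs have shape [V, C]."""
    p = softmax(logits_views)
    q = softmax(logits_weak)
    return 1.0 - 0.5 * np.abs(p - q).sum(axis=-1)


def suitability_scores(features: np.ndarray, k: int) -> np.ndarray:
    """Suitability as local density (assumed definition): mean cosine
    similarity of each view's feature to its k nearest other views.
    Input has shape [V, D]."""
    n = features.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of views")
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)      # exclude each view's similarity to itself
    knn = np.sort(sim, axis=1)[:, -k:]  # k largest similarities per row
    return knn.mean(axis=1)


def select_views(stab: np.ndarray, suit: np.ndarray, m: int) -> np.ndarray:
    """Indices of the m views with the highest combined score (hypothetical
    rule). Suitability is mapped from [-1, 1] to [0, 1] before multiplying."""
    combined = stab * (suit + 1.0) / 2.0
    return np.argsort(-combined)[:m]
```

A view whose prediction flips under weak augmentation gets low stability, and a view whose feature sits far from the other views gets low suitability; either one pushes it out of the selected set. A product is used so that a view must do reasonably well on both criteria, but a weighted sum would be an equally plausible choice.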
📝 Abstract
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but remain highly fragile under adversarial perturbations. Recent test-time adaptation defenses improve robustness by leveraging many augmented views, but this leads to impractical slowdown and a clear robustness-throughput trade-off. To address this challenge, we present Stability and Suitability-guided Test-time Prompt Tuning (SS-TPT), which evaluates the quality of each augmented view via two complementary scores: (1) stability, measuring prediction invariance to weak augmentations, and (2) suitability, measuring feature-space density among views. These stability and suitability (SS) scores guide both adaptation and inference through an SS-guided consistency loss and an SS-weighted prediction, amplifying trustworthy views while suppressing corrupted ones. Extensive experiments demonstrate that SS-TPT significantly outperforms prior state-of-the-art methods, achieving superior robustness-throughput trade-offs across diverse datasets and varying numbers of views, thereby demonstrating both strong practicality and generality. Our code is available at https://github.com/sunoh-kim/SS-TPT.
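The abstract names an SS-guided consistency loss and an SS-weighted prediction but gives no formulas. The following is a minimal NumPy sketch under assumed definitions: per-view SS scores become weights through a temperature softmax, the prediction is the weighted average of per-view class probabilities, and the consistency loss is the weight-averaged cross-entropy between that consensus (held fixed as a target) and each view's prediction. The names `ss_weights`, `ss_weighted_prediction`, `ss_consistency_loss`, and the temperature `tau` are hypothetical. NumPy has no autodiff, so the loss is only evaluated here; actual prompt tuning would backpropagate it into the learnable prompt with a framework such as PyTorch.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ss_weights(ss_scores: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Map per-view SS scores (shape [V]) to weights summing to 1.
    A small temperature tau (an assumption) sharpens the weighting so that
    low-scoring, possibly corrupted views contribute little."""
    return softmax(ss_scores / tau)


def ss_weighted_prediction(view_logits: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted average of per-view class probabilities: [V, C] -> [C]."""
    probs = softmax(view_logits)
    return weights @ probs


def ss_consistency_loss(view_logits: np.ndarray, weights: np.ndarray) -> float:
    """Weight-averaged cross-entropy between the SS-weighted consensus
    (treated as a fixed target) and each view's predicted distribution."""
    probs = softmax(view_logits)                      # [V, C]
    target = weights @ probs                          # [C] consensus distribution
    log_probs = np.log(probs + 1e-12)                 # guard against log(0)
    per_view_ce = -(log_probs * target).sum(axis=1)   # [V] cross-entropy per view
    return float(weights @ per_view_ce)
```

With two views that agree on one class and a third view that predicts another class with a low SS score, the third view receives almost no weight, so it barely moves the final prediction or the consensus target. Compared with uniform weighting, the loss is then dominated by the trustworthy views, which is the "amplify trustworthy, suppress corrupted" behaviour the abstract describes.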
Problem

Research questions and friction points this paper is trying to address.

adversarial robustness
vision-language models
test-time adaptation
robustness-throughput trade-off
adversarial perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation
prompt tuning
adversarial robustness
vision-language models
robustness-throughput trade-off