Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models

📅 2026-06-04
📈 Citations: 0
Influential: 0
📄 PDF

career value

180K/year
🤖 AI Summary
Vision-language models such as CLIP are highly vulnerable to adversarial perturbations, necessitating effective test-time defenses that do not require retraining. This work reveals that adversarial examples exhibit consistent shifts along a dominant direction in the feature space, which implicitly encodes the true decision boundary—a finding reported for the first time. Building on this insight, the authors propose Directional Bias-guided Defense (DBD), a framework that estimates a “defense direction” and employs a DB-score-driven dual-stream reconstruction strategy to recover robust representations. Evaluated across 15 datasets, DBD achieves state-of-the-art adversarial robustness and, remarkably, even surpasses the clean accuracy in certain scenarios.
📝 Abstract
Vision-Language Models (VLMs), such as CLIP, have shown strong zero-shot generalization but remain highly vulnerable to adversarial perturbations, posing serious risks in real-world applications. Test-time defenses for VLMs have recently emerged as a promising and efficient approach to defend against adversarial attacks without requiring costly large-scale retraining. In this work, we uncover a surprising phenomenon: under diverse input transformations, adversarial images in CLIP's feature space consistently shift along a dominant direction, in contrast to the dispersed patterns of clean images. We hypothesize that this dominant shift, termed the Defense Direction, opposes the adversarial shift, pointing features back toward their correct class centers. Building on this insight, we propose Directional Bias-guided Defense (DBD), a test-time framework that estimates the Defense Direction and employs a DB-score-based two-stream reconstruction strategy to recover robust representations. Experiments on 15 datasets demonstrate that DBD not only achieves SOTA adversarial robustness while preserving clean accuracy, but also reveals the counterintuitive result that adversarial accuracy can even surpass clean accuracy. This demonstrates that adversarial perturbations inherently encode directional priors about the true decision boundary.
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Adversarial Attacks
Test-time Defense
Adversarial Robustness
CLIP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directional Bias
Test-time Defense
Vision-Language Models
Adversarial Robustness
Feature Space Shift