PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models are highly vulnerable to adversarial image perturbations, yet existing defenses rely on computationally expensive adversarial training and generalize poorly to unseen attacks. This work proposes PDA, a training-free, test-time defense framework that improves robustness without modifying the underlying model by applying textual augmentation during inference: prompt paraphrasing, question decomposition, and consistency-based answer aggregation. To control computational overhead, PDA can be instantiated as lightweight variants, and it applies readily across diverse vision-language tasks, including visual question answering, classification, and image captioning. The approach substantially improves resistance to multiple types of adversarial attacks while preserving high accuracy on clean samples, yielding an efficient, general-purpose, and robust defense.
📝 Abstract
Vision-language models (VLMs) are vulnerable to adversarial image perturbations. Existing defenses based on adversarial training against task-specific adversarial examples are computationally expensive and often fail to generalize to unseen attack types. To address these limitations, we introduce Paraphrase-Decomposition-Aggregation (PDA), a training-free defense framework that leverages text augmentation to enhance VLM robustness under diverse adversarial image attacks. PDA performs prompt paraphrasing, question decomposition, and consistency aggregation entirely at test time, thus requiring no modification to the underlying models. To balance robustness and efficiency, we instantiate PDA as lightweight variants that reduce inference cost while retaining most of its robustness gains. Experiments on multiple VLM architectures and benchmarks for visual question answering, classification, and captioning show that PDA achieves consistent robustness gains against various adversarial perturbations while maintaining competitive clean accuracy, establishing a generic, strong, and practical inference-time defense framework for VLMs.
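To make the paraphrase-decompose-aggregate pipeline concrete, here is a minimal Python sketch of how such a test-time defense could be wired together. The `llm(prompt)` and `vlm(image, prompt)` callables, the paraphrasing and decomposition prompts, and the majority-vote aggregation are all illustrative assumptions, not the paper's exact prompts or consistency measure.

```python
# Minimal sketch of a PDA-style (Paraphrase-Decomposition-Aggregation)
# test-time defense. Hypothetical interfaces: `llm(prompt) -> str` for the
# text-only paraphrase/decomposition steps and `vlm(image, prompt) -> str`
# for answering; majority voting stands in for consistency aggregation.
from collections import Counter

def pda_answer(vlm, llm, image, question, n_paraphrases=3):
    # P: paraphrase the question (text augmentation; the model is untouched).
    paraphrases = [question] + [
        llm(f"Paraphrase this question (variant {i}): {question}")
        for i in range(n_paraphrases)
    ]

    candidate_answers = []
    for q in paraphrases:
        # D: decompose each paraphrase into simpler sub-questions.
        subs = llm(f"Split into simple sub-questions, one per line: {q}")
        # Answer each sub-question, then the full question given that context.
        context = " ".join(
            f"{s} {vlm(image, s)}" for s in subs.splitlines() if s.strip()
        )
        candidate_answers.append(vlm(image, f"{context}\nQuestion: {q}\nAnswer:"))

    # A: aggregate by majority vote over the candidate answers.
    return Counter(a.strip().lower() for a in candidate_answers).most_common(1)[0][0]
```

The number of paraphrases and the depth of decomposition trade robustness for inference latency, which is presumably what the paper's lightweight variants tune.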
Problem

Research questions and friction points this paper is trying to address.

vision-language models
adversarial image attacks
robustness
generalization
defense
Innovation

Methods, ideas, or system contributions that make the work stand out.

text augmentation
adversarial defense
vision-language models
training-free
prompt paraphrasing
Jingning Xu
Department of Computer Science, City University of Hong Kong
Haochen Luo
City University of Hong Kong
Chen Liu
City University of Hong Kong
Machine Learning
Optimization