VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of large vision-language models (LVLMs) to adversarial image attacks, which can induce plausible yet incorrect outputs. The authors propose an efficient, training-free defense framework that first applies low-cost, content-preserving image transformations to identify and pass through clean samples quickly, avoiding redundant computation. For suspicious inputs, the method combines discrepancy analysis in the text-embedding space with collaborative reasoning by a large language model (LLM) to detect adversarial perturbations precisely. Correct model behavior is then restored through multi-response aggregation. The approach maintains high accuracy while substantially reducing computational overhead: most clean samples undergo only lightweight processing, and even under a heavy adversarial load the added latency remains low, balancing efficiency and robustness.
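The cascade described in the summary can be sketched as a short control flow: a cheap transformation-consistency check first, an embedding-discrepancy check second, and an LLM judge only as a last resort, with multi-response aggregation restoring the answer. This is a minimal illustrative sketch, not the authors' implementation; all function names, thresholds, and scoring rules (`transform_consistency`, `embedding_discrepancy`, `t1`, `t2`, the Euclidean spread metric) are assumptions for illustration.

```python
# Hypothetical sketch of a multi-stage vision-attack defense cascade.
# Transforms, thresholds, and scores are illustrative stand-ins.

def transform_consistency(image, model, transforms):
    """Stage 1: fraction of content-preserving transforms (e.g. flip,
    small crop) under which the model's answer matches the original."""
    base = model(image)
    outputs = [model(t(image)) for t in transforms]
    return sum(o == base for o in outputs) / len(outputs)

def embedding_discrepancy(responses, embed):
    """Stage 2: maximum pairwise Euclidean distance between the
    text embeddings of the candidate responses."""
    vecs = [embed(r) for r in responses]
    worst = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            d = sum((a - b) ** 2 for a, b in zip(vecs[i], vecs[j])) ** 0.5
            worst = max(worst, d)
    return worst

def defend(image, model, transforms, embed, llm_judge, aggregate,
           t1=0.8, t2=0.5):
    # Stage 1: cheap filter -- most clean images exit here.
    if transform_consistency(image, model, transforms) >= t1:
        return model(image)
    # Stage 2: collect responses under transforms; check embedding spread.
    responses = [model(t(image)) for t in transforms]
    if embedding_discrepancy(responses, embed) < t2:
        # Low spread: consolidate the responses directly.
        return aggregate(responses)
    # Stage 3: only now invoke the expensive LLM to resolve divergence.
    return llm_judge(responses)
```

The point of the structure is the cost ordering: the Stage 1 check reuses only forward passes on cheap transforms, Stage 2 adds a text-embedding comparison, and the LLM is called solely on the small residue of inputs where the earlier stages disagree.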

📝 Abstract
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
Problem

Research questions and friction points this paper is trying to address.

adversarial images
Vision-Language Models
attack detection
model defense
adversarial robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial attack detection
vision-language models
training-free defense
multi-stage detection
response consolidation