🤖 AI Summary
Existing red-teaming research largely overlooks visual inputs as an attack surface, resulting in insufficient defenses against Resource Consumption Attacks (RCAs) on Large Vision-Language Models (LVLMs).
Method: This work introduces the first mechanism that triggers unbounded generation through the vision modality alone, constructing cross-sample universal attack templates via fine-grained pixel-level adversarial perturbations, vision-guided optimization, and a multi-objective parallel loss function, with no textual prompt manipulation required.
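At its core this is an adversarial-example search over pixels. The following is a minimal PGD-style sketch of such a loop in PyTorch, not the authors' implementation: `forward_fn`, the target construction, and all hyperparameters are illustrative assumptions. The idea is to optimize a small perturbation so the model assigns high probability to a repeating target sequence.

```python
import torch
import torch.nn.functional as F

def optimize_recall_perturbation(forward_fn, image, recall_ids,
                                 epsilon=8 / 255, alpha=1 / 255, steps=200):
    """PGD-style search for a repetition-inducing pixel perturbation.

    forward_fn(pixels) is assumed to return logits of shape (T, vocab_size),
    teacher-forced on the repeating target sequence recall_ids of shape (T,).
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = forward_fn(image + delta)
        # Lower loss => the model assigns higher probability to re-emitting
        # the target span, steering decoding toward endless repetition.
        loss = F.cross_entropy(logits, recall_ids)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed-gradient descent step
            delta.clamp_(-epsilon, epsilon)     # stay within the L_inf budget
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels valid
        delta.grad.zero_()
    return delta.detach()
```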
Contribution/Results: A malicious image alone suffices to induce persistent, uncontrolled model generation, increasing service latency by over 26× and elevating GPU utilization and memory consumption by an additional 20%. This reveals a novel multimodal security vulnerability in LVLM deployments and establishes the first reproducible visual RCA benchmark for LVLMs, providing both empirical evaluation infrastructure and a new defensive perspective for vision-language systems.
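For context, metrics of this kind (wall-clock latency and GPU footprint) can be profiled with standard tooling. The sketch below is an assumed measurement harness, not the paper's benchmark code; `generate_fn` is a hypothetical wrapper around the deployed LVLM's generation call.

```python
import time
import torch

def profile_generation(generate_fn, image, max_new_tokens=4096):
    """Time one generation call and record peak GPU memory."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    output = generate_fn(image, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    latency = time.perf_counter() - start
    peak_mem_gib = torch.cuda.max_memory_allocated() / 2**30
    return latency, peak_mem_gib, output

# Usage: the latency ratio between adversarial and benign inputs gives the
# slowdown factor (reported above as >26x on the authors' setup).
# t_benign, m_benign, _ = profile_generation(generate_fn, clean_image)
# t_attack, m_attack, _ = profile_generation(generate_fn, clean_image + delta)
# print(f"slowdown: {t_attack / t_benign:.1f}x, peak mem: {m_attack:.1f} GiB")
```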
📝 Abstract
Resource Consumption Attacks (RCAs) have emerged as a significant threat to the deployment of Large Language Models (LLMs). With the integration of vision modalities, additional attack vectors exacerbate the risk of RCAs in large vision-language models (LVLMs). However, existing red-teaming studies have largely overlooked visual inputs as a potential attack surface, resulting in insufficient mitigation strategies against RCAs in LVLMs. To address this gap, we propose RECALLED (REsource Consumption Attack on Large Vision-LanguagE MoDels), the first red-teaming approach that exploits the visual modality to trigger unbounded RCAs. First, we present Vision Guided Optimization, a fine-grained pixel-level optimization procedure, to obtain Output Recall adversarial perturbations that induce repetitive output. We then inject these perturbations into visual inputs, triggering unbounded generation and thereby realizing the RCA. Additionally, we introduce Multi-Objective Parallel Losses to generate universal attack templates and to resolve optimization conflicts that arise when mounting parallel attacks. Empirical results demonstrate that RECALLED increases service response latency by over 26×, resulting in an additional 20% increase in GPU utilization and memory consumption. Our study exposes security vulnerabilities in LVLMs and establishes a red-teaming framework that can facilitate future defense development against RCAs.
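As a rough illustration of the parallel-loss idea named above: one common way to combine several per-sample objectives into a single update for a shared (universal) perturbation is to normalize each objective's gradient before averaging, so that no sample dominates. The sketch below follows that generic recipe; the paper's exact conflict-resolution rule may differ, and all names are illustrative.

```python
import torch

def parallel_loss_step(losses, delta, alpha=1 / 255):
    """One update of a shared perturbation against several objectives at once.

    losses: list of scalar losses, one per sample/target, all differentiable
    with respect to the shared perturbation delta.
    """
    grads = [torch.autograd.grad(l, delta, retain_graph=True)[0] for l in losses]
    # Scale each objective's gradient to unit norm so no single sample
    # dominates, then average into one shared update direction.
    combined = torch.stack([g / (g.norm() + 1e-8) for g in grads]).mean(dim=0)
    with torch.no_grad():
        delta -= alpha * combined.sign()
    return delta
```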