ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing adversarial attacks on vision-language model (VLM)-based embodied agents suffer from two critical limitations: unrealistically strong white-box assumptions, and semantically inconsistent global image perturbations that disrupt the model's reasoning and render its actions inexecutable. This work proposes a fine-grained adversarial attack framework that selectively manipulates only task-relevant object semantics, via either removal or addition, while keeping background and contextual semantics intact. The attack induces models to generate syntactically valid and physically executable yet decisionally erroneous outputs. The method explicitly accounts for the VLM's perception–reasoning pipeline and targets both black-box and gray-box settings. Experiments demonstrate substantial accuracy degradation across diverse vision-language understanding and embodied decision-making tasks, while maintaining high imperceptibility and strong cross-model transferability.

📝 Abstract
Vision-Language Models (VLMs), with their strong reasoning and planning capabilities, are widely used for embodied decision-making (EDM) tasks in embodied agents, such as autonomous driving and robotic manipulation. Recent research has increasingly explored adversarial attacks on VLMs to reveal their vulnerabilities. However, these attacks either rely on overly strong assumptions, requiring full knowledge of the victim VLM, which is impractical for attacking VLM-based agents, or exhibit limited effectiveness. The latter stems from disrupting most of the semantic information in the image, which leads to a misalignment between the perception and the task context defined by the system prompt. This inconsistency interrupts the VLM's reasoning process, resulting in invalid outputs that fail to affect interactions in the physical world. To this end, we propose a fine-grained adversarial attack framework, ADVEDM, which modifies the VLM's perception of only a few key objects while preserving the semantics of the remaining regions. This attack effectively reduces conflicts with the task context, making VLMs output valid but incorrect decisions that affect the actions of agents, thus posing a more substantial safety threat in the physical world. We design two variants based on this framework, ADVEDM-R and ADVEDM-A, which respectively remove the semantics of a specific object from the image and add the semantics of a new object into the image. Experimental results in both general scenarios and EDM tasks demonstrate fine-grained control and excellent attack performance.
Problem

Research questions and friction points this paper is trying to address.

Existing adversarial attacks on VLMs rely on impractically strong white-box assumptions
Global perturbations disrupt most of the image's semantics, misaligning perception with the task context
Disrupted reasoning yields invalid outputs rather than valid but incorrect decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained adversarial attack that modifies only a few key objects' semantics
Preserves the semantics of all non-targeted image regions
Induces valid but incorrect agent decisions that propagate to physical actions (see the sketch below)
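
To make the mechanism concrete, here is a minimal, hypothetical sketch of the fine-grained idea, not the authors' implementation: a masked PGD-style optimization against a differentiable surrogate image encoder (e.g., a CLIP-like model), where the perturbation is confined to one object's region. The `surrogate_encoder`, `object_text_emb`, mask, and hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def masked_pgd_attack(image, mask, surrogate_encoder, object_text_emb,
                      eps=8 / 255, alpha=1 / 255, steps=40, remove=True):
    """Hypothetical sketch of a fine-grained, region-confined PGD attack.

    image:             (1, 3, H, W) tensor with values in [0, 1]
    mask:              (1, 1, H, W) binary tensor marking the target object
    surrogate_encoder: differentiable image encoder returning an embedding
    object_text_emb:   L2-normalized text embedding of the object's name
    remove=True  -> push the image embedding away from the object
                    (ADVEDM-R-style removal)
    remove=False -> pull it toward a new object's embedding
                    (ADVEDM-A-style addition)
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        # Perturb only the masked region; background pixels stay untouched.
        adv = (image + delta * mask).clamp(0, 1)
        img_emb = F.normalize(surrogate_encoder(adv), dim=-1)
        sim = (img_emb * object_text_emb).sum()
        loss = sim if remove else -sim  # minimize or maximize similarity
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed gradient descent
            delta.clamp_(-eps, eps)             # L_inf perturbation budget
        delta.grad.zero_()
    return (image + delta.detach() * mask).clamp(0, 1)
```

Confining the perturbation to the mask is what keeps the rest of the scene semantically intact: the victim VLM still receives a coherent task context and so produces a valid, executable, but wrong decision, rather than an invalid output.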
Yichen Wang
National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Engineering Research Center on Big Data Security, Hubei Key Laboratory of Distributed System Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
Hangtao Zhang
Huazhong University of Science and Technology (HUST)
AI Security
Hewen Pan
Huazhong University of Science and Technology
MLLMs, AI Security & Safety
Ziqi Zhou
National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Hubei Engineering Research Center on Big Data Security, Hubei Key Laboratory of Distributed System Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
Xianlong Wang
Ph.D. student, City University of Hong Kong
Trustworthy LLM/VLM, Embodied AI, Unlearnable Example, 3D Point Cloud, Poisoning/Adversarial Attack
Peijin Guo
National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Engineering Research Center on Big Data Security, Hubei Key Laboratory of Distributed System Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
Lulu Xue
National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Engineering Research Center on Big Data Security, Hubei Key Laboratory of Distributed System Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
Shengshan Hu
School of CSE, Huazhong University of Science and Technology (HUST)
AI Security, Embodied AI, Autonomous Driving
Minghui Li
Huazhong University of Science and Technology
AI Security
Leo Yu Zhang
School of Information and Communication Technology, Griffith University