VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of vision-language models (VLMs) to malicious prompt attacks, a critical security concern exacerbated by the limitations of existing defenses in both efficiency and robustness. To tackle this challenge, the authors propose the Multimodal Aggregated Feature Extraction (MAFE) framework, which extends CLIP to effectively handle long textual inputs and integrate multimodal information. Leveraging MAFE, they uncover—for the first time—the distinct distributional differences between benign and malicious prompts in the feature space. Building upon this insight, they design VLMShield, a lightweight, plug-and-play security detector that significantly enhances detection efficiency and robustness without compromising the original model performance. VLMShield supports flexible deployment across diverse scenarios and consistently outperforms current state-of-the-art methods.
📝 Abstract
Vision-Language Models (VLMs) face significant safety vulnerabilities from malicious prompt attacks due to weakened alignment during visual integration. Existing defenses fall short in both efficiency and robustness. To address these challenges, we first propose the Multimodal Aggregated Feature Extraction (MAFE) framework, which enables CLIP to handle long text and fuse multimodal information into unified representations. Through empirical analysis of MAFE-extracted features, we discover distinct distributional patterns between benign and malicious prompts. Building upon this finding, we develop VLMShield, a lightweight safety detector that efficiently identifies multimodal malicious attacks as a plug-and-play solution. Extensive experiments demonstrate superior performance across multiple dimensions, including robustness, efficiency, and utility. Through our work, we hope to pave the way for more secure multimodal AI deployment. Code is available at [https://github.com/pgqihere/VLMShield](https://github.com/pgqihere/VLMShield).
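The abstract's core idea, extending a CLIP-style encoder past its fixed text window and fusing text with image features into one vector for a lightweight detector, might be sketched as follows. This is a minimal illustration under stated assumptions: `encode_text_stub`, the window size, mean-pooling, and concatenation are placeholders chosen for clarity, not the paper's actual MAFE implementation.

```python
# Hypothetical sketch of MAFE-style aggregation. A real system would call a
# CLIP text/image encoder; here encode_text_stub stands in so the logic runs
# without model weights.

MAX_TOKENS = 77  # CLIP's text context length, the limit MAFE works around


def encode_text_stub(tokens):
    # Stand-in for a CLIP text encoder: maps a token window to a fixed-size
    # feature vector (dim 4 here purely for illustration).
    dim = 4
    vec = [0.0] * dim
    for i, t in enumerate(tokens):
        vec[i % dim] += float(t)
    n = len(tokens) or 1
    return [v / n for v in vec]


def aggregate_long_text(tokens, window=MAX_TOKENS):
    """Split a long token sequence into encoder-sized windows, encode each
    window, and mean-pool the per-window features into one text vector."""
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)] or [[]]
    feats = [encode_text_stub(chunk) for chunk in chunks]
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def fuse(text_feat, image_feat):
    """Concatenate text and image features into a single multimodal vector;
    a lightweight detector (e.g. a small MLP) would be trained on top."""
    return list(text_feat) + list(image_feat)
```

For example, a 200-token prompt would be encoded as three windows (77 + 77 + 46 tokens), pooled into one 4-dim text vector, and concatenated with a 4-dim image vector into an 8-dim input for the detector.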
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
malicious prompts
safety vulnerabilities
robustness
efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLMShield
Multimodal Aggregated Feature Extraction
malicious prompt defense
vision-language models
safety detector
Peigui Qi
University of Science and Technology of China
Kunsheng Tang
University of Science and Technology of China
Yanpu Yu
University of Science and Technology of China
Jialin Wu
Ant Group
Yide Song
University of Washington
Wenbo Zhou
University of Science and Technology of China
Zhicong Huang
Ant Group
Cryptography · Security and Privacy · Machine Learning
Cheng Hong
Ant Group
Weiming Zhang
University of Science and Technology of China
Nenghai Yu
University of Science and Technology of China
Computer Vision · Artificial Intelligence · Information Hiding