🤖 AI Summary
This work addresses the vulnerability of vision-language models (VLMs) to malicious prompt attacks, a critical security concern exacerbated by the limited efficiency and robustness of existing defenses. To tackle this challenge, the authors propose the Multimodal Aggregated Feature Extraction (MAFE) framework, which extends CLIP to handle long textual inputs and integrate multimodal information. Leveraging MAFE, they uncover, for the first time, distinct distributional differences between benign and malicious prompts in the feature space. Building on this insight, they design VLMShield, a lightweight, plug-and-play security detector that significantly improves detection efficiency and robustness without compromising the original model's performance. VLMShield supports flexible deployment across diverse scenarios and consistently outperforms current state-of-the-art methods.
📝 Abstract
Vision-Language Models (VLMs) face significant safety vulnerabilities from malicious prompt attacks, as safety alignment is weakened during visual integration. Existing defenses fall short in both efficiency and robustness. To address these challenges, we first propose the Multimodal Aggregated Feature Extraction (MAFE) framework, which enables CLIP to handle long text and fuse multimodal information into unified representations. Through empirical analysis of MAFE-extracted features, we discover distinct distributional patterns between benign and malicious prompts. Building on this finding, we develop VLMShield, a lightweight, plug-and-play safety detector that efficiently identifies multimodal malicious attacks. Extensive experiments demonstrate superior performance across multiple dimensions, including robustness, efficiency, and utility. We hope this work paves the way for more secure multimodal AI deployment. Code is available at [github.com/pgqihere/VLMShield](https://github.com/pgqihere/VLMShield).
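The core idea, aggregating CLIP embeddings of long text, fusing them with image features, and detecting attacks from the distributional gap, can be illustrated with a toy sketch. This is not the paper's actual MAFE or VLMShield implementation: the mean-pooling aggregation, concatenation fusion, nearest-centroid detector, and synthetic Gaussian features below are all simplifying assumptions chosen only to make the pipeline concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate_text_chunks(chunk_embs: np.ndarray) -> np.ndarray:
    """Mean-pool embeddings of text windows (a stand-in for how a long prompt,
    split into CLIP-sized chunks, could be aggregated into one vector)."""
    return chunk_embs.mean(axis=0)

def fuse(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Concatenate text and image embeddings into a unified feature
    (hypothetical fusion rule, not the paper's)."""
    return np.concatenate([text_emb, image_emb])

def make_sample(shift: float, d: int = 8) -> np.ndarray:
    # Synthetic stand-ins for CLIP embeddings: 3 text windows + 1 image.
    # Malicious prompts are modeled as a shifted distribution.
    chunks = rng.normal(shift, 1.0, size=(3, d))
    image = rng.normal(shift, 1.0, size=d)
    return fuse(aggregate_text_chunks(chunks), image)

benign = np.stack([make_sample(0.0) for _ in range(200)])
malicious = np.stack([make_sample(1.5) for _ in range(200)])

# A deliberately lightweight detector: nearest centroid in feature space,
# exploiting the distributional gap between the two classes.
mu_b, mu_m = benign.mean(axis=0), malicious.mean(axis=0)

def flags_malicious(x: np.ndarray) -> bool:
    return np.linalg.norm(x - mu_m) < np.linalg.norm(x - mu_b)

acc = np.mean([flags_malicious(x) for x in malicious] +
              [not flags_malicious(x) for x in benign])
print(f"toy detector accuracy: {acc:.2f}")
```

Because the detector is just a distance comparison on precomputed features, it adds negligible inference cost and can sit in front of any VLM as a plug-in filter, which mirrors the deployment style the abstract describes.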