🤖 AI Summary
Short-video content moderation currently relies heavily on unimodal adversarial attacks to evaluate multimodal large language models (MLLMs), failing to expose robustness deficiencies that arise from joint visual, auditory, and semantic understanding. This work proposes ChimeraBreak, the first tri-modal co-adversarial attack framework tailored to short videos, alongside SVMA, the first multimodal adversarial dataset for short-video safety evaluation. We introduce a human-guided synthesis strategy and an automated LLM-as-a-judge assessment mechanism to rigorously quantify model failures. Experiments demonstrate high attack success rates across mainstream MLLMs and systematically uncover diverse cross-modal misjudgment patterns. Our study reveals critical safety vulnerabilities of state-of-the-art MLLMs in real-world content moderation and establishes a reproducible benchmark, along with an empirically grounded pathway toward more robust multimodal safety systems.
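The summary names the three attacked pathways (visual, auditory, semantic) but not the attack internals. Below is a minimal sketch of what composing one tri-modal co-adversarial sample could look like, assuming bounded pixel noise on frames, bounded waveform noise on audio, and a misleading caption rewrite for the semantic channel; every function, parameter, and perturbation choice here is an illustrative assumption, not the paper's method.

```python
import numpy as np

def perturb_frames(frames: np.ndarray, eps: float = 8 / 255) -> np.ndarray:
    """Visual pathway: add L-infinity-bounded noise to frames in [0, 1].
    (Assumed perturbation; the paper does not specify its visual attack.)"""
    noise = np.random.uniform(-eps, eps, size=frames.shape)
    return np.clip(frames + noise, 0.0, 1.0)

def perturb_audio(waveform: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """Auditory pathway: add small bounded noise to a [-1, 1] waveform.
    (Assumed perturbation, for illustration only.)"""
    noise = np.random.uniform(-eps, eps, size=waveform.shape)
    return np.clip(waveform + noise, -1.0, 1.0)

def rewrite_caption(caption: str) -> str:
    """Semantic pathway: reframe the clip's text so violating content
    reads as benign. (Hypothetical rewrite; a real attack might use an
    LLM to generate the misleading framing.)"""
    return f"Educational safety demonstration: {caption}"

def chimera_break_sample(frames, waveform, caption):
    """Perturb all three modalities jointly to form one co-adversarial input."""
    return perturb_frames(frames), perturb_audio(waveform), rewrite_caption(caption)
```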
📝 Abstract
Multimodal Large Language Models (MLLMs) are increasingly used for content moderation, yet their robustness in short-form video contexts remains underexplored. Current safety evaluations often rely on unimodal attacks, failing to address vulnerabilities to combined, cross-modal attacks. In this paper, we introduce a comprehensive framework for evaluating the tri-modal safety of MLLMs. First, we present the Short-Video Multimodal Adversarial (SVMA) dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Second, we propose ChimeraBreak, a novel tri-modal attack strategy that simultaneously challenges visual, auditory, and semantic reasoning pathways. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR). Our findings uncover distinct failure modes, showing that models are biased toward misclassifying either benign or policy-violating content. We assess results with an LLM-as-a-judge protocol, demonstrating that the attacks effectively subvert model reasoning. Our dataset and findings provide crucial insights for developing more robust and safe MLLMs.
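For concreteness, here is a minimal sketch of how the ASR metric and the LLM-as-a-judge check could be wired together. `moderate` (the MLLM under test) and `judge` (the judge model call) are hypothetical callables; the abstract describes the mechanism, not an API.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical judge instruction; the paper's actual prompt is not given here.
JUDGE_PROMPT = (
    "You are auditing a content-moderation verdict. Ground-truth label: {truth}. "
    "Model verdict: {verdict}. Answer YES if the verdict contradicts the "
    "ground truth, otherwise NO."
)

def attack_success_rate(
    adversarial_samples: Iterable[Tuple[object, str]],
    moderate: Callable[[object], str],
    judge: Callable[[str, str], bool],
) -> float:
    """ASR = (# adversarial inputs whose moderation verdict contradicts
    the ground-truth label) / (total adversarial inputs).

    `moderate(sample)` returns the MLLM's verdict for one perturbed video;
    `judge(verdict, truth)` returns True when an LLM-as-a-judge deems the
    verdict a misjudgment. Both are stand-ins for the paper's pipeline.
    """
    samples = list(adversarial_samples)
    flipped = sum(judge(moderate(video), truth) for video, truth in samples)
    return flipped / len(samples)
```

A higher ASR under this definition means the tri-modal perturbations more often push the moderator into the failure modes the abstract describes, whether over-flagging benign clips or waving through policy-violating ones.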