VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language

📅 2025-11-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work exposes a critical security vulnerability in text-to-video (T2V) models at the implicit semantic level: seemingly neutral prompts containing rich cross-modal visual association cues can bypass content safety filters and generate policy-violating, semantically unsafe videos. To demonstrate this, we propose the first modular prompt attack framework, integrating neutral scene anchors, latent auditory triggers, and stylistic modulators to explicitly encode audio-visual co-occurrence priors and steer cross-modal associative generation. We further design constrained optimization and guided search strategies to efficiently discover highly stealthy adversarial prompts within the modular prompt space. Evaluated on seven mainstream T2V models, including multiple commercial systems, our approach achieves an average 23% improvement in attack success rate. This is the first systematic demonstration of T2V models' susceptibility to implicit semantic attacks, revealing fundamental weaknesses in current safety mechanisms.

📝 Abstract
Jailbreak attacks can circumvent model safety guardrails and reveal critical blind spots. Prior attacks on text-to-video (T2V) models typically add adversarial perturbations to obviously unsafe prompts, which are often easy to detect and defend against. In contrast, we show that benign-looking prompts containing rich, implicit cues can induce T2V models to generate semantically unsafe videos that both violate policy and preserve the original (blocked) intent. To realize this, we propose VEIL, a jailbreak framework that leverages T2V models' cross-modal associative patterns via a modular prompt design. Specifically, our prompts combine three components: neutral scene anchors, which provide the surface-level scene description extracted from the blocked intent to maintain plausibility; latent auditory triggers, textual descriptions of innocuous-sounding audio events (e.g., creaking, muffled noises) that exploit learned audio-visual co-occurrence priors to bias the model toward particular unsafe visual concepts; and stylistic modulators, cinematic directives (e.g., camera framing, atmosphere) that amplify and stabilize the latent trigger's effect. We formalize attack generation as a constrained optimization over the above modular prompt space and solve it with a guided search procedure that balances stealth and effectiveness. Extensive experiments across 7 T2V models demonstrate the efficacy of our attack, achieving a 23% improvement in average attack success rate on commercial models.
Problem

Research questions and friction points this paper is trying to address.

Circumventing safety guardrails in text-to-video models using implicit cues
Generating policy-violating videos through benign-looking prompts with hidden triggers
Exploiting cross-modal associations to bypass content safety filters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular prompt design with neutral scene anchors
Latent auditory triggers exploiting audio-visual associations
Stylistic modulators amplifying unsafe concept generation
Zonghao Ying
SKLCCSE, BUAA
Trustworthy AI
Moyang Chen
College of Science, Mathematics and Technology, Wenzhou-Kean University
Nizhang Li
Faculty of Innovation Engineering, Macau University of Science and Technology
Zhiqiang Wang
Hong Kong University of Science and Technology
Wenxin Zhang
University of Chinese Academy of Sciences
Quanchen Zou
360 AI Security Lab
Zonglei Jing
Beihang University
Machine Learning · Reinforcement Learning · Optimal Control
Aishan Liu
State Key Laboratory of Complex & Critical Software Environment, Beihang University
Xianglong Liu
Zhongguancun Laboratory