SafeGen-Bench: Benchmarking Safety in Image-Conditioned Text-to-Video Generation

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a gap in current video generation safety evaluations. These evaluations mostly test models with malicious text prompts and overlook illicit, politically sensitive, or unethical content that can emerge from seemingly benign image-text inputs. To address this, the authors introduce SafeGen-Bench, a benchmark for assessing the safety of image-conditioned text-to-video models, with a focus on risks tied to temporal sequences and depicted behaviors. The benchmark defines ten malicious categories and pairs carefully selected start frames from diverse image and video sources with text prompts to simulate realistic inputs. The authors also use it to evaluate both text-based and image-based guardrails. Experiments show that current conditional T2V models reach unsafety scores of up to 44.5, especially under settings that demand high-quality output, while unimodal guardrails alone show an 80% failure rate across seven malicious categories. These results underscore the need for multimodal safety assessment.

📝 Abstract

With the rapid advancements in text-to-image diffusion models, generative video models (T2V models) like Sora can now produce short synthetic videos from a text prompt or an initial image. However, synthetic video generation, especially when guided by an initial image, often poses risks, including the potential creation of illegal, politically sensitive, or unethical content. Existing benchmarks have started to consider the safety of generated videos, but they primarily test models with malicious text prompts, ignoring the scenario where a combination of text prompt and image can still lead to harmful video content. In practice, this is a common and challenging issue: videos generated from safe text and image inputs can nonetheless convey harmful information. To bridge this gap, we introduce SafeGen-Bench, a benchmark specifically designed to evaluate the safety of conditional T2V models. Our benchmark defines 10 malicious categories, concentrating on risks related to both temporal sequences and depicted behaviors. SafeGen-Bench consists of carefully selected start frames from diverse image and video sources, paired with corresponding text prompts to simulate realistic inputs. We evaluate a variety of conditional T2V models on SafeGen-Bench, and the results indicate that current models struggle to consistently avoid generating malicious content, with unsafety scores reaching up to 44.5, especially under conditions requiring high quality. Furthermore, we assess the effectiveness of both text-based and image-based guardrails on our benchmark and find that unimodal guardrails alone are insufficient to provide a robust defense, with an 80% failure rate across seven malicious categories. We hope that SafeGen-Bench will foster the development of safer and more controllable conditional T2V models.
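The abstract reports two kinds of numbers: an unsafety score for generated videos and a failure rate for unimodal guardrails. The following is a minimal sketch of how such metrics could be tallied, assuming (1) the unsafety score is the percentage of generated videos a judge labels unsafe, and (2) a guardrail "fails" on a sample when it lets through the inputs of a video that turned out unsafe. The `Sample` fields, the function names, the category labels, and the "block if either guardrail fires" combination rule are all hypothetical illustrations, not the benchmark's published protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record for one image-text input and its generated video.
    category: str        # malicious category label (placeholder names below)
    video_unsafe: bool   # judge's verdict on the generated video (assumed)
    text_flagged: bool   # did a text-based guardrail block the prompt?
    image_flagged: bool  # did an image-based guardrail block the start frame?


def unsafety_score(samples):
    """Percentage of generated videos judged unsafe (assumed definition)."""
    if not samples:
        return 0.0
    return 100.0 * sum(s.video_unsafe for s in samples) / len(samples)


def guardrail_failure_rates(samples, mode):
    """Per-category fraction of unsafe videos whose inputs were NOT blocked.

    mode: "text" or "image" for a unimodal guardrail, or "both" for a
    hypothetical combined guardrail that blocks if either modality fires.
    """
    passed = defaultdict(int)  # unsafe samples the guardrail let through
    unsafe = defaultdict(int)  # all unsafe samples, per category
    for s in samples:
        if not s.video_unsafe:
            continue  # failures are only counted on unsafe outputs
        unsafe[s.category] += 1
        if mode == "text":
            blocked = s.text_flagged
        elif mode == "image":
            blocked = s.image_flagged
        elif mode == "both":
            blocked = s.text_flagged or s.image_flagged
        else:
            raise ValueError(f"unknown mode: {mode}")
        if not blocked:
            passed[s.category] += 1
    return {c: passed[c] / unsafe[c] for c in unsafe}


# Toy data for illustration only; these are not results from the paper.
samples = [
    Sample("category_A", True, False, True),
    Sample("category_A", True, False, False),
    Sample("category_A", False, False, False),
    Sample("category_B", True, True, False),
]
print(unsafety_score(samples))                    # 75.0
print(guardrail_failure_rates(samples, "text"))   # {'category_A': 1.0, 'category_B': 0.0}
print(guardrail_failure_rates(samples, "image"))  # {'category_A': 0.5, 'category_B': 1.0}
print(guardrail_failure_rates(samples, "both"))   # {'category_A': 0.5, 'category_B': 0.0}
```

In this toy run, each unimodal guardrail misses a different unsafe case, and combining them catches more than either alone. That is the intuition behind the abstract's call for multimodal guardrails, though the sketch makes no claim about how the benchmark actually scores them.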

Problem

Research questions and friction points this paper is trying to address.

text-to-video generation

safety benchmark

image-conditioned generation

malicious content

conditional generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional text-to-video generation

safety benchmark

multimodal guardrails