Falcon: A Cross-Modal Evaluation Dataset for Comprehensive Safety Perception

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing safety evaluation methods for multimodal large language models (MLLMs) in visual question answering (VQA) lack depth and overlook the critical regulatory role of visual inputs in harm perception. Method: We introduce Falcon, a large-scale, fine-grained safety benchmark comprising 57,515 image–text pairs spanning 13 harm categories. Falcon is the first to systematically uncover how images fundamentally influence harmfulness judgments, providing triple-attribute annotations (image, instruction, response) and evidence-backed explanatory labels. Leveraging Qwen2.5-VL-7B, we fine-tune FalconEye—a dedicated, interpretable safety evaluator. Results: FalconEye achieves significantly higher overall accuracy than state-of-the-art methods on Falcon-test, VLGuard, and Beavertail-V benchmarks, demonstrating superior detection precision and intrinsic interpretability. This work establishes a novel paradigm for safe, reliable MLLM deployment grounded in multimodal safety reasoning.

📝 Abstract
Existing methods for evaluating the harmfulness of content generated by large language models (LLMs) have been well studied. However, approaches tailored to multimodal large language models (MLLMs) remain underdeveloped and lack depth. This work highlights the crucial role of visual information in moderating content in visual question answering (VQA), a dimension often overlooked in current research. To bridge this gap, we introduce Falcon, a large-scale vision-language safety dataset containing 57,515 VQA pairs across 13 harm categories. The dataset provides explicit annotations of harmful attributes across images, instructions, and responses, thereby enabling a comprehensive evaluation of the content generated by MLLMs. In addition, it includes the relevant harm categories along with explanations supporting the corresponding judgments. We further propose FalconEye, a specialized evaluator fine-tuned from Qwen2.5-VL-7B on the Falcon dataset. Experimental results demonstrate that FalconEye reliably identifies harmful content in complex, safety-critical multimodal dialogue scenarios. It outperforms all other baselines in overall accuracy on our proposed Falcon-test set and on two widely used benchmarks, VLGuard and Beavertail-V, underscoring its potential as a practical safety auditing tool for MLLMs.
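The triple-attribute annotation scheme described in the abstract (separate harm judgments for image, instruction, and response, plus a category and an explanation) can be sketched as a data record. This is a hypothetical illustration only: the field names and the three sample categories below are assumptions, not the schema of the released Falcon dataset.

```python
from dataclasses import dataclass

# Illustrative subset; the paper defines 13 harm categories in total.
HARM_CATEGORIES = {"violence", "hate_speech", "self_harm"}

@dataclass
class FalconRecord:
    """Hypothetical sketch of one Falcon-style VQA annotation."""
    image_path: str
    instruction: str
    response: str
    # Triple-attribute labels: one harm judgment per component.
    image_harmful: bool
    instruction_harmful: bool
    response_harmful: bool
    harm_category: str = ""   # one of the harm categories, if any apply
    explanation: str = ""     # evidence-backed rationale for the judgment

    def overall_harmful(self) -> bool:
        """Flag the pair if any of the three attributes is judged harmful."""
        return self.image_harmful or self.instruction_harmful or self.response_harmful

record = FalconRecord(
    image_path="imgs/0001.jpg",
    instruction="Describe what is happening in this scene.",
    response="The image depicts a street protest ...",
    image_harmful=True,          # the image alone can make a benign question unsafe
    instruction_harmful=False,
    response_harmful=False,
    harm_category="violence",
    explanation="The image contains graphic content even though the text is benign.",
)
print(record.overall_harmful())  # → True
```

Separating the three judgments is what lets an evaluator attribute harm to the visual input specifically, rather than collapsing the whole exchange into a single safe/unsafe label.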
Problem

Research questions and friction points this paper is trying to address.

Safety evaluation of multimodal LLMs lacks comprehensive vision-language datasets
The moderating role of visual information in VQA safety remains largely overlooked
A specialized evaluator is needed to identify harmful content in multimodal dialogue scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Falcon, a 57,515-pair, 13-category dataset for multimodal safety evaluation
Proposes FalconEye, a specialized evaluator fine-tuned from Qwen2.5-VL-7B
The resulting tool reliably identifies harmful content in multimodal dialogues
Qi Xue
Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China
Minrui Jiang
Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China
Runjia Zhang
Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China
Xiurui Xie
Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China
Pei Ke
Associate Professor, University of Electronic Science and Technology of China
Natural Language Processing; Natural Language Generation; Dialogue System; Large Language Model
Guisong Liu
University of Electronic Science and Technology of China
Neural Networks; Machine Learning; Artificial Intelligence