M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs

📅 2024-12-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Vision-Language Models (LVLMs) lack systematic moral evaluation frameworks, even as they are deployed in high-stakes domains such as law, finance, and healthcare. Method: This paper introduces M$^3$oralBench, the first multimodal moral evaluation benchmark tailored for LVLMs, grounded in the six foundations of Moral Foundations Theory. It expands the everyday moral scenarios of the Moral Foundations Vignettes (MFVs) and pairs them with scenario images generated by the SD3.0 text-to-image diffusion model, supporting three task types: moral judgement, moral classification, and moral response. M$^3$oralBench pioneers cross-modal (text–image) moral assessment, extends the MFVs to multimodal settings, and defines an LVLM-specific evaluation paradigm. Contribution/Results: Evaluated on ten state-of-the-art open-source and closed-source LVLMs, M$^3$oralBench proves challenging, revealing notable deficiencies in multimodal moral understanding and reasoning across current models. The benchmark's code and dataset are publicly released to foster reproducible, rigorous moral evaluation of multimodal AI systems.

📝 Abstract
Recently, large foundation models, including large language models (LLMs) and large vision-language models (LVLMs), have become essential tools in critical fields such as law, finance, and healthcare. As these models increasingly integrate into our daily life, it is necessary to conduct moral evaluation to ensure that their outputs align with human values and remain within moral boundaries. Previous works primarily focus on LLMs, proposing moral datasets and benchmarks limited to text modality. However, given the rapid development of LVLMs, there is still a lack of multimodal moral evaluation methods. To bridge this gap, we introduce M$^3$oralBench, the first MultiModal Moral Benchmark for LVLMs. M$^3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and employs the text-to-image diffusion model, SD3.0, to create corresponding scenario images. It conducts moral evaluation across six moral foundations of Moral Foundations Theory (MFT) and encompasses tasks in moral judgement, moral classification, and moral response, providing a comprehensive assessment of model performance in multimodal moral understanding and reasoning. Extensive experiments on 10 popular open-source and closed-source LVLMs demonstrate that M$^3$oralBench is a challenging benchmark, exposing notable moral limitations in current models. Our benchmark is publicly available.
Problem

Research questions and friction points this paper is trying to address.

Ethical Judgement
Multi-modal Models
Critical Applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Moral Judgment
Multimodal Benchmark
Ethical Reasoning
Bei Yan
Northeastern University — Signal Processing
Jie Zhang
Key Laboratory of AI Safety of CAS, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
Zhiyuan Chen
Key Laboratory of AI Safety of CAS, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
Shiguang Shan
Professor, Institute of Computing Technology, Chinese Academy of Sciences — Computer Vision; Pattern Recognition; Machine Learning; Face Recognition
Xilin Chen
Key Laboratory of AI Safety of CAS, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China