GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

📅 2024-01-03
🏛️ arXiv.org
📈 Citations: 20
Influential: 3
📄 PDF
🤖 AI Summary
This work addresses a critical safety gap in large multimodal models (LMMs): detecting implicit social abuse (veiled hate speech, gendered microaggressions, cyberbullying) in internet memes. We introduce GOAT-Bench, the first benchmark explicitly designed for evaluating multimodal implicit abuse, comprising over 6,000 thematically diverse memes. The proposed fine-grained cross-modal safety evaluation framework combines expert human annotation, comparative analysis across multiple LMMs (e.g., GPT-4o), and a rigorous semantic alignment protocol to ensure consistent harm assessment. Experimental results show that state-of-the-art LMMs achieve notably low accuracy in identifying implicit offensiveness, sarcasm, and overall harmfulness, highlighting their limited sensitivity to non-explicit abusive content. GOAT-Bench is publicly released to support standardized, reproducible research in multimodal content safety evaluation.

📝 Abstract
The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and image. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs (e.g., GPT-4o) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce the comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence. The GOAT-Bench and accompanying resources are publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this vital field.
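The abstract describes scoring LMMs on five judgment tasks (hatefulness, misogyny, offensiveness, sarcasm, harmfulness). As a minimal sketch, not the official GOAT-Bench evaluation harness, per-task accuracy over binary model judgments could be computed like this; the example data and the `predict` callable are hypothetical stand-ins for real meme annotations and an LMM query:

```python
from collections import defaultdict

# The five GOAT-Bench task names, as listed in the abstract.
TASKS = ["hatefulness", "misogyny", "offensiveness", "sarcasm", "harmfulness"]

def task_accuracy(examples, predict):
    """Compute per-task accuracy.

    examples: list of dicts with keys 'task' (str) and 'label' (bool,
              the human annotation).
    predict:  callable taking an example and returning a bool judgment
              (in practice, a wrapper around an LMM call).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["task"]] += 1
        if predict(ex) == ex["label"]:
            correct[ex["task"]] += 1
    return {task: correct[task] / total[task] for task in total}

# Toy usage with a trivial predictor that always answers "not abusive";
# real data would pair each meme image and overlaid text with its labels.
demo = [
    {"task": "hatefulness", "label": True},
    {"task": "hatefulness", "label": False},
    {"task": "sarcasm", "label": False},
]
scores = task_accuracy(demo, lambda ex: False)
```

A naive "always benign" baseline like this is exactly what the paper's findings warn about: a model insensitive to implicit abuse can still look reasonable on aggregate numbers while missing every abusive meme.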
Problem

Research questions and friction points this paper is trying to address.

Evaluating LMMs' ability to detect social abuse in memes.
Assessing LMMs' sensitivity to implicit hate speech and harmful content.
Identifying deficiencies in LMMs' safety awareness for AI safety.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed GOAT-Bench for meme-based social abuse analysis
Evaluated LMMs on implicit hate speech and sexism
Identified LMMs' deficiencies in detecting implicit abuse
Hongzhan Lin
Hong Kong Baptist University
Natural Language Processing · Multimodal Reasoning · Social Computing
Ziyang Luo
Salesforce AI Research
Agents · LLMs · Multimodal
Bo Wang
Department of Computer Science, Hong Kong Baptist University
Ruichao Yang
Department of Computer Science, Hong Kong Baptist University
Jing Ma
Department of Computer Science, Hong Kong Baptist University