Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language

📅 2024-02-21
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the automatic detection of dehumanizing language, a subtle yet severely harmful form of hate speech, by clarifying its conceptual boundary with general hate speech and by tackling the scarcity of annotated data and the obfuscated, figurative ways in which dehumanization is expressed. We conduct the first systematic evaluation of large language models (GPT-4, GPT-3.5, and LLaMA-2) in zero-shot and few-shot settings, assessing both detection performance and bias across demographic target groups. To this end, we construct a manually annotated benchmark and run automated labeling experiments. Results show that these LLMs reach at most 70% accuracy and exhibit substantial disparities in sensitivity across target groups. Moreover, the automatically labeled data proves insufficient in quality to support robust training of downstream models. The core contribution is an empirical account of the biases these LLMs exhibit in dehumanization detection and of their current inadequacy as high-fidelity annotation substitutes.
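To make the zero-shot setting concrete, here is a minimal sketch of how such a classification call could look, assuming the OpenAI chat completions client; the prompt wording, label set, and fallback handling are illustrative assumptions, not the paper's actual protocol.

    # Sketch of zero-shot dehumanization classification with an LLM.
    # Prompt text and label set are assumptions for illustration only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    LABELS = ("dehumanization", "other hate speech", "neither")  # assumed label set

    def classify_zero_shot(text, model="gpt-4"):
        prompt = (
            "Dehumanization denies people their human qualities, e.g. by likening them "
            "to animals, vermin, machines, or objects. Classify the following post as one of: "
            + ", ".join(LABELS) + ". Answer with the label only.\n\nPost: " + text
        )
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        answer = resp.choices[0].message.content.strip().lower()
        # Fall back to "neither" if the model returns something unexpected.
        return next((label for label in LABELS if label in answer), "neither")

A few-shot variant would prepend a handful of labeled example posts in the same format before the post to be classified.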

📝 Abstract
Dehumanization, characterized as a subtle yet harmful manifestation of hate speech, involves denying individuals of their human qualities and often results in violence against marginalized groups. Despite significant progress in Natural Language Processing across various domains, its application in detecting dehumanizing language is limited, largely due to the scarcity of publicly available annotated data for this domain. This paper evaluates the performance of cutting-edge NLP models, including GPT-4, GPT-3.5, and LLAMA-2, in identifying dehumanizing language. Our findings reveal that while these models demonstrate potential, achieving a 70% accuracy rate in distinguishing dehumanizing language from broader hate speech, they also display biases. They are over-sensitive in classifying other forms of hate speech as dehumanization for a specific subset of target groups, while more frequently failing to identify clear cases of dehumanization for other target groups. Moreover, leveraging one of the best-performing models, we automatically annotated a larger dataset for training more accessible models. However, our findings indicate that these models currently do not meet the high-quality data generation threshold necessary for this task.
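The last step described in the abstract, training a more accessible model on LLM-produced annotations and checking it against human-labeled data, could be sketched roughly as follows; the texts, labels, and model choice (TF-IDF plus logistic regression) are placeholders rather than the paper's setup.

    # Sketch: train a lighter classifier on LLM-provided labels, then score it
    # against a manually annotated benchmark. All data below is placeholder text.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import classification_report

    # Labels produced automatically by a strong LLM (hypothetical examples).
    train_texts = ["post likening a group to vermin", "insult without dehumanizing framing",
                   "another dehumanizing post", "a neutral post"]
    train_labels = ["dehumanization", "other", "dehumanization", "other"]

    # Manually annotated benchmark used for evaluation (hypothetical examples).
    bench_texts = ["post comparing people to animals", "generic slur-free complaint"]
    bench_labels = ["dehumanization", "other"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_labels)
    print(classification_report(bench_labels, model.predict(bench_texts)))

The paper's finding is that, at current annotation quality, this pipeline does not yield reliable downstream models; the sketch only shows where the LLM labels would enter the training loop.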
Problem

Research questions and friction points this paper is trying to address.

How to detect dehumanizing language, which is expressed subtly and is easily conflated with broader hate speech
How well current LLMs perform at dehumanization detection given the scarcity of annotated data
How detection accuracy and sensitivity differ across demographic target groups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates GPT-4, GPT-3.5, and LLaMA-2 in zero-shot and few-shot settings for dehumanization detection
Constructs a manually annotated benchmark and tests LLM-based automatic annotation of a larger training set
Identifies group-level disparities in model predictions (see the sketch after this list)
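One simple way to surface the group-level disparities mentioned above is to compute recall on dehumanization cases separately per target group; the sketch below assumes hypothetical group names, gold labels, and predictions.

    # Sketch: per-target-group recall (sensitivity) on gold dehumanization cases.
    # Group names, gold labels, and predictions are made-up illustrations.
    from collections import defaultdict

    def per_group_recall(examples):
        """examples: iterable of (target_group, gold_label, predicted_label) tuples."""
        hits, totals = defaultdict(int), defaultdict(int)
        for group, gold, pred in examples:
            if gold == "dehumanization":            # count only gold-positive cases
                totals[group] += 1
                hits[group] += int(pred == "dehumanization")
        return {group: hits[group] / totals[group] for group in totals}

    examples = [
        ("group_a", "dehumanization", "dehumanization"),
        ("group_a", "dehumanization", "other hate speech"),  # missed dehumanization
        ("group_b", "dehumanization", "dehumanization"),
        ("group_b", "other hate speech", "dehumanization"),   # over-sensitive prediction
    ]
    print(per_group_recall(examples))  # e.g. {'group_a': 0.5, 'group_b': 1.0}

Large gaps between groups in this statistic correspond to the over- and under-sensitivity patterns the paper reports; a parallel per-group false-positive rate would capture the over-sensitive side.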