HateDebias: On the Diversity and Variability of Hate Speech Debiasing

📅 2024-06-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing hate speech detection datasets overlook the diversity and temporal evolution of bias types and distributions, failing to reflect real-world dynamic bias environments. To address this, we introduce HateDebias—the first debiasing benchmark explicitly designed for dynamic bias scenarios—featuring multi-source, multi-type, and temporally evolving bias distributions, enabling robust evaluation under continual learning. Methodologically, we present the first systematic modeling of bias diversity and variability, proposing a continual debiasing framework that jointly integrates bias-informed regularization and memory replay. Extensive experiments on HateDebias demonstrate that our approach significantly improves accuracy across mainstream models and effectively mitigates performance degradation across shifting bias domains. These results validate its effectiveness and generalizability in realistic social media environments.

📝 Abstract
Hate speech on social media is ubiquitous but must be urgently controlled. Without detecting and mitigating the biases it introduces, hate speech detection raises various ethical problems. While a number of datasets have been proposed for hate speech detection, they seldom consider the diversity and variability of bias, leaving them far from real-world scenarios. To fill this gap, we propose a benchmark, named HateDebias, to analyze the hate speech detection ability of models under continuous, changing environments. Specifically, to capture the diversity of biases, we collect existing hate speech detection datasets with different types of biases. To further capture their variability (i.e., the changing of bias attributes in datasets), we reorganize the datasets to follow a continual learning setting. We evaluate models trained on datasets with a single type of bias against their performance on HateDebias, where a significant performance drop is observed. To provide a potential direction for debiasing, we further propose a debiasing framework based on continual learning and bias information regularization, together with memory replay strategies, to ensure the debiasing ability of the model. Experimental results on the proposed benchmark show that this method improves several baselines by a distinguished margin, highlighting its effectiveness in real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Addressing bias diversity and variability in hate speech detection
Evaluating model fairness in dynamically evolving social media environments
Mitigating performance degradation from specific bias attributes in datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructed HateDebias benchmark for dynamic bias analysis
Integrated memory replay with bias regularization techniques
Collected diverse real-world hate speech datasets
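The framework described above combines continual learning across shifting bias domains with bias-information regularization and memory replay. The summary card does not give the paper's exact formulation, so the following is a minimal illustrative sketch in NumPy, under stated assumptions: an L2 pull toward the previous domain's weights stands in for the bias-information regularizer, and a small random buffer of past examples stands in for the memory replay strategy. All function names, hyperparameters, and the toy "bias domains" are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(w, X, y, lr=0.1, w_prev=None, reg=0.0):
    """One gradient step of logistic regression. The L2 pull toward
    w_prev is a stand-in for the paper's bias-information
    regularization (the actual regularizer is not specified here)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
    grad = X.T @ (p - y) / len(y)           # cross-entropy gradient
    if w_prev is not None:
        grad += reg * (w - w_prev)          # pull toward previous-domain weights
    return w - lr * grad

def continual_debias(tasks, dim, replay_size=32, reg=1.0, epochs=200):
    """Train sequentially over bias 'domains', replaying a small
    memory buffer of earlier examples alongside each new domain."""
    w = np.zeros(dim)
    memory_X, memory_y = np.empty((0, dim)), np.empty(0)
    for X, y in tasks:
        w_prev = w.copy()
        for _ in range(epochs):
            # mix current-domain data with replayed memory
            Xb = np.vstack([X, memory_X]) if len(memory_y) else X
            yb = np.concatenate([y, memory_y]) if len(memory_y) else y
            w = train_step(w, Xb, yb, w_prev=w_prev, reg=reg)
        # store a random subset of this domain in the replay buffer
        idx = rng.choice(len(y), size=min(replay_size, len(y)), replace=False)
        memory_X = np.vstack([memory_X, X[idx]])
        memory_y = np.concatenate([memory_y, y[idx]])
    return w

# Two toy "bias domains": same task, shifted feature distributions.
def make_task(shift):
    X = rng.normal(size=(200, 2)) + shift
    y = (X[:, 0] - shift[0] > 0).astype(float)
    return X, y

tasks = [make_task(np.array([0.0, 0.0])), make_task(np.array([3.0, -3.0]))]
w = continual_debias(tasks, dim=2)

# Accuracy on the FIRST domain after training on both; replay and
# regularization limit forgetting as the bias distribution shifts.
X0, y0 = tasks[0]
acc = np.mean((X0 @ w > 0) == y0)
```

The design mirrors the benchmark's evaluation logic: a model tuned only to the latest bias domain forgets earlier ones, while replay plus a regularizer anchored to earlier weights retains accuracy across the shift.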
Nankai Lin
School of Computer Science, Guangdong University of Technology
Hongyan Wu
College of Computer, National University of Defense Technology
Zhengming Chen
Shantou University
causal discovery · machine learning
Zijian Li
Mohamed bin Zayed University of Artificial Intelligence
Lianxi Wang
School of Information Science and Technology, Guangdong University of Foreign Studies
Shengyi Jiang
The University of Hong Kong
Dong Zhou
School of Information Science and Technology, Guangdong University of Foreign Studies
Aimin Yang
School of Computer Science, Guangdong University of Technology