An Annotated Corpus of Arabic Tweets for Hate Speech Analysis

📅 2025-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Arabic hate speech detection is challenging because of the language's extensive dialectal diversity. To address this, the authors introduce a multi-label Arabic hate speech dataset of 10,000 tweets, each annotated for offensiveness and, when offensive, for six target categories: religion, gender, politics, ethnicity, origin, and other. The dataset supports both single- and multi-target classification. Annotation by multiple annotators reached high agreement (Krippendorff’s α = 0.86 for offensiveness, 0.71 for targets). Fine-tuning AraBERTv2 and other Transformer-based models for multi-label classification, the best model (AraBERTv2) achieves a micro-F1 score of 0.7865 and an accuracy of 0.786. These results support the quality of the dataset and its annotations, and the suitability of modern Arabic language models for multi-target hate speech detection.
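To make the agreement figures above more concrete, here is a minimal sketch of Krippendorff's α for nominal labels. It assumes each tweet is represented as the list of labels it received from its annotators; the function name, the data layout, and the toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that the
    annotators gave to one item (hypothetical layout, not the paper's).
    """
    # Coincidence matrix: every ordered pair of labels from two different
    # annotators of the same item counts 1 / (m - 1), m = labels on the item.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label value and overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement, both scaled by n so it cancels.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)

    if expected == 0:
        # Only one label value was ever used; alpha is undefined here and
        # this sketch returns 1.0 by convention.
        return 1.0
    return 1 - observed / expected


# Toy example: three annotators label four tweets as offensive or not.
toy_units = [
    ["off", "off", "off"],
    ["not", "not", "off"],
    ["not", "not", "not"],
    ["off", "off", "off"],
]
alpha = krippendorff_alpha_nominal(toy_units)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.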

📝 Abstract

Identifying hate speech in Arabic is challenging because of the language's rich dialectal variation. This study introduces a multilabel hate speech dataset for Arabic. We collected 10,000 Arabic tweets and annotated each one as containing offensive content or not. Tweets with offensive content are further classified by hate speech target, such as religion, gender, politics, ethnicity, origin, and others. A tweet can have a single target or multiple targets. Multiple annotators took part in the annotation task. The inter-annotator agreement was 0.86 for offensive content and 0.71 for hate speech targets. Finally, we evaluated the annotated data with several transformer-based models, among which AraBERTv2 performed best, with a micro-F1 score of 0.7865 and an accuracy of 0.786.
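As an illustration of the multi-target setup and the micro-F1 metric reported above, the sketch below encodes each tweet's targets as a 0/1 vector and pools true positives, false positives, and false negatives across all labels. The label order, helper names, and toy predictions are assumptions made for this example; they are not the authors' code or data.

```python
# Target labels as listed in the abstract; the order here is arbitrary.
TARGETS = ["religion", "gender", "politics", "ethnicity", "origin", "other"]


def encode(targets):
    """Turn a set of target names into a 0/1 vector ordered like TARGETS."""
    return [1 if t in targets else 0 for t in TARGETS]


def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP/FP/FN over every (tweet, label) pair."""
    tp = fp = fn = 0
    for gold_row, pred_row in zip(gold, pred):
        for g, p in zip(gold_row, pred_row):
            if g and p:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: three tweets, the second one has two targets.
gold = [encode({"religion"}), encode({"gender", "politics"}), encode(set())]
pred = [encode({"religion"}), encode({"gender"}), encode({"other"})]
score = micro_f1(gold, pred)  # 2 TP, 1 FP, 1 FN
```

Because micro-averaging pools counts before computing the score, frequent targets weigh more heavily than rare ones, which is worth keeping in mind when target categories are imbalanced.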

Problem

Research questions and friction points this paper is trying to address.

Identifying hate speech in Arabic tweets with dialectal variations

Creating a multilabel Arabic hate speech dataset with annotations

Evaluating annotation quality using transformer models like AraBERTv2

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilabel Arabic hate speech dataset creation

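Annotation by multiple annotators, with inter-annotator agreement of 0.86 for offensive content and 0.71 for hate speech targets

AraBERTv2 performs best among the evaluated transformer models (micro-F1 0.7865, accuracy 0.786)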
