Towards a comprehensive taxonomy of online abusive language informed by machine learning

📅 2025-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The proliferation of online abusive language threatens individual and community well-being and calls for a unified, actionable framework for detection and intervention. To address this, we propose a hierarchical, faceted taxonomy of online abusive language that integrates the classification systems of 18 multi-label datasets. The taxonomy organizes 17 fine-grained dimensions under five core categories: context, target, intensity, directness, and theme. Methodologically, we combine systematic literature review, multi-label mapping, hierarchical clustering, and expert validation to support both theoretical rigor and practical scalability. The resulting taxonomy is intended to foster a shared understanding among researchers, platform operators, and policymakers of detection standards, cross-dataset alignment, and collaborative governance. It is meant to serve as a foundational tool for continuous monitoring, precise identification, and early intervention against online abuse.

📝 Abstract

The proliferation of abusive language in online communications has posed significant risks to the health and wellbeing of individuals and communities. The growing concern regarding online abuse and its consequences necessitates methods for identifying and mitigating harmful content and facilitating continuous monitoring, moderation, and early intervention. This paper presents a taxonomy for distinguishing key characteristics of abusive language within online text. Our approach uses a systematic method for taxonomy development, integrating classification systems of 18 existing multi-label datasets to capture key characteristics relevant to online abusive language classification. The resulting taxonomy is hierarchical and faceted, comprising 5 categories and 17 dimensions. It classifies various facets of online abuse, including context, target, intensity, directness, and theme of abuse. This shared understanding can lead to more cohesive efforts, facilitate knowledge exchange, and accelerate progress in the field of online abuse detection and mitigation among researchers, policy makers, online platform owners, and other stakeholders.
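To make the idea of a hierarchical, faceted taxonomy concrete, the following is a minimal sketch of how one text could be annotated along several facets at once. The five category names come from the abstract. The class names and the dimension values ("individual", "explicit", "gender") are hypothetical placeholders for illustration, not the paper's 17 dimensions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    # The five top-level categories named in the abstract.
    CONTEXT = "context"
    TARGET = "target"
    INTENSITY = "intensity"
    DIRECTNESS = "directness"
    THEME = "theme"


@dataclass
class FacetedAnnotation:
    """One text annotated along several independent facets (hypothetical structure)."""

    text: str
    # Maps each category to the set of values assigned to the text.
    facets: dict[Category, set[str]] = field(default_factory=dict)

    def add(self, category: Category, value: str) -> None:
        """Attach a value under a category; a category may hold several values."""
        self.facets.setdefault(category, set()).add(value)

    def has(self, category: Category, value: str) -> bool:
        """Check whether a value was assigned under a category."""
        return value in self.facets.get(category, set())


# Illustrative values only; the paper's actual dimensions are not reproduced here.
example = FacetedAnnotation(text="<some post>")
example.add(Category.TARGET, "individual")
example.add(Category.DIRECTNESS, "explicit")
example.add(Category.THEME, "gender")
```

Because each facet is recorded independently, a text can be described by any combination of categories, which is the property that distinguishes a faceted scheme from a single flat label set.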

Problem

Research questions and friction points this paper is trying to address.

Develops a taxonomy for classifying online abusive language characteristics

Integrates 18 multi-label datasets to identify key abuse facets

Aims to improve detection and mitigation of harmful online content

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical faceted taxonomy for abuse classification

Integration of 18 multi-label datasets

Machine-learning-informed abusive language taxonomy
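Integrating the label schemes of several datasets can be pictured as a lookup from each dataset's own labels to shared taxonomy facets. The sketch below assumes a hand-built mapping table; the dataset names, original labels, and facet values are all hypothetical and are not taken from the 18 datasets used in the paper.

```python
# Hypothetical mapping: (dataset name, original label) -> (category, value).
LABEL_MAP = {
    ("dataset_a", "sexism"): ("theme", "gender"),
    ("dataset_a", "targeted_insult"): ("target", "individual"),
    ("dataset_b", "implicit_hate"): ("directness", "implicit"),
    ("dataset_b", "misogyny"): ("theme", "gender"),
}


def harmonize(dataset, labels):
    """Translate one dataset's labels into shared taxonomy facets.

    Returns (facets, unmapped): facets maps a category to a set of values,
    and unmapped lists labels that have no entry in LABEL_MAP.
    """
    facets = {}
    unmapped = []
    for label in labels:
        key = (dataset, label)
        if key in LABEL_MAP:
            category, value = LABEL_MAP[key]
            facets.setdefault(category, set()).add(value)
        else:
            # Keep unmapped labels visible rather than silently dropping them.
            unmapped.append(label)
    return facets, unmapped


a_facets, a_unmapped = harmonize("dataset_a", ["sexism", "targeted_insult"])
b_facets, b_unmapped = harmonize("dataset_b", ["misogyny", "spam"])
```

In this sketch, "sexism" from one dataset and "misogyny" from another land on the same theme value, which is the kind of cross-dataset alignment a shared taxonomy is meant to enable. Returning the unmapped labels separately makes gaps in the mapping easy to audit.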
