X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

📅 2026-01-06
🏛️ arXiv.org
🤖 AI Summary
This work addresses the challenges of scarce labeled data and limited interpretability in hate speech detection for low-resource Indian languages such as Hindi and Telugu. The authors propose X-MuTeST, a novel framework that integrates semantic reasoning from large language models (LLMs) with an attention mechanism enhanced by n-gram probability differences. They also introduce the first multilingual hate speech benchmark annotated with human-provided, word-level rationales. By leveraging these human rationales during training and combining LLM-generated explanations with those produced by X-MuTeST, the model achieves significant improvements in both classification accuracy and explanation quality (measured by metrics including Token-F1 and IOU-F1) across 6,004 Hindi, 4,492 Telugu, and 6,334 English samples, thereby advancing interpretable content moderation for low-resource languages.
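The n-gram probability-difference scoring summarized above can be sketched as a simple occlusion-style attribution: each unigram, bigram, and trigram is scored by how much deleting it shifts the classifier's predicted probability, and the final rationale is the union of these tokens with the LLM-selected tokens. This is a minimal sketch, not the paper's implementation; `predict_proba` is a hypothetical stand-in for the trained classifier, and the exact masking and aggregation scheme may differ.

```python
def ngrams(tokens, n):
    """All n-grams of the token list, with their start index."""
    return [(i, tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_importance(tokens, predict_proba, max_n=3):
    """Score each token by the largest probability drop caused by
    removing any unigram, bigram, or trigram that covers it."""
    base = predict_proba(tokens)  # P(hate | original text)
    scores = {}
    for n in range(1, max_n + 1):
        for i, _gram in ngrams(tokens, n):
            reduced = tokens[:i] + tokens[i + n:]   # text with the n-gram removed
            drop = base - predict_proba(reduced)    # probability difference
            for tok_idx in range(i, i + n):         # credit each covered token
                scores[tok_idx] = max(scores.get(tok_idx, 0.0), drop)
    return scores

def union_rationale(model_token_idx, llm_token_idx):
    """Final explanation: union of X-MuTeST-selected and LLM-selected tokens."""
    return sorted(set(model_token_idx) | set(llm_token_idx))
```

With a toy classifier that assigns high hate probability whenever a specific token is present, the tokens whose removal flips the prediction receive the largest scores.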

📝 Abstract
Hate speech detection on social media faces challenges in both accuracy and explainability, especially for underexplored Indic languages. We propose a novel explainability-guided training framework, X-MuTeST (eXplainable Multilingual haTe Speech deTection), for hate speech detection that combines high-level semantic reasoning from large language models (LLMs) with traditional attention-enhancing techniques. We extend this research to Hindi and Telugu alongside English by providing benchmark human-annotated rationales for each word to justify the assigned class label. The X-MuTeST explainability method computes the difference between the prediction probabilities of the original text and those of unigrams, bigrams, and trigrams. Final explanations are computed as the union between LLM explanations and X-MuTeST explanations. We show that leveraging human rationales during training enhances both classification performance and explainability. Moreover, combining human rationales with our explainability method to refine the model attention yields further improvements. We evaluate explainability using Plausibility metrics such as Token-F1 and IOU-F1 and Faithfulness metrics such as Comprehensiveness and Sufficiency. By focusing on under-resourced languages, our work advances hate speech detection across diverse linguistic contexts. Our dataset includes token-level rationale annotations for 6,004 Hindi, 4,492 Telugu, and 6,334 English samples. Data and code are available at https://github.com/ziarehman30/X-MuTeST.
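The plausibility metrics named in the abstract can be sketched as follows. Token-F1 is the F1 overlap between predicted and human rationale token sets; IOU-F1 is conventionally built on intersection-over-union, counting a predicted rationale as a match when its IoU with the human rationale reaches a threshold (0.5 in the ERASER setup). This is a hedged sketch of the standard definitions; the paper's exact formulation may differ.

```python
def token_f1(pred, gold):
    """F1 overlap between predicted and gold rationale token indices."""
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return float(pred == gold)  # both empty counts as a perfect match
    overlap = len(pred & gold)
    p = overlap / len(pred)   # precision
    r = overlap / len(gold)   # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def iou(pred, gold):
    """Intersection-over-union of two rationale token sets; an example
    contributes to IOU-F1 when this value meets the threshold (e.g. 0.5)."""
    pred, gold = set(pred), set(gold)
    union = pred | gold
    return 1.0 if not union else len(pred & gold) / len(union)
```

For example, a predicted rationale sharing one of two tokens with the human rationale scores Token-F1 of 0.5 but IoU of only 1/3, so it would not count as an IOU-F1 match at the 0.5 threshold.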
Problem

Research questions and friction points this paper is trying to address.

hate speech detection
explainability
multilingual
under-resourced languages
social media
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable AI
Multilingual Hate Speech Detection
LLM-guided Explanation
Human Rationales
Attention Refinement
Mohammad Zia Ur Rehman
Indian Institute of Technology Indore, Madhya Pradesh India
Sai Kartheek Reddy Kasu
IIIT Dharwad, India
Social Computing · Natural Language Processing · Responsible AI
Shashivardhan Reddy Koppula
Indian Institute of Technology Indore, Madhya Pradesh India
Sai Rithwik Reddy Chirra
Arizona State University, United States
Shwetank Shekhar Singh
Indian Institute of Technology Mandi, India
Nagendra Kumar
Indian Institute of Technology Indore, Madhya Pradesh India