SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations

📅 2025-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of detecting context-dependent toxic language, such as implicit gender bias, harassment, and abuse, in conversational settings. The authors propose SafeSpeech, a multi-granularity toxicity analysis framework designed for dialogue. Methodologically, it integrates message-level classification with dialogue-level modeling, incorporating toxicity-aware dialogue summarization and persona profiling, and adds a perplexity-gain-based mechanism to enhance interpretability. Experiments on established benchmarks, including EDOS, OffensEval, and HatEval, reproduce state-of-the-art performance on tasks such as fine-grained sexism detection, while the platform also enables cross-message toxicity tracking, context-aware summarization, and behavioral persona characterization. The framework thus combines strong discriminative capability with principled interpretability.

📝 Abstract
Detecting toxic language, including sexism, harassment, and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce SafeSpeech, a comprehensive platform for toxic content detection and analysis that bridges message-level and conversation-level insights. The platform integrates fine-tuned classifiers and large language models (LLMs) to enable multi-granularity detection, toxicity-aware conversation summarization, and persona profiling. SafeSpeech also incorporates explainability mechanisms, such as perplexity gain analysis, to highlight the linguistic elements driving predictions. Evaluations on benchmark datasets, including EDOS, OffensEval, and HatEval, demonstrate the reproduction of state-of-the-art performance across multiple tasks, including fine-grained sexism detection.
Problem

Research questions and friction points this paper is trying to address.

How to detect subtle, context-dependent toxic language in conversations.
How to bridge message-level and conversation-level toxic content analysis.
How to integrate classifiers and LLMs for multi-granularity detection and explainability.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates fine-tuned classifiers and LLMs
Enables multi-granularity toxic content detection
Incorporates explainability mechanisms like perplexity gain
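The paper does not spell out its perplexity-gain formulation in this listing, but the general idea behind such mechanisms is to score each token by how much the message's language-model perplexity changes when that token is removed: highly surprising tokens are likely drivers of a toxicity prediction. A minimal, self-contained sketch is below; the toy add-one-smoothed unigram model and all function names are hypothetical stand-ins for the fine-tuned LLM that SafeSpeech actually uses.

```python
import math
from collections import Counter

def make_unigram_logprob(corpus_tokens):
    """Toy unigram LM with add-one smoothing (stand-in for a real LLM)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))
    return logprob

def perplexity(tokens, logprob):
    """exp of the average negative log-probability over the tokens."""
    return math.exp(-sum(logprob(t) for t in tokens) / len(tokens))

def perplexity_gain(tokens, logprob):
    """Per-token gain: drop in perplexity when that token is ablated.
    Large positive gain = the token is highly surprising to the LM,
    flagging it as a candidate driver of the prediction."""
    base = perplexity(tokens, logprob)
    return [(tok, base - perplexity(tokens[:i] + tokens[i + 1:], logprob))
            for i, tok in enumerate(tokens)]

# Illustrative usage: the out-of-vocabulary token gets the largest gain.
corpus = "the cat sat on the mat the dog sat".split()
lp = make_unigram_logprob(corpus)
gains = dict(perplexity_gain(["the", "cat", "zork", "sat"], lp))
```

In a real system the unigram model would be replaced by the platform's fine-tuned LLM, and the highlighted tokens surfaced in the interface as the elements driving the toxicity prediction.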
Xingwei Tan
Research Associate
Natural Language Processing
Chen Lyu
Wuhan University
Natural Language Processing
Hafiz Muhammad Umer
Department of Computer Science, University of Warwick, UK
Sahrish Khan
Department of Computer Science, University of Warwick, UK
Mahathi Parvatham
Department of Computer Science, University of Warwick, UK
Lois Arthurs
Forensic Capability Network, UK
Simon Cullen
Forensic Capability Network, UK
Shelley Wilson
Forensic Capability Network, UK
Arshad Jhumka
School of Computer Science, University of Leeds, UK
Gabriele Pergola
Assistant Professor, University of Warwick
Natural Language Processing · Sentiment Analysis · Question Answering · Machine Learning