🤖 AI Summary
This work addresses the challenge of detecting context-dependent toxic language, such as implicit sexism, harassment, and abuse, in conversational settings. It presents SafeSpeech, the first multi-granularity toxicity analysis framework designed specifically for dialogue. Methodologically, the framework integrates message-level classification with dialogue-level modeling, incorporating toxicity-aware conversation summarization and persona profiling, and adds a perplexity-gain mechanism to enhance interpretability. Extensive experiments on established benchmarks, including EDOS, OffensEval, and HatEval, demonstrate state-of-the-art performance on fine-grained sexism detection, while the platform further enables cross-message toxicity tracking, context-aware summarization, and behavioral persona characterization. The framework thus pairs strong discriminative capability with principled interpretability.
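To make the perplexity-gain mechanism concrete, the sketch below scores each word in a message by how much its removal shifts a causal language model's perplexity. This is a minimal reading of the idea, assuming GPT-2 as the scoring model; the paper's actual scoring model, granularity, and aggregation may differ, and the `perplexity_gain` helper and example message are illustrative.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Mean token-level negative log-likelihood under the LM, exponentiated.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def perplexity_gain(message: str) -> list[tuple[str, float]]:
    # Leave-one-word-out attribution: a large absolute gain suggests the
    # word carries much of the message's (possibly toxic) signal.
    words = message.split()
    base = perplexity(message)
    return [
        (words[i], perplexity(" ".join(words[:i] + words[i + 1:])) - base)
        for i in range(len(words))
    ]

for word, gain in perplexity_gain("you are such a pathetic idiot"):
    print(f"{word:>10s}  {gain:+7.2f}")
```

Attributing predictions to individual words this way needs no gradient access to the classifier, which suits a platform that mixes fine-tuned models and LLMs.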
📝 Abstract
Detecting toxic language, including sexism, harassment, and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce SafeSpeech, a comprehensive platform for toxic content detection and analysis that bridges message-level and conversation-level insights. The platform integrates fine-tuned classifiers and large language models (LLMs) to enable multi-granularity detection, toxicity-aware conversation summarization, and persona profiling. SafeSpeech also incorporates explainability mechanisms, such as perplexity gain analysis, to highlight the linguistic elements driving predictions. Evaluations on benchmark datasets, including EDOS, OffensEval, and HatEval, show that SafeSpeech reproduces state-of-the-art performance across multiple tasks, including fine-grained sexism detection.
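As an illustration of bridging message-level and conversation-level insights, the following sketch scores each turn of a dialogue with a message-level toxicity classifier and then aggregates the scores per speaker, a crude stand-in for persona profiling. It assumes a public off-the-shelf model (unitary/toxic-bert) with its own label scheme rather than the platform's fine-tuned classifiers; the conversation, speaker names, and aggregation rule are all invented for the example.

```python
from collections import defaultdict
from transformers import pipeline

# Off-the-shelf stand-in for SafeSpeech's fine-tuned message-level classifiers.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

conversation = [
    ("user_a", "Nobody asked for your opinion."),
    ("user_b", "Let's keep this discussion civil, please."),
    ("user_a", "Women like you should not even be in this forum."),
]

# Message level: score each turn in isolation.
turn_scores = []
for speaker, text in conversation:
    top = toxicity(text)[0]  # highest-probability label for this message
    score = top["score"] if top["label"] == "toxic" else 0.0
    turn_scores.append((speaker, score))

# Conversation level: aggregate per speaker as a crude persona signal.
per_speaker = defaultdict(list)
for speaker, score in turn_scores:
    per_speaker[speaker].append(score)

for speaker, scores in per_speaker.items():
    print(f"{speaker}: mean toxicity {sum(scores) / len(scores):.2f} "
          f"over {len(scores)} message(s)")
```

A per-speaker aggregate like this is what lets conversation-level analysis surface patterns, such as one participant repeatedly targeting another, that any single message-level prediction would miss.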