Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets

πŸ“… 2025-08-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Implicit hate speech (IHS) detection is challenging due to the absence of overt slurs and reliance on irony, implication, or coded language. This paper proposes a lightweight, efficient approach: fine-tuning only the embedding layers of general-purpose large language models (LLMs)β€”such as Stella, Jasper, NV-Embed, and E5β€”at a fine-grained level, without incorporating external knowledge or auxiliary modules. The method significantly enhances semantic representation capability for IHS. Its core contribution lies in empirically validating the strong cross-dataset generalizability of pure embedding fine-tuningβ€”a paradigm previously underexplored for IHS. Experiments demonstrate up to a 1.10 percentage point improvement in macro-F1 on in-domain evaluation and up to a 20.35 percentage point gain in cross-dataset settings. This offers a scalable, easily deployable solution for IHS detection under low-resource conditions and without supervised pretraining assumptions.
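The paper fine-tunes the embedding models themselves; as a simplified, hypothetical stand-in (not the authors' code), the sketch below trains only a logistic-regression head on fixed sentence embeddings. The synthetic "embeddings" and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=200, seed=0):
    """Logistic-regression head on fixed sentence embeddings.

    X: (n_samples, dim) array of precomputed embeddings
    y: (n_samples,) array of 0/1 labels (1 = implicit hate)
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = X @ w + b
        p = 1.0 / (1.0 + np.exp(-z))          # sigmoid
        grad_w = X.T @ (p - y) / len(y)       # binary cross-entropy gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Toy demo: two separable clusters standing in for embedded texts.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.3, (50, 8)), rng.normal(1, 0.3, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_linear_probe(X, y)
acc = np.mean(predict(X, w, b) == y)
```

In the paper's actual setting, the encoder (e.g. Stella or E5) is updated end-to-end rather than frozen; this probe only illustrates the classification objective sitting on top of the embeddings.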

πŸ“ Abstract
Implicit hate speech (IHS) is indirect language that conveys prejudice or hatred through subtle cues, sarcasm or coded terminology. IHS is challenging to detect as it does not include explicit derogatory or inflammatory words. To address this challenge, task-specific pipelines can be complemented with external knowledge or additional information such as context, emotions and sentiment data. In this paper, we show that, by solely fine-tuning recent general-purpose embedding models based on large language models (LLMs), such as Stella, Jasper, NV-Embed and E5, we achieve state-of-the-art performance. Experiments on multiple IHS datasets show improvements of up to 1.10 percentage points in in-dataset evaluation, and up to 20.35 percentage points in cross-dataset evaluation, in terms of F1-macro score.
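The gains above are reported in F1-macro, the unweighted mean of per-class F1 scores, which matters because IHS datasets are typically class-imbalanced. A minimal pure-Python sketch of the metric (not the paper's evaluation code; the example labels are made up):

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (F1-macro)."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Imbalanced toy example: F1-macro weights the rare "implicit hate"
# class (1) equally with the majority class (0), unlike accuracy.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
score = macro_f1(y_true, y_pred)
```

In the cross-dataset setting the paper evaluates, `y_pred` would come from a model fine-tuned on one IHS dataset and applied to the test split of a different one.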
Problem

Research questions and friction points this paper is trying to address.

Detecting implicit hate speech using subtle cues and sarcasm
Improving cross-dataset generalization for hate speech detection
Fine-tuning general-purpose LLM embeddings for specialized tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning general-purpose LLM embeddings
Achieving state-of-the-art detection performance
Improving cross-dataset evaluation significantly
Vassiliy Cheremetiev
EPFL, Lausanne, Switzerland; Idiap Research Institute, Martigny, Switzerland
Quang Long Ho Ngo
EPFL, Lausanne, Switzerland; Idiap Research Institute, Martigny, Switzerland
Chau Ying Kot
EPFL, Lausanne, Switzerland
Alina Elena Baia
Idiap Research Institute
Privacy in CV, XAI, Adversarial attacks
Andrea Cavallaro
Director, Idiap Research Institute; Professor, EPFL
Machine Learning, Computer Vision, Audio Processing, Robot Perception, Privacy