AI Summary
Implicit hate speech (IHS) detection is challenging due to the absence of overt slurs and its reliance on irony, implication, or coded language. This paper proposes a lightweight, efficient approach: fine-tuning only the embedding layers of general-purpose large language models (LLMs), such as Stella, Jasper, NV-Embed, and E5, at a fine-grained level, without incorporating external knowledge or auxiliary modules. The method significantly enhances semantic representation capability for IHS. Its core contribution lies in empirically validating the strong cross-dataset generalizability of pure embedding fine-tuning, a paradigm previously underexplored for IHS. Experiments demonstrate improvements of up to 1.10 percentage points in macro-F1 on in-domain evaluation and up to 20.35 percentage points in cross-dataset settings. This offers a scalable, easily deployable solution for IHS detection under low-resource conditions and without supervised pretraining assumptions.
Abstract
Implicit hate speech (IHS) is indirect language that conveys prejudice or hatred through subtle cues, sarcasm, or coded terminology. IHS is challenging to detect because it does not include explicitly derogatory or inflammatory words. To address this challenge, task-specific pipelines are often complemented with external knowledge or additional information such as context, emotions, and sentiment data. In this paper, we show that solely fine-tuning recent general-purpose embedding models based on large language models (LLMs), such as Stella, Jasper, NV-Embed, and E5, achieves state-of-the-art performance. Experiments on multiple IHS datasets show improvements of up to 1.10 percentage points in in-dataset evaluation and up to 20.35 percentage points in cross-dataset evaluation, in terms of macro-F1 score.
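The recipe described above, fine-tuning a pretrained embedding model together with a classification head on labeled IHS data, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the tiny `ToyEncoder` stands in for the actual pretrained models (Stella, Jasper, NV-Embed, E5), and the data is random toy data.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Toy stand-in for a pretrained LLM-based embedding model.

    In the paper's setting this would be a model such as Stella or E5;
    here a single embedding table with mean pooling is used so the
    sketch is self-contained and runnable.
    """

    def __init__(self, vocab_size: int = 1000, dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings into one sentence embedding.
        return self.emb(token_ids).mean(dim=1)


encoder = ToyEncoder()
head = nn.Linear(32, 2)  # binary: implicit hate vs. not

# Fine-tune the embedding model and head jointly with cross-entropy;
# no external knowledge sources or auxiliary modules are involved.
opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

# Toy batch: 8 "sentences" of 16 token ids each, with random labels.
x = torch.randint(0, 1000, (8, 16))
y = torch.randint(0, 2, (8,))

for _ in range(5):
    opt.zero_grad()
    logits = head(encoder(x))
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
```

In practice the encoder would be loaded from a pretrained checkpoint and trained on an IHS dataset rather than random tensors; the point of the sketch is only that the whole pipeline is the embedding model plus a linear head.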