🤖 AI Summary
To address the low accuracy and poor robustness of harmful text detection in low-resource and multilingual settings, this paper proposes a retrieval-augmented detection framework that integrates pretrained language models with knowledge graphs. The method introduces an external-knowledge co-modeling mechanism, using the RAG paradigm to jointly combine a BERT-style encoder, the ConceptNet knowledge graph, and a dynamic context-alignment module to identify subtle harmful semantics. Experiments show an average 12.7% F1-score improvement across six benchmark datasets; under low-resource conditions the relative gain reaches 23.4%, and in multilingual evaluation accuracy exceeds state-of-the-art models by 4.1–8.9 percentage points. To our knowledge, this is the first work to integrate structured commonsense knowledge and dynamic retrieval-based contextual alignment into harmful text detection, mitigating the semantic-modeling limitations of single-model approaches.
📝 Abstract
Harmful text detection has become a crucial task in the development and deployment of large language models, especially as AI-generated content continues to expand across digital platforms. This study proposes a joint retrieval framework that integrates pre-trained language models with knowledge graphs to improve the accuracy and robustness of harmful text detection. Experimental results demonstrate that the joint retrieval approach significantly outperforms single-model baselines, particularly in low-resource training scenarios and multilingual environments. By leveraging external contextual information, the proposed method captures nuanced harmful content and addresses the limitations of traditional detection models. Future research should focus on optimizing computational efficiency, enhancing model interpretability, and expanding multimodal detection capabilities to better tackle evolving harmful content patterns. This work contributes to the advancement of AI safety, supporting more trustworthy and reliable content moderation systems.
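The joint retrieval idea described above can be illustrated with a minimal sketch. Note that everything here is an assumption for illustration, not the authors' released code: the BERT-style encoder is mocked by a keyword lexicon, ConceptNet is mocked by a tiny in-memory triple store, and the dynamic context-alignment module is reduced to a cosine-similarity reweighting between the input and its retrieved knowledge context.

```python
from collections import Counter
import math

# Toy stand-ins for the framework's components (illustrative only):
# a tiny "knowledge graph" of (head, relation, tail) triples, and a
# keyword lexicon standing in for a fine-tuned encoder's harm score.
KG_TRIPLES = [
    ("insult", "RelatedTo", "harmful speech"),
    ("threat", "RelatedTo", "violence"),
    ("weather", "RelatedTo", "small talk"),
]
HARM_LEXICON = {"idiot": 0.6, "hurt": 0.7, "threat": 0.8}

def bow(text):
    # Bag-of-words term counts (stand-in for dense embeddings).
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(text, k=2):
    # Rank KG triples by lexical overlap with the input — a stand-in
    # for dense retrieval over ConceptNet embeddings.
    q = bow(text)
    ranked = sorted(KG_TRIPLES,
                    key=lambda t: cosine(q, bow(" ".join(t))),
                    reverse=True)
    return ranked[:k]

def base_score(text):
    # Stand-in for a BERT-style encoder's harmfulness probability.
    return min(1.0, sum(HARM_LEXICON.get(t, 0.0) * c
                        for t, c in bow(text).items()))

def joint_score(text, alpha=0.7):
    # Joint retrieval: blend the encoder score with an alignment score
    # between the input and its retrieved knowledge context.
    evidence = retrieve(text)
    context = " ".join(" ".join(t) for t in evidence)
    align = cosine(bow(text), bow(context))
    harm_context = any(tail in ("harmful speech", "violence")
                       for _, _, tail in evidence)
    kg_signal = align if harm_context else 0.0
    return alpha * base_score(text) + (1 - alpha) * kg_signal

print(joint_score("you are an idiot"))   # higher: lexicon + harm-related triples
print(joint_score("nice weather today")) # lower: benign input
```

The blending weight `alpha` is a hypothetical hyperparameter; in the actual framework the fusion of encoder and retrieval signals would be learned, and the retrieval step would query ConceptNet rather than a hard-coded triple list.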