Improving Harmful Text Detection with Joint Retrieval and External Knowledge

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low accuracy and poor robustness of harmful text detection in low-resource and multilingual settings, this paper proposes a retrieval-augmented detection framework that integrates pretrained language models with knowledge graphs. The method introduces an external-knowledge co-modeling mechanism, using the RAG paradigm to jointly combine a BERT-style encoder, the ConceptNet knowledge graph, and a dynamic context alignment module for deep identification of subtle harmful semantics. Experimental results show an average 12.7% improvement in F1-score across six benchmark datasets; under low-resource conditions the relative gain reaches 23.4%, and in multilingual evaluation accuracy surpasses state-of-the-art models by 4.1–8.9 percentage points. To our knowledge, this is the first work to integrate structured commonsense knowledge and dynamic retrieval-based contextual alignment into harmful text detection, mitigating the semantic modeling limitations inherent in single-model approaches.
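The summary describes a three-stage pipeline: retrieve knowledge-graph triples for the input, align them to the context, and classify the enriched input. The paper's code is not shown here, so the following is only a minimal toy sketch of that flow; the in-memory `KNOWLEDGE` store stands in for ConceptNet retrieval, and the rule-based `classify` stands in for the BERT-style encoder. All names and data are illustrative assumptions, not the authors' implementation.

```python
# Toy knowledge store standing in for ConceptNet-style triples.
KNOWLEDGE = {
    "idiot": [("idiot", "RelatedTo", "insult")],
    "hate": [("hate", "RelatedTo", "hostility")],
    "sunset": [("sunset", "IsA", "natural_event")],
}

# Illustrative set of concepts treated as harmful.
HARMFUL_CONCEPTS = {"insult", "hostility", "slur", "threat"}

def retrieve(tokens):
    """Retrieval stage: look up triples for each input token."""
    triples = []
    for tok in tokens:
        triples.extend(KNOWLEDGE.get(tok, []))
    return triples

def align(tokens, triples):
    """Simplified dynamic context alignment: keep triples whose head
    occurs in the input, weighted by its frequency in the context."""
    counts = {t: tokens.count(t) for t in tokens}
    return [(h, r, t, counts[h]) for (h, r, t) in triples if h in counts]

def classify(aligned):
    """Stand-in for the BERT-style classifier: flag the input when the
    aligned triples point to harmful concepts."""
    score = sum(w for (_h, _r, tail, w) in aligned if tail in HARMFUL_CONCEPTS)
    return score > 0

def detect(text):
    tokens = text.lower().split()
    return classify(align(tokens, retrieve(tokens)))

print(detect("you are an idiot"))    # flagged via the 'insult' concept
print(detect("a beautiful sunset"))  # benign
```

In the actual framework, `retrieve` would query ConceptNet, `align` would score retrieved triples against contextual embeddings, and `classify` would feed the knowledge-enriched representation through a fine-tuned encoder; the sketch only preserves the data flow between the three stages.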

📝 Abstract
Harmful text detection has become a crucial task in the development and deployment of large language models, especially as AI-generated content continues to expand across digital platforms. This study proposes a joint retrieval framework that integrates pre-trained language models with knowledge graphs to improve the accuracy and robustness of harmful text detection. Experimental results demonstrate that the joint retrieval approach significantly outperforms single-model baselines, particularly in low-resource training scenarios and multilingual environments. The proposed method effectively captures nuanced harmful content by leveraging external contextual information, addressing the limitations of traditional detection models. Future research should focus on optimizing computational efficiency, enhancing model interpretability, and expanding multimodal detection capabilities to better tackle evolving harmful content patterns. This work contributes to the advancement of AI safety, ensuring more trustworthy and reliable content moderation systems.
Problem

Research questions and friction points this paper is trying to address.

Enhancing harmful text detection accuracy using joint retrieval and knowledge graphs
Addressing limitations of traditional models with external contextual information
Improving performance in low-resource and multilingual detection scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint retrieval framework with knowledge graphs
Integrates pre-trained models and external context
Enhances accuracy in low-resource multilingual settings
Zidong Yu
Syracuse University
Shuo Wang
Purdue University, Indianapolis, USA
Nan Jiang
Carnegie Mellon University, Pittsburgh, USA
Weiqiang Huang
Northeastern University
Xu Han
Brown University, Providence, USA
Junliang Du
Shanghai Jiao Tong University