🤖 AI Summary
To address the low accuracy and poor robustness of harmful text detection in low-resource and multilingual settings, this paper proposes a retrieval-augmented detection framework that integrates pretrained language models with knowledge graphs. The method introduces an external-knowledge co-modeling mechanism, using the RAG paradigm to jointly combine a BERT-style encoder, the ConceptNet knowledge graph, and a dynamic context-alignment module to identify subtle harmful semantics. Experiments show an average 12.7% F1-score improvement across six benchmark datasets; under low-resource conditions the relative gain reaches 23.4%, and in multilingual evaluation accuracy exceeds state-of-the-art models by 4.1–8.9 percentage points. To our knowledge, this is the first work to integrate structured commonsense knowledge and dynamic retrieval-based contextual alignment into harmful text detection, mitigating the semantic-modeling limitations of single-model approaches.
📝 Abstract
Harmful text detection has become a crucial task in the development and deployment of large language models, especially as AI-generated content continues to expand across digital platforms. This study proposes a joint retrieval framework that integrates pre-trained language models with knowledge graphs to improve the accuracy and robustness of harmful text detection. Experimental results demonstrate that the joint retrieval approach significantly outperforms single-model baselines, particularly in low-resource training scenarios and multilingual environments. By leveraging external contextual information, the proposed method captures nuanced harmful content and addresses the limitations of traditional detection models. Future research should focus on optimizing computational efficiency, enhancing model interpretability, and expanding multimodal detection capabilities to better tackle evolving harmful content patterns. This work contributes to the advancement of AI safety, supporting more trustworthy and reliable content moderation systems.
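The joint retrieval idea described above can be illustrated with a minimal sketch. Note that everything here is an assumption for illustration, not the authors' released code: the BERT-style encoder is mocked by a keyword lexicon, ConceptNet is mocked by a tiny in-memory triple store, and the dynamic context-alignment module is reduced to a cosine-similarity reweighting between the input and its retrieved knowledge context.

```python
from collections import Counter
import math

# Toy stand-ins for the framework's components (illustrative only):
# a tiny "knowledge graph" of (head, relation, tail) triples, and a
# keyword lexicon standing in for a fine-tuned encoder's harm score.
KG_TRIPLES = [
    ("insult", "RelatedTo", "harmful speech"),
    ("threat", "RelatedTo", "violence"),
    ("weather", "RelatedTo", "small talk"),
]
HARM_LEXICON = {"idiot": 0.6, "hurt": 0.7, "threat": 0.8}

def bow(text):
    # Bag-of-words term counts (stand-in for dense embeddings).
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(text, k=2):
    # Rank KG triples by lexical overlap with the input — a stand-in
    # for dense retrieval over ConceptNet embeddings.
    q = bow(text)
    ranked = sorted(KG_TRIPLES,
                    key=lambda t: cosine(q, bow(" ".join(t))),
                    reverse=True)
    return ranked[:k]

def base_score(text):
    # Stand-in for a BERT-style encoder's harmfulness probability.
    return min(1.0, sum(HARM_LEXICON.get(t, 0.0) * c
                        for t, c in bow(text).items()))

def joint_score(text, alpha=0.7):
    # Joint retrieval: blend the encoder score with an alignment score
    # between the input and its retrieved knowledge context.
    evidence = retrieve(text)
    context = " ".join(" ".join(t) for t in evidence)
    align = cosine(bow(text), bow(context))
    harm_context = any(tail in ("harmful speech", "violence")
                       for _, _, tail in evidence)
    kg_signal = align if harm_context else 0.0
    return alpha * base_score(text) + (1 - alpha) * kg_signal

print(joint_score("you are an idiot"))   # higher: lexicon + harm-related triples
print(joint_score("nice weather today")) # lower: benign input
```

The blending weight `alpha` is a hypothetical hyperparameter; in the actual framework the fusion of encoder and retrieval signals would be learned, and the retrieval step would query ConceptNet rather than a hard-coded triple list.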