Breaking the Cloak! Unveiling Chinese Cloaked Toxicity with Homophone Graph and Toxic Lexicon

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address evasive toxic content, such as insults and discriminatory language, disguised via homophonic character substitution on Chinese social media, this paper proposes C²TU, a training-free and prompt-free method. C²TU constructs a Chinese homophone graph and performs substring matching against a curated toxic lexicon to generate candidate cloaked words, then filters out non-toxic candidates and corrects cloaked tokens to their toxic counterparts, using either BERT-based semantic filtering or large language model (LLM)-based full-sequence contextual modeling. It is the first work to systematically tackle cloaked toxicity unveiling in Chinese, introduces a training- and prompt-free paradigm, and works around the autoregressive limitation of LLMs when computing word-occurrence probabilities from full semantic context. Evaluated on two Chinese toxic content datasets, C²TU outperforms the best existing method by up to 71% on F1 score and 35% on accuracy.
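The candidate-generation step described above can be sketched in a few lines: normalize each character to a canonical member of its homophone class, then match substrings of the normalized text against the toxic lexicon. The homophone graph and lexicon below are tiny toy examples invented for illustration; the paper builds these resources at scale, and the later BERT/LLM filtering stage is not shown here.

```python
# Toy homophone graph, represented as a map from each character to a
# canonical member of its pronunciation class (both entries per class).
HOMOPHONE_CANON = {
    "傻": "傻", "煞": "傻",   # shā-class (illustrative)
    "逼": "逼", "比": "逼",   # bī-class (illustrative)
}

# Toy toxic lexicon, stored in canonical (homophone-normalized) form.
TOXIC_LEXICON = {"傻逼"}

def normalize(text: str) -> str:
    """Replace each character with its canonical homophone-class member."""
    return "".join(HOMOPHONE_CANON.get(ch, ch) for ch in text)

def find_candidates(text: str, max_len: int = 4):
    """Return (start, surface, canonical) triples whose normalized form
    matches a lexicon entry -- candidate cloaked toxic words."""
    norm = normalize(text)
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            if norm[i:j] in TOXIC_LEXICON:
                hits.append((i, text[i:j], norm[i:j]))
    return hits

if __name__ == "__main__":
    # "煞比" is a homophonic cloak of the lexicon entry "傻逼".
    print(find_candidates("你这个煞比真烦"))
```

In the full method, every hit returned here would still be passed to the BERT- or LLM-based filter, since a homophone match alone may be a benign word in context.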

📝 Abstract
Social media platforms have experienced a significant rise in toxic content, including abusive language and discriminatory remarks, presenting growing challenges for content moderation. Some users evade censorship by deliberately disguising toxic words through homophonic cloaking, which necessitates the task of unveiling cloaked toxicity. Existing methods are mostly designed for English texts, while Chinese cloaked toxicity unveiling has not been solved yet. To tackle the issue, we propose C$^2$TU, a novel training-free and prompt-free method for Chinese cloaked toxic content unveiling. It first employs substring matching to identify candidate toxic words based on a Chinese homophone graph and toxic lexicon. It then filters out candidates that are non-toxic and corrects cloaked words to their toxic counterparts. Specifically, we develop two model variants for filtering, based on BERT and LLMs, respectively. For LLMs, we address the auto-regressive limitation in computing word occurrence probability and utilize the full semantic context of a text sequence to reveal cloaked toxic words. Extensive experiments demonstrate that C$^2$TU achieves superior performance on two Chinese toxic datasets. In particular, our method outperforms the best competitor by up to 71% on F1 score and 35% on accuracy.
Problem

Research questions and friction points this paper is trying to address.

Detecting Chinese cloaked toxic content on social media
Overcoming homophone-based evasion of content moderation
Developing a training-free method for Chinese toxicity unveiling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Builds a Chinese homophone graph and toxic lexicon
Offers BERT- and LLM-based filtering variants
Corrects cloaked words to their toxic counterparts
👥 Authors
Xuchen Ma, School of Data Science and Engineering, East China Normal University
Jianxiang Yu, East China Normal University (Data mining, Large language models)
Wenming Shao, Shanghai EastWonder Info-tech Co., Ltd.
Bo Pang, Shanghai EastWonder Info-tech Co., Ltd.
Xiang Li, School of Data Science and Engineering, East China Normal University