🤖 AI Summary
To address the challenge of evasive toxic content, such as insults and discriminatory language, disguised via homophonic character substitution on Chinese social media platforms, this paper proposes C²TU, a training-free and prompt-free method. C²TU constructs a Chinese homophone graph and performs substring matching against a curated toxicity lexicon to identify candidate cloaked words, then applies BERT-based semantic filtering or large language model (LLM)-driven full-sequence contextual modeling to discard non-toxic candidates and restore the disguised tokens to their original toxic forms. It is the first work to systematically tackle homophonic cloaked toxicity unveiling in Chinese; it introduces a training- and prompt-free paradigm and overcomes the autoregressive limitation of LLMs in computing word occurrence probabilities, enabling full-context, non-autoregressive toxicity identification. Evaluated on two Chinese toxic content datasets, C²TU outperforms the state-of-the-art methods by up to 71% in F1-score and 35% in accuracy.
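The candidate-identification step can be pictured with a minimal sketch. The homophone table and toxic lexicon below are tiny hypothetical stand-ins for the paper's Chinese homophone graph and curated toxicity lexicon, and `find_candidates` is an illustrative helper, not the paper's implementation: characters are mapped to a shared "sound key" so that homophones canonicalize to the same sequence, and substrings whose key sequence matches a lexicon entry become candidates.

```python
# Illustrative sketch only: HOMOPHONE_KEY and TOXIC_LEXICON are tiny
# hypothetical stand-ins for the homophone graph and toxicity lexicon.

# Map each character to a canonical sound key; homophones share a key.
HOMOPHONE_KEY = {
    "傻": "sha", "煞": "sha",   # 煞 can cloak 傻 (same pronunciation)
    "逼": "bi",  "比": "bi",    # 比 can cloak 逼
}

# Toxic words indexed by their sound-key tuple.
TOXIC_LEXICON = {("sha", "bi"): "傻逼"}

def find_candidates(text):
    """Return (start, end, restored_word) spans whose sound keys hit the lexicon."""
    keys = [HOMOPHONE_KEY.get(ch) for ch in text]
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            span = tuple(keys[i:j])
            if None in span:
                break  # a character without a sound key cannot extend a match
            if span in TOXIC_LEXICON:
                hits.append((i, j, TOXIC_LEXICON[span]))
    return hits
```

For example, `find_candidates("你这个煞比")` flags the span covering "煞比" and proposes the restoration "傻逼"; the subsequent BERT/LLM filtering stage would then decide from context whether the candidate is actually toxic.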
📝 Abstract
Social media platforms have experienced a significant rise in toxic content, including abusive language and discriminatory remarks, presenting growing challenges for content moderation. Some users evade censorship by deliberately disguising toxic words through homophonic cloaking, which necessitates the task of unveiling cloaked toxicity. Existing methods are mostly designed for English texts, while Chinese cloaked toxicity unveiling has not yet been solved. To tackle this issue, we propose C$^2$TU, a novel training-free and prompt-free method for Chinese cloaked toxic content unveiling. It first employs substring matching to identify candidate toxic words based on a Chinese homophone graph and a toxic lexicon. It then filters out candidates that are non-toxic and corrects the remaining cloaked words to their corresponding toxic forms. Specifically, we develop two model variants for filtering, based on BERT and LLMs, respectively. For LLMs, we address the auto-regressive limitation in computing word occurrence probabilities and utilize the full semantic context of a text sequence to reveal cloaked toxic words. Extensive experiments demonstrate that C$^2$TU achieves superior performance on two Chinese toxic datasets. In particular, our method outperforms the best competitor by up to 71% on the F1 score and 35% on accuracy.
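The "full semantic context" point can be illustrated with a toy model. An autoregressive LLM naturally scores a character only from its left context, but a cloaked character is often betrayed by what follows it. The sketch below is an assumption-laden stand-in (a smoothed character-trigram count over a tiny made-up corpus, not the paper's BERT or LLM variant) showing how conditioning on both sides favors restoring the original toxic character:

```python
# Toy illustration (NOT the paper's model): estimate P(char | left AND right
# context) from character trigrams, rather than the left-only P(char | left)
# that autoregressive decoding provides.
from collections import Counter

CORPUS = ["你真傻逼", "他是傻逼", "煞星来了"]  # tiny hypothetical corpus

# Count (left, char, right) character trigrams with boundary padding.
trigrams = Counter()
for s in CORPUS:
    padded = "^" + s + "$"
    for i in range(1, len(padded) - 1):
        trigrams[(padded[i - 1], padded[i], padded[i + 1])] += 1

def full_context_score(left, char, right):
    """P(char | left, right) ∝ count(left, char, right), add-one smoothed."""
    total = sum(c for (l, _, r), c in trigrams.items() if l == left and r == right)
    return (trigrams[(left, char, right)] + 1) / (total + 1)

# With "逼" as right context, the genuine "傻" outscores the cloak "煞",
# even though "煞" is a perfectly ordinary character in other contexts.
assert full_context_score("真", "傻", "逼") > full_context_score("真", "煞", "逼")
```

A masked language model such as BERT gives this kind of bidirectional conditional probability directly; the paper's contribution for LLMs is obtaining comparable full-context scores despite their left-to-right factorization.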