🤖 AI Summary
To address the challenge of evasive toxic content, such as insults and discriminatory language, disguised via homophonic character substitution on Chinese social media platforms, this paper proposes C²TU, a training-free and prompt-free method. C²TU constructs a Chinese homophone graph and performs substring matching against a curated toxicity lexicon to identify candidate cloaked words, then applies BERT-based semantic filtering or large language model (LLM)-driven full-sequence contextual modeling to discard non-toxic candidates and restore the disguised tokens to their original toxic forms. It is the first work to systematically tackle homophonic cloaked toxicity unveiling in Chinese; it introduces a training- and prompt-free paradigm and overcomes the autoregressive limitation of LLMs in computing word occurrence probabilities, enabling full-context, non-autoregressive toxicity identification. Evaluated on two Chinese toxic content datasets, C²TU outperforms the state-of-the-art methods by up to 71% in F1-score and 35% in accuracy.
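The candidate-identification step can be pictured with a minimal sketch. The homophone table and toxic lexicon below are tiny hypothetical stand-ins for the paper's Chinese homophone graph and curated toxicity lexicon, and `find_candidates` is an illustrative helper, not the paper's implementation: characters are mapped to a shared "sound key" so that homophones canonicalize to the same sequence, and substrings whose key sequence matches a lexicon entry become candidates.

```python
# Illustrative sketch only: HOMOPHONE_KEY and TOXIC_LEXICON are tiny
# hypothetical stand-ins for the homophone graph and toxicity lexicon.

# Map each character to a canonical sound key; homophones share a key.
HOMOPHONE_KEY = {
    "傻": "sha", "煞": "sha",   # 煞 can cloak 傻 (same pronunciation)
    "逼": "bi",  "比": "bi",    # 比 can cloak 逼
}

# Toxic words indexed by their sound-key tuple.
TOXIC_LEXICON = {("sha", "bi"): "傻逼"}

def find_candidates(text):
    """Return (start, end, restored_word) spans whose sound keys hit the lexicon."""
    keys = [HOMOPHONE_KEY.get(ch) for ch in text]
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            span = tuple(keys[i:j])
            if None in span:
                break  # a character without a sound key cannot extend a match
            if span in TOXIC_LEXICON:
                hits.append((i, j, TOXIC_LEXICON[span]))
    return hits
```

For example, `find_candidates("你这个煞比")` flags the span covering "煞比" and proposes the restoration "傻逼"; the subsequent BERT/LLM filtering stage would then decide from context whether the candidate is actually toxic.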
📝 Abstract
Social media platforms have experienced a significant rise in toxic content, including abusive language and discriminatory remarks, presenting growing challenges for content moderation. Some users evade censorship by deliberately disguising toxic words through homophonic cloaking, which necessitates the task of unveiling cloaked toxicity. Existing methods are mostly designed for English texts, while Chinese cloaked toxicity unveiling has not yet been solved. To tackle this issue, we propose C$^2$TU, a novel training-free and prompt-free method for Chinese cloaked toxic content unveiling. It first employs substring matching to identify candidate toxic words based on a Chinese homophone graph and a toxic lexicon. It then filters out candidates that are non-toxic and corrects the remaining cloaked words to their corresponding toxic forms. Specifically, we develop two model variants for filtering, based on BERT and LLMs, respectively. For LLMs, we address the auto-regressive limitation in computing word occurrence probabilities and utilize the full semantic context of a text sequence to reveal cloaked toxic words. Extensive experiments demonstrate that C$^2$TU achieves superior performance on two Chinese toxic datasets. In particular, our method outperforms the best competitor by up to 71% on the F1 score and 35% on accuracy.
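The "full semantic context" point can be illustrated with a toy model. An autoregressive LLM naturally scores a character only from its left context, but a cloaked character is often betrayed by what follows it. The sketch below is an assumption-laden stand-in (a smoothed character-trigram count over a tiny made-up corpus, not the paper's BERT or LLM variant) showing how conditioning on both sides favors restoring the original toxic character:

```python
# Toy illustration (NOT the paper's model): estimate P(char | left AND right
# context) from character trigrams, rather than the left-only P(char | left)
# that autoregressive decoding provides.
from collections import Counter

CORPUS = ["你真傻逼", "他是傻逼", "煞星来了"]  # tiny hypothetical corpus

# Count (left, char, right) character trigrams with boundary padding.
trigrams = Counter()
for s in CORPUS:
    padded = "^" + s + "$"
    for i in range(1, len(padded) - 1):
        trigrams[(padded[i - 1], padded[i], padded[i + 1])] += 1

def full_context_score(left, char, right):
    """P(char | left, right) ∝ count(left, char, right), add-one smoothed."""
    total = sum(c for (l, _, r), c in trigrams.items() if l == left and r == right)
    return (trigrams[(left, char, right)] + 1) / (total + 1)

# With "逼" as right context, the genuine "傻" outscores the cloak "煞",
# even though "煞" is a perfectly ordinary character in other contexts.
assert full_context_score("真", "傻", "逼") > full_context_score("真", "煞", "逼")
```

A masked language model such as BERT gives this kind of bidirectional conditional probability directly; the paper's contribution for LLMs is obtaining comparable full-context scores despite their left-to-right factorization.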