KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of research on detecting and purifying obfuscated harmful content in low-resource languages, particularly Korean, this paper systematically characterizes Korean-specific obfuscation strategies for the first time. Drawing on authentic corpora, the authors derive linguistically grounded, adversarial obfuscation rules and construct the first multilevel Korean dataset supporting both deobfuscation and detoxification, with easy, normal, and hard obfuscation levels. Generated via a rule-driven approach, the dataset explicitly incorporates Korean morphological and orthographic features while preserving adversarial robustness, thereby enhancing large language models' ability to identify and sanitize obfuscated toxic Korean text. Key contributions: (1) the first taxonomy of Korean obfuscation techniques with formalized transformation rules; (2) the first multilevel Korean benchmark dataset designed for the joint deobfuscation and detoxification task; and (3) a reusable data curation methodology for advancing safety-aware LLM applications in low-resource languages.
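The paper formalizes Korean-specific transformation rules, but the rules themselves are not reproduced in this summary. As a minimal sketch of what one such rule could look like, the snippet below implements syllable-to-jamo decomposition (splitting a Hangul syllable into its consonant/vowel letters), a well-known Korean obfuscation tactic. The function name and the choice of this particular rule are assumptions for illustration, not the paper's actual rule set.

```python
# Illustrative sketch of ONE possible rule-based obfuscation step:
# decomposing precomposed Hangul syllables (U+AC00..U+D7A3) into their
# constituent compatibility jamo. NOT the paper's actual KOTOX rules.

# Jamo tables indexed the way the Unicode Hangul syllable block is
# composed: 19 leading consonants, 21 vowels, 28 tails (index 0 = none).
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(text: str) -> str:
    """Replace each Hangul syllable with its jamo; pass other chars through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # inside the precomposed-syllable block
            out.append(LEADS[code // 588])          # 588 = 21 vowels * 28 tails
            out.append(VOWELS[(code % 588) // 28])
            tail = TAILS[code % 28]
            if tail:
                out.append(tail)
        else:
            out.append(ch)
    return "".join(out)

# e.g. decompose("한국") → "ㅎㅏㄴㄱㅜㄱ"
```

A deobfuscation model would have to learn the inverse mapping (recomposing jamo runs into syllables), which is why rule-generated pairs of original and obfuscated text are useful supervision.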

📝 Abstract
Toxic content has become an increasingly critical social issue with the rapid expansion of online communication. While numerous studies have explored methods for detecting and detoxifying such content, most have focused primarily on English, leaving low-resource languages underrepresented. Consequently, Large Language Models (LLMs) often struggle to identify and neutralize toxic expressions in these languages. This challenge becomes even more pronounced when users employ obfuscation techniques to evade detection systems. Therefore, we propose **KOTOX: Korean Toxic Dataset** for deobfuscation and detoxification to address this issue. We categorize various obfuscation approaches based on the linguistic characteristics of Korean and define a set of transformation rules grounded in real-world examples. Using these rules, we construct three dataset versions (easy, normal, and hard) representing different levels of obfuscation difficulty. This is the first dataset that simultaneously supports deobfuscation and detoxification for the Korean language. We expect it to facilitate better understanding and mitigation of obfuscated toxic content in LLMs for low-resource languages. Our code and data are available at https://github.com/leeyejin1231/KOTOX.
Problem

Research questions and friction points this paper is trying to address.

Addressing toxic content detection gaps in low-resource Korean language
Developing deobfuscation methods for intentionally hidden toxic expressions
Creating multi-level difficulty datasets for Korean detoxification systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Korean toxic dataset for deobfuscation and detoxification
Categorize obfuscation approaches using Korean linguistic characteristics
Construct three dataset versions with different obfuscation difficulty levels
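One way the three difficulty levels could be realized is by composing progressively more transformation rules per example. The sketch below is a hypothetical illustration of that idea; the rule names, the substitutions, and the one/two/three-rule composition scheme are assumptions, not the configuration used in the paper.

```python
# Hypothetical multilevel obfuscation pipeline: harder splits stack more
# rules. The concrete rules here are toy stand-ins for illustration only.

def digit_sub(text: str) -> str:
    # Number-homophone substitution (assumed example): 이 ("two") → "2".
    return text.replace("이", "2")

def space_out(text: str) -> str:
    # Insert a space between every character to break token boundaries.
    return " ".join(text)

def symbol_pad(text: str) -> str:
    # Replace the inserted spaces with symbols to further disguise spacing.
    return text.replace(" ", "*")

RULES = {
    "easy":   [digit_sub],
    "normal": [digit_sub, space_out],
    "hard":   [digit_sub, space_out, symbol_pad],
}

def obfuscate(text: str, level: str) -> str:
    """Apply the rule chain for the given difficulty level in order."""
    for rule in RULES[level]:
        text = rule(text)
    return text

# e.g. obfuscate("이사", "hard") → "2*사"
```

Pairing each clean toxic sentence with its obfuscated variants at all three levels yields parallel data for both deobfuscation (recover the clean surface form) and detoxification (rewrite it non-toxically).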