Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the inadequate safety safeguards of large language models (LLMs) in Singapore’s low-resource multilingual context—encompassing Singlish, Mandarin, Malay, and Tamil. To this end, we introduce SGToxicGuard, the first localized, culturally aware multilingual toxicity evaluation framework. Methodologically, we construct the SGToxicGuard dataset covering all four languages, design a cross-scenario red-teaming paradigm (spanning dialogue, question answering, and content generation), and integrate human–automated hybrid annotation with cross-lingual toxicity classification. Experiments reveal systematic failures of state-of-the-art multilingual LLMs in detecting and mitigating non-English toxic content. Our contributions include: (1) establishing the first safety evaluation framework tailored to low-resource multilingual settings; (2) releasing a reproducible adversarial benchmark; and (3) providing empirically grounded recommendations for improving robustness, thereby advancing safer and more inclusive multilingual AI.
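The evaluation pipeline described above — probing a model with localized adversarial prompts across scenarios and scoring responses with a cross-lingual toxicity classifier — can be sketched as a simple loop. This is a minimal illustrative sketch, not the SGToxicGuard implementation: `evaluate`, the probe schema, and the stub model/classifier below are all hypothetical stand-ins.

```python
# Hypothetical sketch of a cross-scenario red-teaming evaluation loop.
# The function names, probe schema, and stubs are illustrative only;
# they are not the actual SGToxicGuard API.

SCENARIOS = ["conversation", "question-answering", "content-composition"]
LANGUAGES = ["Singlish", "Chinese", "Malay", "Tamil"]

def evaluate(model_fn, probes, classify_toxic):
    """Compute the toxic-response rate per (language, scenario) cell.

    model_fn(prompt) -> model response (str)
    probes: list of dicts with 'language', 'scenario', 'prompt' keys
    classify_toxic(text) -> bool, a cross-lingual toxicity judgment
    """
    counts = {}  # (language, scenario) -> (toxic_count, total_count)
    for probe in probes:
        key = (probe["language"], probe["scenario"])
        toxic, total = counts.get(key, (0, 0))
        response = model_fn(probe["prompt"])
        counts[key] = (toxic + int(classify_toxic(response)), total + 1)
    # A higher rate means the model produced toxic output more often
    # for that language/scenario combination (a weaker guardrail).
    return {key: toxic / total for key, (toxic, total) in counts.items()}

# Toy usage with stub components in place of a real LLM and classifier:
probes = [
    {"language": "Singlish", "scenario": "conversation", "prompt": "p1"},
    {"language": "Singlish", "scenario": "conversation", "prompt": "p2"},
    {"language": "Malay", "scenario": "question-answering", "prompt": "p3"},
]
echo_model = lambda prompt: prompt       # stand-in for an LLM call
is_toxic = lambda text: text == "p1"     # stand-in toxicity classifier
rates = evaluate(echo_model, probes, is_toxic)
# rates[("Singlish", "conversation")] -> 0.5
```

Aggregating per (language, scenario) cell is what surfaces the paper's headline finding: guardrail failure rates that are uneven across low-resource languages.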

📝 Abstract
The advancement of Large Language Models (LLMs) has transformed natural language processing; however, their safety mechanisms remain under-explored in low-resource, multilingual settings. Here, we aim to bridge this gap. In particular, we introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context, including Singlish, Chinese, Malay, and Tamil. SGToxicGuard adopts a red-teaming approach to systematically probe LLM vulnerabilities in three real-world scenarios: conversation, question-answering, and content composition. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails. By offering actionable insights into cultural sensitivity and toxicity mitigation, we lay the foundation for safer and more inclusive AI systems in linguistically diverse environments. (Link to the dataset: https://github.com/Social-AI-Studio/SGToxicGuard.) Disclaimer: This paper contains sensitive content that may be disturbing to some readers.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM safety vulnerabilities in Singapore's multilingual context
Evaluating toxicity risks across conversation, QA, and content scenarios
Identifying safety gaps in low-resource language AI systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Red-teaming approach to probe LLM vulnerabilities
SGToxicGuard dataset for Singapore's multilingual context
Evaluation framework for cultural sensitivity and toxicity