🤖 AI Summary
This study addresses the gap in safety evaluation resources for large language models (LLMs) in non-English languages, focusing on German (a high-resource language) and Bulgarian (a low-resource language), where existing benchmarks inadequately capture the risks of harmful content generation within specific sociocultural and legal contexts. To bridge this gap, the authors introduce the first regionally grounded bilingual safety evaluation dataset covering both languages, built from carefully curated, human-crafted adversarial prompts spanning multiple culturally sensitive topics. Systematic evaluations of both multilingual and monolingual LLMs reveal significant cross-lingual disparities in safety behavior. The work underscores the need for localized, region-specific benchmarks to support the responsible deployment of LLMs in major non-English languages.
📝 Abstract
Large language models are increasingly deployed across professional domains, bringing hard-to-predict risks, including the generation of harmful or disrespectful content. Although substantial progress has been made in developing safety evaluation datasets, existing resources remain overwhelmingly English- and Chinese-centric. This limitation is particularly pronounced when evaluating languages that operate within shared sociocultural, legal, and ethical contexts. To address this gap, we introduce Schützen: a German–Bulgarian safety dataset designed to assess model answerability under risk, covering both a low-resource language (Bulgarian) and a high-resource language (German). Experiments with multilingual and language-specific LLMs reveal pronounced cross-language differences in safety behavior, highlighting the necessity of tailored, region-specific evaluation resources to support the responsible deployment of LLMs in Germany and Bulgaria. Datasets and code are available at https://github.com/xnlp-lab/Schutzen. Warning: this paper contains examples that may be offensive, harmful, or biased.