Quantifying the Salience of Geo-Cultural Values for Pluralistic Safety Alignment

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the limited geographic and cultural diversity of rater pools in existing AI safety evaluation datasets, which prevents those datasets from capturing cross-cultural value differences and weakens safety alignment in global deployments. A meta-analysis of safety datasets shows that most do not report geo-cultural information, and that those that do lack a unified method for analyzing geo-cultural and demographic correlates together. Combining the Inglehart–Welzel cultural dimensions with multilevel modeling, the work shows that cultural zone membership explains variance in safety ratings beyond demographics such as age, gender, and ethnicity (p<0.05 across 6 datasets). It estimates that roughly 10% of the examined items are culturally sensitive, meaning they are likely to be misclassified as safe without adequate cultural representation. It also evaluates large language models as rater surrogates and as triage tools: current models do not reliably stand in for human raters, but they can help prioritize culturally sensitive items for human annotation.
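To make the notion of a "culturally sensitive" item concrete, here is a minimal sketch of one way such items might be flagged from per-zone human ratings. It is an illustrative heuristic, not the paper's procedure: the function names, the `reference_zone` parameter, the 0.5 threshold, and the toy ratings are all assumptions. An item is flagged when a single reference zone would judge it safe while at least one other zone judges it unsafe.

```python
from collections import defaultdict


def unsafe_rate_by_zone(ratings):
    """Return {item_id: {zone: fraction of raters marking the item unsafe}}."""
    # counts[item][zone] holds [number of unsafe votes, total votes]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for item_id, zone, is_unsafe in ratings:
        cell = counts[item_id][zone]
        cell[0] += int(is_unsafe)
        cell[1] += 1
    return {
        item: {zone: unsafe / total for zone, (unsafe, total) in zones.items()}
        for item, zones in counts.items()
    }


def flag_culturally_sensitive(ratings, reference_zone, threshold=0.5):
    """Flag items the reference zone rates safe but some other zone rates unsafe.

    Hypothetical rule: a zone "rates an item unsafe" when its unsafe
    fraction is at least `threshold`.
    """
    flagged = []
    for item, rates in unsafe_rate_by_zone(ratings).items():
        ref = rates.get(reference_zone)
        if ref is None or ref >= threshold:
            continue  # no reference ratings, or reference zone already says unsafe
        if any(r >= threshold for z, r in rates.items() if z != reference_zone):
            flagged.append(item)
    return sorted(flagged)


# Toy data (invented): (item_id, cultural zone, 1 = unsafe / 0 = safe)
ratings = [
    ("q1", "zone_a", 0), ("q1", "zone_a", 0), ("q1", "zone_b", 1), ("q1", "zone_b", 1),
    ("q2", "zone_a", 0), ("q2", "zone_a", 0), ("q2", "zone_b", 0), ("q2", "zone_b", 0),
    ("q3", "zone_a", 1), ("q3", "zone_a", 1), ("q3", "zone_b", 1), ("q3", "zone_b", 0),
]
print(flag_culturally_sensitive(ratings, reference_zone="zone_a"))
```

In this toy data only `q1` is flagged: a rater pool drawn solely from `zone_a` would label it safe, while `zone_b` raters consider it unsafe. An LLM-based triage step, as described in the summary, would aim to surface such items before multi-zone human ratings are available.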

📝 Abstract

Safe global deployment of AI models requires alignment with human values that vary across cultures. Yet rater pools in safety evaluation datasets remain largely geographically homogeneous, failing to capture geo-cultural differences. Further, it remains unclear whether such differences persist after controlling for demographics such as age, gender, and ethnicity. Through a meta-analysis of safety datasets, we find that most do not report geo-cultural information, and those that do lack a unified methodology to jointly analyze geo-cultural and demographic correlates. Using the Inglehart-Welzel dimensions of cross-cultural variation, we demonstrate via multilevel modeling that cultural zone membership explains variance in safety ratings beyond standard demographics (p<0.05 across 6 datasets). Moreover, our analysis indicates that roughly 10% of items in the datasets we examined are culturally sensitive: likely to be misclassified as safe without adequate cultural representation. We evaluate LLMs as both rater surrogates and triage tools, finding that current LLMs do not reliably stand in for raters, though they can help prioritize culturally sensitive items for human annotation. Our findings motivate more culturally pluralistic safety evaluation and offer practical takeaways to support it.
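As a rough illustration of the multilevel modeling the abstract refers to, one plausible specification is written out below. The exact form is an assumption: the abstract does not state the link function, the random-effects structure, or how the Inglehart-Welzel zones enter the model, so all symbols here are hypothetical.

```latex
% Assumed random-intercept model; i indexes items, j indexes raters,
% and z(j) is the cultural zone of rater j.
\begin{align}
  y_{ij} &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_j + u_{z(j)} + v_i + \varepsilon_{ij}, \\
  u_{z} &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  v_i \sim \mathcal{N}(0, \sigma_v^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{align}
```

Here $y_{ij}$ is the safety rating rater $j$ gives item $i$, $\mathbf{x}_j$ collects the rater's demographics (age, gender, ethnicity), $u_{z(j)}$ is a zone-level random intercept, and $v_i$ is an item-level random intercept. Under this sketch, the claim that cultural zone explains variance beyond demographics corresponds to rejecting $\sigma_u^2 = 0$, for example by comparing the fit with and without $u_{z(j)}$.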

Problem

Research questions and friction points this paper is trying to address.

geo-cultural values

pluralistic safety alignment

cultural sensitivity

safety evaluation

cross-cultural variation

Innovation

Methods, ideas, or system contributions that make the work stand out.

geo-cultural values

pluralistic safety alignment

multilevel modeling