🤖 AI Summary
This study investigates whether pretrained language models (PLMs) implicitly encode cross-cultural value differences, and whether these implicit representations align with established cultural theories (Hofstede, Schwartz) and empirical survey data. Method: a linear, interpretable probing method grounded in word-embedding spaces is used to construct representations of cultural value dimensions, applied to multilingual BERT and XLM-R and paired with statistical correlation analysis. Contribution/Results: the paper provides the first systematic evidence that mainstream PLMs have a latent capacity to distinguish cultural values; however, their implicit cultural value representations correlate only weakly with real-world survey measurements (mean *r* < 0.3), well below theoretical expectations. This finding confirms the presence of cultural bias in PLMs, and the paper frames "model value alignment" as a critical challenge at the intersection of AI fairness and cross-cultural ethics. The work thus provides both a methodological foundation and empirical evidence for assessing and aligning cultural values in multilingual AI systems.
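The summary names a probing method over multilingual masked LMs such as mBERT and XLM-R. As a minimal sketch of what one such cloze-style probe could look like, the snippet below scores candidate completions with `bert-base-multilingual-cased`; the template, candidate words, and value dimension are illustrative assumptions, not the paper's actual prompts or method.

```python
# Illustrative sketch only: scoring value-laden cloze completions with a
# multilingual masked LM. Prompts and candidates below are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def candidate_logprobs(template: str, candidates: list[str]) -> dict[str, float]:
    """Score single-token candidates at the [MASK] position of a cloze template."""
    inputs = tokenizer(template, return_tensors="pt")
    # Locate the [MASK] token in the input sequence.
    mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    logprobs = torch.log_softmax(logits, dim=-1)
    out = {}
    for word in candidates:
        ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        if len(ids) == 1:  # keep only single-token candidates for simplicity
            out[word] = logprobs[ids[0]].item()
    return out

# Hypothetical probe for an individualism-style dimension (English example);
# a cross-cultural study would run analogous probes in each culture's language.
scores = candidate_logprobs(
    f"Personal freedom is {tokenizer.mask_token} important than family duty.",
    ["more", "less"],
)
print(scores)
```

Aggregating the log-probability gap between such candidates, per language, would yield one model-derived score per culture for a given value dimension.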
📝 Abstract
Language embeds information about the social, cultural, and political values people hold. Prior work has explored potentially harmful social biases encoded in Pre-trained Language Models (PLMs). However, there has been no systematic study investigating how the values embedded in these models vary across cultures. In this paper, we introduce probes to study which cross-cultural values are embedded in these models, and whether they align with existing theories and cross-cultural values surveys. We find that PLMs capture differences in values across cultures, but that these align only weakly with established values surveys. We discuss the implications of using misaligned models in cross-cultural settings, as well as ways of aligning PLMs with values surveys.
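To make the reported weak alignment (mean *r* < 0.3) concrete, a correlation such as the one below could be computed between model-derived scores and survey scores. The `model_scores` values are placeholders invented for illustration, not results from the paper; the Hofstede individualism (IDV) values are the published country scores.

```python
# Illustrative only: correlating model-derived value scores with survey data.
from scipy.stats import pearsonr

# Published Hofstede individualism (IDV) scores for a handful of countries.
hofstede_idv = {"US": 91, "UK": 89, "DE": 67, "JP": 46, "CN": 20}

# Placeholder model-derived scores (e.g., aggregated cloze log-probability
# differences per culture from the sketch above); NOT results from the paper.
model_scores = {"US": 0.42, "UK": 0.31, "DE": 0.35, "JP": 0.28, "CN": 0.33}

countries = sorted(hofstede_idv)
r, p = pearsonr(
    [hofstede_idv[c] for c in countries],
    [model_scores[c] for c in countries],
)
# Weak alignment shows up as |r| well below 1.
print(f"Pearson r = {r:.2f} (p = {p:.2f})")
```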