🤖 AI Summary
This study addresses fairness disparities in large language models (LLMs) on cross-cultural and cross-demographic social survey tasks. To overcome the lack of sociodemographic alignment in existing evaluation frameworks, we propose the first quantitative LLM fairness framework tailored to sensitive attributes, including age, gender, education level, religion, political identity, and race. Leveraging publicly available, cross-national survey data from Chile and the United States, we systematically assess model performance via prediction accuracy, inter-group error disparities, and stratified bias analysis. Results reveal significantly higher accuracy on U.S. samples than on Chilean ones; moreover, bias patterns are region-specific: political identity and race dominate in the U.S., whereas gender, education, and religion exhibit stronger biases in Chile. The framework uncovers systemic fairness risks arising from geographic concentration in training data, offering a reproducible diagnostic tool for evaluating LLM fairness in global deployment.
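The paper does not ship code with this summary, but the inter-group error-disparity step it describes is straightforward to sketch. Below is a minimal, hypothetical illustration, assuming a DataFrame with one row per survey respondent, `y_true`/`y_pred` columns, and a sensitive-attribute column; all names are this sketch's assumptions, not the authors' implementation.

```python
import pandas as pd

def group_accuracy_gaps(df: pd.DataFrame, attr: str) -> pd.Series:
    """Per-group prediction accuracy and its signed gap from overall accuracy.

    Assumes df holds one row per survey respondent with:
      - 'y_true': the respondent's actual survey answer
      - 'y_pred': the LLM's predicted answer
      - attr:     a sensitive-attribute column (e.g. 'gender', 'religion')
    """
    correct = (df["y_true"] == df["y_pred"]).astype(float)
    per_group = correct.groupby(df[attr]).mean()  # accuracy within each group
    return per_group - correct.mean()             # gap vs. overall accuracy

# Hypothetical usage: the largest absolute gap flags the group
# on which the model is most unfairly (in)accurate.
# gaps = group_accuracy_gaps(survey_df, "education")
# print(gaps.abs().idxmax(), gaps.abs().max())
```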
📝 Abstract
Large Language Models (LLMs) excel at text generation and understanding, and in particular at simulating socio-political and economic patterns, positioning them as an alternative to traditional surveys. However, their global applicability remains questionable due to unexplored biases across socio-demographic and geographic contexts. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates in U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward fairer and more equitable model performance across diverse socio-cultural contexts.
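To make the attribute-level finding concrete: one common way to test whether an attribute "significantly influences prediction accuracy" is to regress per-respondent prediction error on the sensitive attributes and inspect the coefficients. The sketch below is an assumption about how such an analysis could be run (column names, attribute set, and the use of statsmodels are all hypothetical), not the authors' actual pipeline.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_error_model(df: pd.DataFrame):
    """Logistic regression of LLM prediction error on sensitive attributes.

    Assumes one row per respondent; 'error' is 1 when the LLM's
    prediction disagrees with the respondent's true survey answer.
    """
    df = df.copy()
    df["error"] = (df["y_true"] != df["y_pred"]).astype(int)
    return smf.logit(
        "error ~ C(age_group) + C(gender) + C(education) "
        "+ C(religion) + C(political_identity) + C(race)",
        data=df,
    ).fit()

# Hypothetical usage: fit the model separately on the U.S. and Chilean
# samples and compare which attributes carry significant coefficients.
# print(fit_error_model(us_df).summary())
# print(fit_error_model(cl_df).summary())
```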