🤖 AI Summary
This study reveals systematic geographic biases in large language models (LLMs) when generating brand and cultural recommendations, potentially undermining market fairness, competition, and informational diversity. To measure these biases, we introduce ChoiceEval, an auditing framework that generates diverse user queries from psychographic user profiles, converts free-text model outputs into normalized top-k choice sets, and constructs preference metrics that are comparable across topics and user personas. This approach enables scalable and reproducible evaluation of LLMs’ brand and geographic preferences. Experiments across 10 domains and over 2,000 queries on Gemini, GPT, and DeepSeek show that U.S.-developed models exhibit strong preferences for American entities, while the Chinese-developed model, though more balanced, still displays detectable geographic biases. These patterns are consistent across diverse user profiles.
📝 Abstract
AI systems based on large language models (LLMs) increasingly mediate what billions of people see, choose, and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure.
This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in LLMs under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g., running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience-seeking) and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using metrics that are comparable across topics and personas. ChoiceEval thus provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes.
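To make the second challenge concrete, the following Python sketch shows one way ranked entity mentions could be turned into normalised top-k choice sets and then summarised as a share of recommendations per country of origin. It is an illustration under stated assumptions, not the ChoiceEval implementation: the alias table, the origin table, the function names `top_k_choice_set` and `origin_shares`, and the share-based metric are all hypothetical, and the step that extracts mentions from free-form LLM text is assumed to have happened upstream.

```python
from collections import Counter

# Hypothetical alias table: lower-cased surface form -> canonical entity name.
ALIASES = {
    "nike": "Nike", "nike inc.": "Nike",
    "adidas": "Adidas", "adidas ag": "Adidas",
    "asics": "ASICS", "li-ning": "Li-Ning",
}

# Hypothetical mapping from canonical entity to country of origin.
ORIGIN = {"Nike": "US", "Adidas": "DE", "ASICS": "JP", "Li-Ning": "CN"}


def top_k_choice_set(mentions, k=3):
    """Canonicalise ranked mentions, drop duplicates and unknowns, keep the first k."""
    chosen = []
    for mention in mentions:
        name = ALIASES.get(mention.strip().lower())
        if name is not None and name not in chosen:
            chosen.append(name)
    return chosen[:k]


def origin_shares(choice_sets):
    """Fraction of all chosen entities that come from each country (assumed metric)."""
    counts = Counter(ORIGIN[entity] for cs in choice_sets for entity in cs)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}


# Ranked mentions, assumed to be already extracted from two model responses.
responses = [
    ["Nike", "adidas AG", "nike inc.", "ASICS"],
    ["Li-Ning", "Nike"],
]
choice_sets = [top_k_choice_set(r, k=3) for r in responses]
shares = origin_shares(choice_sets)
```

Because every response is reduced to the same canonical vocabulary and the same maximum length k, shares computed this way can be compared across topics and personas, which is the property the abstract asks of its metrics.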
Applied to Gemini, GPT, and DeepSeek on more than 2,000 questions across 10 topics spanning commerce and culture, ChoiceEval reveals consistent preferences: the U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while the China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.