Polar: A Benchmark for Evaluating Political Bias in LLMs

📅 2026-06-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the lack of reproducible, cross-lingual and cross-political-context evaluations of political bias in large language models (LLMs). It introduces Polar, a benchmark comprising 4,026 multiple-choice questions grounded in the Manifesto Project’s ideological dimensions and policy categories. Rather than relying on generative prompts, Polar employs option-level likelihood scoring to systematically evaluate 38 LLMs within both U.S. and South Korean political contexts. The work presents the first quantitative analysis of political bias across languages and political systems, revealing that models exhibit a general leftward skew on U.S.-centric issues but trend more neutral on Korean topics. Crucially, translation experiments demonstrate that merely altering the presentation language significantly shifts measured bias, underscoring language itself as a pivotal factor in bias assessment.
πŸ“ Abstract
Political bias in large language models (LLMs) is increasingly significant, but difficult to measure reproducibly across political and linguistic contexts. We introduce Polar, a 4,026-instance multiple-choice benchmark that measures political bias through option-level likelihoods rather than prompt-based generation. Polar covers two ideological axes and eight issue categories derived from the Manifesto Project, and evaluates models in parallel across U.S. and South Korean political contexts. Across 38 LLMs, measured bias varies systematically with political context, issue category, model group, and presentation language. All models lean left-progressive on U.S. political content, but show more centered and mixed patterns on South Korean content. Translation experiments further show that presentation language alone can shift measured bias. These findings highlight the need for multilingual and cross-contextual evaluation of political bias in LLMs.
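To make "option-level likelihoods" concrete, here is a minimal sketch of how a multiple-choice item could be scored without generating any text. It is not the paper's implementation: the length normalization, the softmax over options, the lean labels (-1 for a left-progressive option, +1 for a right-conservative one), and every function and field name are assumptions for illustration. The per-token log-probabilities would come from a model's forward pass; here they are passed in as plain lists.

```python
import math
from collections import defaultdict


def option_probabilities(option_logprobs, length_normalize=True):
    """Turn per-option token log-probs into a distribution over the options.

    option_logprobs: one list of token log-probabilities per answer option
    (hypothetical input format; a real run would read these from a model).
    """
    scores = []
    for token_logprobs in option_logprobs:
        total = sum(token_logprobs)
        # Assumed choice: average per token so longer options are not penalized.
        scores.append(total / len(token_logprobs) if length_normalize else total)
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(score - peak) for score in scores]
    norm = sum(weights)
    return [weight / norm for weight in weights]


def item_lean(option_logprobs, option_leans):
    """Expected lean of one item: probability-weighted mean of option labels."""
    probs = option_probabilities(option_logprobs)
    return sum(p * lean for p, lean in zip(probs, option_leans))


def mean_lean_by_group(items):
    """Average item lean per (context, language) pair."""
    totals = defaultdict(lambda: [0.0, 0])
    for item in items:
        key = (item["context"], item["language"])
        totals[key][0] += item_lean(item["option_logprobs"], item["option_leans"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    # Made-up numbers, only to show the call shape; not results from the paper.
    demo_items = [
        {"context": "US", "language": "en",
         "option_logprobs": [[-0.4, -0.6], [-1.2, -1.5]], "option_leans": [-1, 1]},
        {"context": "US", "language": "ko",
         "option_logprobs": [[-0.9, -1.0], [-0.8, -1.1]], "option_leans": [-1, 1]},
    ]
    for group, lean in mean_lean_by_group(demo_items).items():
        print(group, round(lean, 3))
```

Grouping by both political context and presentation language mirrors the translation comparison described above: the same items scored in two languages land in separate groups, so any shift in the mean lean is attributable to language under this (assumed) scoring rule.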
Problem

Research questions and friction points this paper is trying to address.

political bias
large language models
multilingual evaluation
cross-contextual evaluation
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

political bias
multilingual evaluation
cross-contextual benchmark
likelihood-based measurement
large language models