🤖 AI Summary
This study investigates whether large language models (LLMs) internalize and overtly express subjective preferences, opinions, and beliefs (POBs) across social, cultural, ethical, and personal dimensions, and how such expression compromises neutrality, reliability, and consistency. To this end, the authors construct the first multi-dimensional POB benchmark, combining human-crafted, cross-domain question sets with chain-of-thought reasoning, self-reflection, and multi-round sampling consistency analysis. Experimental results reveal pronounced shortfalls in neutrality and consistency across mainstream LLMs, with newer model versions exhibiting exacerbated biases. Test-time compute enhancements such as CoT and introspection yield only marginal improvements, exposing fundamental limitations of current alignment techniques at the level of values. This work provides the first systematic quantification of LLMs' subjective tendencies, establishing a novel paradigm and empirical foundation for value-alignment evaluation.
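The summary does not spell out how "multi-round sampling consistency" is scored, so the following is only a minimal sketch of one plausible reading: sample the model several times on the same question and measure how concentrated the answers are around the modal response. The `query_model` callable is a hypothetical stand-in for an LLM API call, not part of the benchmark.

```python
from collections import Counter
from typing import Callable, List


def sampling_consistency(
    query_model: Callable[[str], str],  # hypothetical stand-in for an LLM API call
    question: str,
    n_rounds: int = 5,
) -> float:
    """Fraction of repeated samples agreeing with the modal answer.

    One plausible reading of "multi-round sampling consistency":
    ask the same question n_rounds times with sampling enabled and
    report how concentrated the answers are. The paper may define
    its metric differently; this is illustrative only.
    """
    answers: List[str] = [query_model(question) for _ in range(n_rounds)]
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / n_rounds
```

Under this reading, a model that gives the same answer in all five rounds scores 1.0, while one that splits 3/2 between two stances scores 0.6.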
📝 Abstract
As Large Language Models (LLMs) become deeply integrated into human life and increasingly influence decision-making, it is crucial to evaluate whether and to what extent they exhibit subjective preferences, opinions, and beliefs. Such tendencies may stem from biases within the models and can shape their behavior, influence the advice and recommendations they offer to users, and potentially reinforce certain viewpoints. This paper presents the Preference, Opinion, and Belief survey (POBs), a benchmark developed to assess LLMs' subjective inclinations across societal, cultural, ethical, and personal domains. We applied our benchmark to evaluate leading open- and closed-source LLMs, measuring desired properties such as reliability, neutrality, and consistency. In addition, we investigated the effect of increasing test-time compute, through reasoning and self-reflection mechanisms, on those metrics. Although these mechanisms are effective in other tasks, our results show that they offer only limited gains in our domain. Furthermore, we reveal that newer model versions are becoming less consistent and more biased toward specific viewpoints, highlighting a blind spot and a concerning trend. POBS: https://ibm.github.io/POBS
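The abstract names neutrality as a measured property without defining its scoring here. As a hedged illustration only, one simple formulation would score neutrality as the share of multiple-choice POB questions on which the model selects a designated "no preference" option; the names `responses` and `neutral_option` below are assumptions for the sketch, and the benchmark's actual metric definitions live in the paper.

```python
from typing import Dict, List


def neutrality_score(
    responses: Dict[str, str],       # assumed mapping: question id -> option the model chose
    neutral_option: Dict[str, str],  # assumed mapping: question id -> label of the neutral option
) -> float:
    """Illustrative neutrality metric: fraction of questions where the
    model picks the designated neutral ("no preference") option.
    This only sketches the idea; POBS may score neutrality differently."""
    questions: List[str] = list(neutral_option)
    if not questions:
        return 0.0
    hits = sum(responses.get(q) == neutral_option[q] for q in questions)
    return hits / len(questions)
```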