A Benchmark for Zero-Shot Belief Inference in Large Language Models

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how well large language models (LLMs) can infer individuals' belief stances across multiple domains in zero-shot settings, and where that ability breaks down. We construct the first reproducible, cross-domain zero-shot evaluation benchmark, curated from online debate platform data. To separate the effect of demographic background from that of known prior beliefs, we propose a controlled experimental framework for assessing LLMs' reasoning on non-political social cognition tasks. Methodologically, we combine zero-shot prompting, multivariate input conditioning, and LLM-based inference. Experiments show that incorporating individual demographic context improves stance prediction accuracy, but the gain is highly domain-dependent, exposing significant domain-specific limitations in current LLMs' modeling of human beliefs. Our core contribution is the first cross-domain zero-shot belief inference evaluation paradigm, together with an empirical characterization of its capabilities and fundamental constraints.

📝 Abstract
Beliefs are central to how humans reason, communicate, and form social connections, yet most computational approaches to studying them remain confined to narrow sociopolitical contexts and rely on fine-tuning for optimal performance. Despite the growing use of large language models (LLMs) across disciplines, how well these systems generalize across diverse belief domains remains unclear. We introduce a systematic, reproducible benchmark that evaluates the ability of LLMs to predict individuals' stances on a wide range of topics in a zero-shot setting using data from an online debate platform. The benchmark includes multiple informational conditions that isolate the contribution of demographic context and known prior beliefs to predictive success. Across several small- to medium-sized models, we find that providing more background information about an individual improves predictive accuracy, but performance varies substantially across belief domains. These findings reveal both the capacity and limitations of current LLMs to emulate human reasoning, advancing the study of machine behavior and offering a scalable framework for modeling belief systems beyond the sociopolitical sphere.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' zero-shot belief inference across diverse domains
Assessing generalization beyond narrow sociopolitical contexts
Measuring predictive accuracy with demographic and prior belief information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot benchmark for belief inference
Uses online debate platform data
Tests demographic and prior belief conditions
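The informational conditions described above could be operationalized as prompt variants that add progressively more background about an individual. A minimal Python sketch of this idea follows; the condition names, prompt wording, and the `build_prompt` helper are illustrative assumptions, not the paper's actual implementation:

```python
def build_prompt(topic, demographics=None, prior_beliefs=None):
    """Assemble a zero-shot stance-prediction prompt for one individual.

    demographics: optional dict of attribute -> value (hypothetical fields).
    prior_beliefs: optional list of (topic, stance) pairs the person is known to hold.
    """
    parts = ["Predict whether this person agrees or disagrees with the statement."]
    if demographics:
        parts.append("Demographics: " +
                     "; ".join(f"{k}: {v}" for k, v in demographics.items()))
    if prior_beliefs:
        parts.append("Known stances: " +
                     "; ".join(f"'{t}' -> {s}" for t, s in prior_beliefs))
    parts.append(f"Statement: {topic}")
    parts.append("Answer with 'agree' or 'disagree'.")
    return "\n".join(parts)

# Three conditions: statement only, + demographics, + demographics and prior beliefs.
baseline = build_prompt("School uniforms should be mandatory.")
with_demo = build_prompt("School uniforms should be mandatory.",
                         demographics={"age": "34", "education": "college"})
full = build_prompt("School uniforms should be mandatory.",
                    demographics={"age": "34", "education": "college"},
                    prior_beliefs=[("Standardized testing is beneficial", "agree")])
```

Comparing a model's accuracy on the same stance labels under each prompt variant isolates how much demographic context and known prior beliefs each contribute to predictive success.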