Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction

📅 2025-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether synthetic data generated by large language models (LLMs) can reliably substitute for human survey data in vote prediction. We propose a latent-space probing method to disentangle political orientation and systematically evaluate 14 mainstream LLMs across three dimensions: demographic subgroup opinion distributions, personality–partisanship mappings, and prompt robustness. Results reveal that LLMs substantially underestimate inter-individual opinion variance observed in real populations, exhibit poor discriminability in political stance attribution, and produce outputs highly sensitive to prompt phrasing—collectively undermining predictive stability and construct validity relative to empirical survey data. To our knowledge, this is the first cross-model empirical study to identify and characterize systematic opinion distortion mechanisms inherent in LLMs. Our findings establish critical methodological boundaries for public opinion modeling, social science experimentation, and AI-augmented survey research.
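The summary above refers to a latent-space probing method for disentangling political orientation. The paper's exact probe is not reproduced here; the standard approach this term usually denotes is training a simple linear classifier on a model's hidden-state activations. A minimal sketch in plain NumPy, using synthetic Gaussian clusters as stand-ins for real activations (all data and function names below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def train_linear_probe(hidden_states, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe mapping hidden-state vectors to a
    binary orientation label. Returns (weights, bias)."""
    n, d = hidden_states.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        z = np.clip(hidden_states @ w + b, -30, 30)  # clip for numerical stability
        p = 1.0 / (1.0 + np.exp(-z))                 # sigmoid probabilities
        grad_w = hidden_states.T @ (p - labels) / n  # gradient of log-loss
        grad_b = np.mean(p - labels)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def probe_accuracy(w, b, hidden_states, labels):
    """Fraction of examples the probe classifies correctly."""
    preds = (hidden_states @ w + b) > 0
    return np.mean(preds == labels)

# Toy demonstration: two Gaussian clusters stand in for activations of
# personas with opposite political leanings.
rng = np.random.default_rng(42)
left = rng.normal(-1.0, 1.0, size=(100, 16))
right = rng.normal(1.0, 1.0, size=(100, 16))
X = np.vstack([left, right])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = train_linear_probe(X, y)
print(round(probe_accuracy(w, b, X, y), 2))
```

On real models the inputs would be residual-stream or final-layer activations collected while the model answers persona-conditioned prompts; a high probe accuracy indicates the orientation is linearly encoded in the latent space.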

📝 Abstract
Generative AI (GenAI) is increasingly used in survey contexts to simulate human preferences. While many research endeavors evaluate the quality of synthetic GenAI data by comparing model-generated responses to gold-standard survey results, fundamental questions about the validity and reliability of using LLMs as substitutes for human respondents remain. Our study provides a technical analysis of how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs) and evaluates their suitability for survey-based predictions. Using 14 different models, we find that LLM-generated data fails to replicate the variance observed in real-world human responses, particularly across demographic subgroups. In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data. Moreover, we show that prompt sensitivity can significantly alter outputs for some models, further undermining the stability and predictiveness of LLM-based simulations. As a key contribution, we adapt a probe-based methodology that reveals how LLMs encode political affiliations in their latent space, exposing the systematic distortions introduced by these models. Our findings highlight critical limitations in AI-generated survey data, urging caution in its use for public opinion research, social science experimentation, and computational behavioral modeling.
Problem

Research questions and friction points this paper is trying to address.

Assessing whether LLMs can reliably simulate human preferences.
Analyzing how demographic attributes and prompt variations shape LLM-expressed opinions.
Evaluating the predictive accuracy of AI-generated survey data.
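A central finding behind these questions is that LLM-generated responses underestimate the opinion variance seen in real populations. A toy illustration of that variance-collapse check, with hypothetical 7-point agreement scores (the numbers are invented for illustration, not from the paper's data):

```python
import statistics

# Hypothetical 7-point agreement scores for one demographic subgroup.
survey_scores = [1, 2, 2, 3, 4, 5, 5, 6, 7, 7]     # real respondents vary widely
synthetic_scores = [4, 4, 4, 5, 4, 4, 5, 4, 4, 4]  # LLM personas cluster tightly

survey_var = statistics.pvariance(survey_scores)
synthetic_var = statistics.pvariance(synthetic_scores)

# A variance ratio well below 1 signals variance collapse in the synthetic data.
print(round(synthetic_var / survey_var, 2))
```

In an actual evaluation the same comparison would be run per subgroup against gold-standard survey data, so a systematically low ratio across subgroups indicates the model cannot reproduce inter-individual heterogeneity.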
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probe-based analysis of latent political encoding
Exploration of latent-space opinion representations
Systematic prompt-sensitivity assessment
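The prompt-sensitivity assessment listed above can be quantified by comparing the model's answer distributions under paraphrased versions of the same question. One common measure (assumed here for illustration; the paper's exact metric is not stated on this page) is the total variation distance between the two categorical distributions:

```python
from collections import Counter

def response_distribution(responses):
    """Normalize a list of categorical answers into a probability distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical model answers to two paraphrases of the same voting question.
paraphrase_a = ["Party X"] * 70 + ["Party Y"] * 30
paraphrase_b = ["Party X"] * 45 + ["Party Y"] * 55

drift = total_variation(response_distribution(paraphrase_a),
                        response_distribution(paraphrase_b))
print(round(drift, 2))  # 0.25: a large shift caused by rewording alone
```

A distance near 0 would indicate robustness to phrasing; values this large mean the "opinion" being measured is partly an artifact of the prompt.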
Sarah Ball
Lecturer, University of Queensland
evidence-based policy, behavioural insights, public policy, public administration, interpretive
Simeon Allmendinger
University of Bayreuth, Fraunhofer Institute for Applied Information Technology FIT
Frauke Kreuter
Professor of Survey Methodology, University of Maryland
Nonresponse, Interviewer, Paradata, Measurement Error
Niklas Kühl
University of Bayreuth, Fraunhofer Institute for Applied Information Technology FIT