How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

📅 2026-01-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the absence of a unified framework for quantifying how much large language models (LLMs) rely on pretrained knowledge when predicting human behavior. To this end, it introduces the "equivalent sample size," a novel metric that estimates the amount of task-specific data required to match an LLM's observed predictive accuracy. The authors develop an asymptotic statistical inference framework by integrating flexible machine learning techniques, cross-validation, and comparative analysis of prediction errors. Empirical validation on dynamic panel data of household income reveals that LLMs encode substantial predictive information for certain economic variables but limited utility for others, demonstrating that their value as a substitute for domain-specific data is highly context-dependent.

๐Ÿ“ Abstract
Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across settings.
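The abstract's estimation idea can be sketched numerically: train a flexible model on increasing amounts of domain data, measure cross-validated prediction error at each sample size, and find the smallest sample whose error matches the LLM's. The sketch below is illustrative only, assuming synthetic data and a hypothetical `llm_mse` value standing in for a measured LLM prediction error; it is not the paper's actual procedure or asymptotic inference theory.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a panel of task-specific observations.
n_total = 2000
X = rng.normal(size=(n_total, 5))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.0]) + rng.normal(size=n_total)

# Hypothetical mean-squared prediction error of a fixed LLM on this task
# (an assumption here; in practice it is measured from the LLM's predictions).
llm_mse = 1.3

def cv_mse(n):
    """Cross-validated MSE of a flexible model trained on n observations."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X[:n], y[:n],
                             scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

# Equivalent sample size: smallest n at which the trained model's
# cross-validated error matches or beats the LLM's error.
grid = [50, 100, 200, 400, 800, 1600]
equiv_n = next((n for n in grid if cv_mse(n) <= llm_mse), None)
print("Estimated equivalent sample size:", equiv_n)
```

A finer grid (or interpolation between grid points) would give a sharper estimate; the paper's contribution is a statistical inference procedure for this quantity, which the sketch omits.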
Problem

Research questions and friction points this paper is trying to address.

large language models
human behavior prediction
pretrained knowledge
equivalent sample size
predictive accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

equivalent sample size
large language models
predictive accuracy
cross-validated prediction error
pretrained knowledge
Wayne Gao
Department of Economics, University of Pennsylvania
Sukjin Han
School of Economics, University of Bristol
Annie Liang
Northwestern University
Economics · Computer Science