Established Psychometric vs. Ecologically Valid Questionnaires: Rethinking Psychological Assessments in Large Language Models

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study questions the ecological validity of applying human psychometric instruments, such as the Big Five Inventory (BFI) and the Portrait Values Questionnaire (PVQ), directly to large language models (LLMs) to assess personality and values. It argues that static questionnaire items do not resemble the real-world contexts in which LLMs generate text in response to user queries. Using a comparative design, the authors evaluate established questionnaires against ecologically valid ones. The results indicate systematic measurement problems with the established instruments: profiles that diverge substantially from those expressed in the context of user queries, too few items for stable measurement, a misleading impression that LLMs possess stable constructs, and exaggerated profiles for persona-prompted LLMs. The work therefore cautions against using established psychological questionnaires to assess LLMs.

📝 Abstract
Researchers have applied established psychometric questionnaires (e.g., BFI, PVQ) to measure the personality traits and values reflected in the responses of Large Language Models (LLMs). However, concerns have been raised about applying these human-designed questionnaires to LLMs. One such concern is their lack of ecological validity, that is, the extent to which survey questions adequately reflect and resemble the real-world contexts in which LLMs generate text in response to user queries. Yet it remains unclear how established questionnaires and ecologically valid questionnaires differ in their outcomes, and what insights these differences may provide. In this paper, we conduct a comprehensive comparative analysis of the two types of questionnaires. Our analysis reveals that established questionnaires (1) yield profiles of LLMs that differ substantially from those of ecologically valid questionnaires, deviating from the psychological characteristics expressed in the context of user queries, (2) suffer from insufficient items for stable measurement, (3) create misleading impressions that LLMs possess stable constructs, and (4) yield exaggerated profiles for persona-prompted LLMs. Overall, our work cautions against the use of established psychological questionnaires for LLMs. Our code will be released upon publication.
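
Finding (2), that established questionnaires have too few items for stable measurement, can be illustrated with the classical Spearman-Brown prophecy formula, which predicts how the reliability of a scale changes as items are added. This is a minimal sketch and not the authors' analysis: the abstract does not say how stability was measured, the function names are hypothetical, and the numbers in the example are invented for illustration.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a scale by `length_factor`.

    Classical Spearman-Brown prophecy formula: r_k = k*r / (1 + (k - 1)*r).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def items_needed(current_reliability: float, target_reliability: float,
                 current_items: int) -> int:
    """Smallest item count predicted to reach `target_reliability`.

    Inverts the Spearman-Brown formula for the length factor k.
    """
    if not (0.0 < current_reliability < 1.0 and 0.0 < target_reliability < 1.0):
        raise ValueError("reliabilities must be strictly between 0 and 1")
    factor = (target_reliability * (1.0 - current_reliability)) / (
        current_reliability * (1.0 - target_reliability)
    )
    # Round before taking the ceiling so floating-point noise
    # (e.g. 16.000000000000004) does not add a spurious extra item.
    return math.ceil(round(factor * current_items, 9))


if __name__ == "__main__":
    # Hypothetical numbers: a 4-item subscale with reliability 0.5.
    print(spearman_brown(0.5, 4))     # predicted reliability at 16 items
    print(items_needed(0.5, 0.8, 4))  # items needed to reach 0.8
```

Under these assumed numbers, a short subscale would need to be several times longer before its scores could be treated as stable, which is the kind of concern the abstract raises about short human-designed inventories.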
Problem

Research questions and friction points this paper is trying to address.

Comparing psychometric and ecologically valid questionnaires for LLM assessment
Evaluating ecological validity limitations in human-designed psychological questionnaires
Analyzing measurement stability issues in traditional personality assessments for LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative analysis of psychometric and ecological questionnaires
Revealing differences in LLM psychological assessment outcomes
Cautioning against established questionnaires for LLMs