To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study presents the first systematic investigation into the capacity of large language models (LLMs) to predict psycholinguistic metrics—such as memorability and reading time—at the sentence level, which depend on complex intra-sentential multi-word interactions and have previously lacked LLM-based exploration. Leveraging human-annotated memorability ratings and eye-tracking reading time data, we evaluate LLMs using zero-shot prompting, few-shot prompting, and supervised fine-tuning. Results demonstrate that fine-tuned LLMs significantly outperform interpretable baselines, achieving predictions highly correlated with human norms. In contrast, zero-shot and few-shot approaches exhibit inconsistent performance, challenging the assumption that prompting alone suffices to effectively simulate human cognitive processing. These findings illuminate both the promise and limitations of LLMs as cognitive proxies in modeling human language comprehension.

📝 Abstract
Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. Meanwhile, for other norms such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationships between multiple words in a sentence-level context. Our results show that, via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM prompting as a proxy for human cognitive measures.
Problem

Research questions and friction points this paper is trying to address.

sentence memorability
reading times
psycholinguistic norms
large language models
sentence-level features
Innovation

Methods, ideas, or system contributions that make the work stand out.

sentence-level psycholinguistics
large language models
memorability prediction
reading time estimation
supervised fine-tuning
Thomas Hikaru Clark
Massachusetts Institute of Technology
Carlos Arriaga
Universidad Politécnica de Madrid
Javier Conde
Universidad Politécnica de Madrid
Gonzalo Martínez
Universidad Carlos III de Madrid
Pedro Reviriego
Universidad Politécnica de Madrid