Modeling semantic association in self-paced reading with language model embeddings

📅 2026-06-05

🤖 AI Summary

This study addresses how to quantify the semantic association between a word and its context, and how to isolate its contribution to reading comprehension beyond word predictability. Using a Dutch corpus of joint EEG and self-paced reading data, the authors compare ten implementations of embedding-based semantic association that vary the language model embedding (including sentence embeddings) and the context length. Bayesian hierarchical models and Bayes factors are used to assess the effect of each implementation on the N400 and on self-paced reading times. The results show that the choice of embedding model can alter the estimated effect of semantic association: only implementations relying on sentence embeddings show reliable effects beyond word predictability on both neural and behavioral measures.

📝 Abstract

Semantic association between a word and its context has been identified as an important component of reading comprehension, even when word predictability is accounted for. Recent research has highlighted the potential of language model (LM) embeddings to quantify semantic association. Yet embedding-based semantic association has been operationalized in a myriad of ways. In this study, we use embeddings from LMs to estimate semantic association on a corpus of joint electroencephalography (EEG) and self-paced reading of natural Dutch texts. Semantic association is calculated in ten different implementations that vary the embedding model and the context length. The effects of semantic association across the different implementations on the N400 and self-paced reading times are examined using Bayesian hierarchical models and Bayes factors. The results show that the choice of embedding model can alter the estimated effect of semantic association on both the N400 and self-paced reading times. Furthermore, the results demonstrate the promising potential of sentence embeddings for capturing semantic association, as only implementations relying on sentence embeddings indicate reliable effects of semantic association beyond word predictability on both neural and behavioral measures. Together, these findings highlight the importance of methodological choices in quantifying semantic association.
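
To make the idea of varying the context length concrete, the sketch below shows one common way to operationalize embedding-based semantic association: the cosine similarity between a word's embedding and the mean embedding of the preceding words. The abstract does not state which formula the authors use, so this is an assumption for illustration only. The function names (`cosine`, `mean_vector`, `semantic_association`) and the three-dimensional toy vectors are hypothetical and do not come from the paper or from any LM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; 0.0 by convention here
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def semantic_association(embeddings, index, context_length):
    """Similarity between the word at `index` and the mean embedding of up to
    `context_length` preceding words. Returns None when there is no context.

    Assumption: mean pooling over the context is an illustrative choice,
    not the operationalization reported in the paper.
    """
    start = max(0, index - context_length)
    context = embeddings[start:index]
    if not context:
        return None
    return cosine(embeddings[index], mean_vector(context))


# Hypothetical toy embeddings, one vector per word in a three-word text.
toy = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]

narrow = semantic_association(toy, index=2, context_length=1)  # previous word only
wide = semantic_association(toy, index=2, context_length=2)    # both previous words
print(narrow, wide)
```

On these toy vectors the score for the same target word differs between the one-word and the two-word context, which illustrates why the context length is treated as a methodological choice.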

Problem

Research questions and friction points this paper is trying to address.

semantic association

language model embeddings

self-paced reading

N400

reading comprehension

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic association

language model embeddings

sentence embeddings