🤖 AI Summary
This paper addresses the problem of accurately estimating the psycholinguistic attributes of *imageability* and *concreteness* for words from text-only input, without images or human annotations. The authors propose the Neighborhood Stability Measure (NSM), an unsupervised method that, for the first time, directly models vision–semantics alignment by quantifying the peakedness (kurtosis) of local neighborhood distributions in pretrained contextual embeddings (e.g., BERT, RoBERTa). NSM operates entirely in the textual embedding space, requiring no visual modality or labeled data, and relies solely on intrinsic geometric properties of local embedding neighborhoods. Evaluated across multiple benchmark datasets, NSM significantly outperforms existing unsupervised approaches: it achieves higher correlation with human ratings (an average improvement of 12.3%) and attains 86.7% accuracy in binary classification. These results demonstrate that structural regularities in pretrained text embeddings inherently encode generalizable psycholinguistic signals.
📝 Abstract
Imageability (the potential of text to evoke a mental image) and concreteness (the degree to which text refers to perceptible entities) are two psycholinguistic properties that link visual and semantic spaces. It is little surprise that computational methods estimating them typically rely on parallel visual and semantic spaces, such as collections of image-caption pairs or multi-modal models. In this paper, we work on the supposition that the text alone in an image-caption dataset offers sufficient signal to accurately estimate these properties. In particular, we hypothesize that the peakedness of a word's neighborhood in the semantic embedding space reflects its degree of imageability and concreteness. We then propose an unsupervised, distribution-free measure, which we call the Neighborhood Stability Measure (NSM), that quantifies the sharpness of these peaks. Extensive experiments show that NSM correlates more strongly with ground-truth ratings than existing unsupervised methods and serves as a strong predictor of these properties in classification. Our code and data are available on GitHub (https://github.com/Artificial-Memory-Lab/imageability).
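The neighborhood-peakedness intuition can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's actual NSM (which is distribution-free): it scores a word by the excess kurtosis of its cosine similarities to its k nearest neighbors in an embedding matrix, so that words with sharply peaked (tight, stable) neighborhoods score high. The function name, the choice of k, and the use of kurtosis as the peakedness statistic are all assumptions made for illustration.

```python
import numpy as np

def neighborhood_peakedness(word_vec, all_vecs, k=10):
    """Illustrative peakedness score for one word embedding.

    Computes cosine similarities from `word_vec` to every row of
    `all_vecs`, keeps the k largest (the local neighborhood), and
    returns their excess kurtosis. A sharply peaked neighborhood
    (a few very close neighbors) yields a high score. This is a
    hypothetical sketch of the intuition, not the paper's NSM.
    """
    # Cosine similarity of word_vec to every embedding row.
    norms = np.linalg.norm(all_vecs, axis=1) * np.linalg.norm(word_vec)
    sims = all_vecs @ word_vec / np.clip(norms, 1e-12, None)

    # Local neighborhood: the k largest similarities.
    top_k = np.sort(sims)[-k:]

    # Excess kurtosis of the neighborhood similarity distribution.
    centered = top_k - top_k.mean()
    m2 = (centered ** 2).mean()
    m4 = (centered ** 4).mean()
    return float(m4 / max(m2 ** 2, 1e-24) - 3.0)
```

Under this sketch, ranking a vocabulary by `neighborhood_peakedness` would give a text-only, unsupervised ordering that can be correlated against human imageability or concreteness ratings.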