Color Names in Vision-Language Models

📅 2025-09-26
🤖 AI Summary
This study systematically evaluates the color-naming capabilities of vision-language models (VLMs), including in multilingual settings, to assess whether their color semantics resemble human color naming. Using 957 color samples, it replicates classic color-naming experiments across five state-of-the-art VLMs, extended with cross-lingual comparisons, ablation studies, and generative behavioral modeling. It identifies 21 color terms that appear consistently across all models and two dominant naming strategies: constrained models that rely mainly on basic color terms, and expansive models that systematically add lightness modifiers. It also shows that the language-model architecture affects color naming independently of visual encoding. Results show high accuracy on prototypical colors from classical studies but a sharp drop on expanded, non-prototypical colors; severe training-data imbalance across the nine tested languages, favoring English and Chinese; and hue as the predominant visual cue governing naming decisions. The authors present this as the first systematic evaluation of color naming in VLMs, providing an empirical benchmark for how well their color semantics are grounded.
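
The summary names hue as the main cue behind naming decisions but does not say how that was measured. As a rough illustration only, one could bin a model's color samples by a single HLS channel and check how consistently its names agree within each bin; the channel whose bins are "purer" predicts the name better. The helper names, the 8-bin setting, and the weighted-purity measure below are hypothetical choices for this sketch, not the paper's method.

```python
import colorsys
from collections import Counter, defaultdict

HUE, LIGHTNESS = 0, 1  # index into the (hue, lightness) pair (hypothetical constants)


def hue_and_lightness(rgb):
    """Convert an 8-bit (r, g, b) tuple to (hue, lightness), both in [0, 1]."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    return hue, lightness


def naming_purity(samples, feature, bins=8):
    """Share of samples whose name matches the most common name in their bin.

    samples: list of ((r, g, b), name) pairs from one model
    feature: HUE or LIGHTNESS; samples are split into equal-width bins on it
    """
    if not samples:
        return 0.0
    groups = defaultdict(list)
    for rgb, name in samples:
        value = hue_and_lightness(rgb)[feature]
        groups[min(int(value * bins), bins - 1)].append(name)
    # Sum of majority-name counts over bins, i.e. bin-size-weighted purity
    majority = sum(Counter(names).most_common(1)[0][1] for names in groups.values())
    return majority / len(samples)


# Toy data: two reds and two blues at different lightness levels
toy = [((255, 0, 0), "red"), ((128, 0, 0), "red"),
       ((0, 0, 255), "blue"), ((0, 0, 128), "blue")]
print(naming_purity(toy, HUE))        # 1.0
print(naming_purity(toy, LIGHTNESS))  # 0.5
```

Purity grows with the number of bins and this version ignores the circularity of hue, so any comparison between channels should use the same bin count; a real analysis would likely use a perceptual color space and a proper statistical test.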

📝 Abstract
Color serves as a fundamental dimension of human visual perception and a primary means of communicating about objects and scenes. As vision-language models (VLMs) become increasingly prevalent, understanding whether they name colors like humans is crucial for effective human-AI interaction. We present the first systematic evaluation of color naming capabilities across VLMs, replicating classic color naming methodologies using 957 color samples across five representative models. Our results show that while VLMs achieve high accuracy on prototypical colors from classical studies, performance drops significantly on expanded, non-prototypical color sets. We identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches: constrained models using predominantly basic terms versus expansive models employing systematic lightness modifiers. Cross-linguistic analysis across nine languages demonstrates severe training imbalances favoring English and Chinese, with hue serving as the primary driver of color naming decisions. Finally, ablation studies reveal that language model architecture significantly influences color naming independent of visual processing capabilities.
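
The abstract's 21 shared terms and its contrast between constrained and expansive models can be made concrete with two simple statistics. The sketch below assumes each model's output is a mapping from sample ID to a free-text name, treats a term as shared when every model produces it at least once (a plain set intersection), and uses a small, hypothetical list of lightness modifiers; the model names and data are invented, and the paper's exact normalization and criteria are not given here.

```python
LIGHTNESS_MODIFIERS = {"light", "dark", "pale", "deep"}  # hypothetical list


def vocabulary(names):
    """Distinct, lowercased color terms a model produced over all samples."""
    return {name.strip().lower() for name in names.values()}


def common_terms(responses):
    """Terms every model used at least once (intersection of vocabularies).

    responses: dict mapping model name to {sample_id: color name}
    """
    vocabs = [vocabulary(names) for names in responses.values()]
    return set.intersection(*vocabs) if vocabs else set()


def modifier_share(names):
    """Fraction of a model's answers whose first word is a lightness modifier."""
    if not names:
        return 0.0
    hits = 0
    for name in names.values():
        words = name.lower().split()
        if words and words[0] in LIGHTNESS_MODIFIERS:
            hits += 1
    return hits / len(names)


# Hypothetical outputs from two models on the same four samples
responses = {
    "model_a": {0: "red", 1: "blue", 2: "green", 3: "blue"},
    "model_b": {0: "red", 1: "light blue", 2: "dark green", 3: "blue"},
}
print(sorted(common_terms(responses)))       # ['blue', 'red']
print(modifier_share(responses["model_a"]))  # 0.0
print(modifier_share(responses["model_b"]))  # 0.5
```

Under these assumptions, a model like `model_b`, with a high modifier share, would sit on the "expansive" side of the abstract's contrast; where the boundary between the two groups falls is not specified here.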

Problem

Evaluating color naming capabilities in vision-language models
Assessing performance differences between prototypical and non-prototypical colors
Identifying training imbalances and architectural influences on color naming
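
The gap between prototypical and non-prototypical colors comes down to accuracy computed separately for each color set. A minimal sketch, assuming each sample carries a subset label and a single reference term, and that a prediction counts as correct only on an exact, case-insensitive match (the paper's scoring rule may be more lenient); the subset labels and toy data are hypothetical:

```python
from collections import defaultdict


def accuracy_by_subset(samples, predictions):
    """Per-subset accuracy of predicted color names.

    samples: iterable of (sample_id, subset, reference_term) triples
    predictions: dict mapping sample_id to the model's color name
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample_id, subset, reference in samples:
        total[subset] += 1
        predicted = predictions.get(sample_id, "")  # a missing answer counts as wrong
        if predicted.strip().lower() == reference.strip().lower():
            correct[subset] += 1
    return {subset: correct[subset] / total[subset] for subset in total}


samples = [(0, "prototypical", "red"), (1, "prototypical", "green"),
           (2, "expanded", "teal"), (3, "expanded", "olive")]
predictions = {0: "Red", 1: "green", 2: "blue", 3: "olive"}
print(accuracy_by_subset(samples, predictions))
# {'prototypical': 1.0, 'expanded': 0.5}
```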

Innovation

Evaluated color naming across five vision-language models
Identified 21 color terms shared by all models and two naming approaches (basic terms vs. lightness modifiers)
Analyzed cross-linguistic training imbalances and architectural influences