Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the human-like capability of large language models (LLMs) in multimodal perceptual intensity modeling, benchmarking against human cross-sensory (e.g., visual–auditory–tactile) intensity ratings as ground truth. Method: We introduce the first perceptual-intensity-based quantitative benchmark and a contrastive evaluation framework integrating quantitative correlation analysis with qualitative error-pattern mining. Contribution/Results: GPT-4 and GPT-4o significantly outperform GPT-3.5 and GPT-4o-mini, yet GPT-4o does not surpass GPT-4—suggesting current multimodal fusion fails to enhance embodied perceptual grounding. Models consistently exhibit non-human reasoning patterns, including multisensory overestimation and reliance on superficial semantic associations. This work pioneers the integration of perceptual intensity into LLM multimodal evaluation, establishing a novel paradigm and reproducible benchmark for semantic grounding and embodied intelligence research.

📝 Abstract
This study investigated the multimodal perception of large language models (LLMs), focusing on their ability to capture human-like perceptual strength ratings across sensory modalities. Utilizing perceptual strength ratings as a benchmark, the research compared GPT-3.5, GPT-4, GPT-4o, and GPT-4o-mini, highlighting the influence of multimodal inputs on grounding and linguistic reasoning. While GPT-4 and GPT-4o demonstrated strong alignment with human evaluations and significant advancements over smaller models, qualitative analyses revealed distinct differences in processing patterns, such as multisensory overrating and reliance on loose semantic associations. Despite integrating multimodal capabilities, GPT-4o did not exhibit superior grounding compared to GPT-4, raising questions about their role in improving human-like grounding. These findings underscore how LLMs' reliance on linguistic patterns can both approximate and diverge from human embodied cognition, revealing limitations in replicating sensory experiences.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' ability to mimic human perceptual strength ratings.
Comparing GPT models' performance in multimodal perception tasks.
Exploring limitations of LLMs in replicating human sensory experiences.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilized perceptual strength ratings for evaluation
Compared GPT models across sensory modalities
Analyzed multisensory processing patterns in LLMs
👥 Authors

Jonghyun Lee
KRAFTON AI | PhD, SNU

Dojun Park
Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea

Jiwoo Lee
Staff Scientist, Lawrence Livermore National Laboratory

Hoekeon Choi
Department of English Language and Literature, Seoul National University, Seoul, Republic of Korea; Brain and Humanities Lab, Seoul National University, Seoul, Republic of Korea

Sung-Eun Lee
Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea; Department of English Language and Literature, Seoul National University, Seoul, Republic of Korea; Brain and Humanities Lab, Seoul National University, Seoul, Republic of Korea