AI Summary
This study investigates whether large language models (LLMs) possess genuine compositional semantic understanding, focusing on how their performance differs between extensional tasks (identifying referents) and intensional tasks (constructing structured semantic representations) in referential resolution. Using formalized tasks grounded in personal relationships, such as *friend(parent(amber))*, the work systematically compares humans and state-of-the-art LLMs across these two task types. The findings show that LLMs perform better on intensional than on extensional tasks, whereas humans show the opposite pattern. This contrast suggests that while LLMs can manipulate compositional semantics formally, they struggle to achieve human-like referential understanding, which the authors attribute to a lack of referential grounding. The results underscore the critical role of grounding in realizing true compositional semantic competence.
Abstract
Do neural models, such as Large Language Models, genuinely acquire compositional abilities for interpreting natural language? When we talk about semantic interpretation, we can distinguish two complementary aspects: establishing what an expression refers to in the world (which we call the Extensional task) and representing its sense in a structured way (which we call the Intensional task). We evaluate LLMs and humans on both tasks in the setting of the Personal Relation Task (Paperno 2022), in which, given a universe of people and their relationships with each other, one is asked to interpret a noun phrase such as "Amber's parent's friend". Here, for the Intensional task, the answer is the formula "friend(parent(amber))", and for the Extensional task, the person the phrase refers to. We find that humans and LLMs show opposite strengths: humans perform better on Extensional than on Intensional tasks, and LLMs vice versa. Our methodology brings greater nuance to the understanding of compositional abilities in modern machine learning models. Our results support the notion that the lack of referential grounding in LLM training is a crucial missing component in mimicking human-like language understanding.
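To make the two tasks concrete, here is a minimal Python sketch under stated assumptions; it is not the Personal Relation Task implementation from Paperno (2022). The universe (the people carl, bea, dora and all their relations) is invented for illustration, each relation is assumed to be functional (every person has exactly one parent, one friend, and so on), and only possessive phrases of the form "X's r1's r2" are handled. The hypothetical `intensional` function builds the formula, and `extensional` obtains the referent by interpreting that formula against the universe.

```python
from typing import Dict

# Hypothetical toy universe: names and facts are invented for illustration.
# Simplifying assumption: each relation is functional, mapping a person to
# exactly one other person, so every noun phrase has a unique referent.
Universe = Dict[str, Dict[str, str]]

UNIVERSE: Universe = {
    "parent": {"amber": "carl", "bea": "carl", "carl": "dora"},
    "friend": {"amber": "dora", "bea": "amber", "carl": "bea", "dora": "amber"},
}


def intensional(phrase: str) -> str:
    """Intensional task: "Amber's parent's friend" -> "friend(parent(amber))"."""
    name, *relations = [part.strip().lower() for part in phrase.split("'s")]
    formula = name
    for relation in relations:  # the possessor is innermost; later relations wrap it
        formula = f"{relation}({formula})"
    return formula


def evaluate(formula: str, universe: Universe) -> str:
    """Recursively interpret a formula such as friend(parent(amber))."""
    if "(" not in formula:
        return formula  # a bare name denotes the person itself
    relation, rest = formula.split("(", 1)
    if not rest.endswith(")"):
        raise ValueError(f"malformed formula: {formula!r}")
    return universe[relation][evaluate(rest[:-1], universe)]


def extensional(phrase: str, universe: Universe = UNIVERSE) -> str:
    """Extensional task: return the person the phrase refers to."""
    return evaluate(intensional(phrase), universe)


print(intensional("Amber's parent's friend"))  # friend(parent(amber))
print(extensional("Amber's parent's friend"))  # bea
```

Deriving the referent by evaluating the formula reflects the compositional view that the extension follows from the structured sense. If a person could have several friends, the relation tables would need to map each person to a set of people, and the extensional answer would become a set rather than a single referent.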