Language Models Can Resolve Reference Compositionally, But It's Not Their Native Strength: The Case of the Personal Relation Task

2026-05-29
Citations: 0
Influential: 0
AI Summary
This study investigates whether large language models (LLMs) possess genuine compositional semantic understanding, focusing on the gap between their performance on extensional tasks (identifying referents) and intensional tasks (constructing structured semantic representations) in reference resolution. Using formalized tasks grounded in personal relationships, such as *friend(parent(amber))*, the work presents the first systematic comparison between humans and state-of-the-art LLMs across these two task types. The findings reveal opposite strengths: LLMs perform better on intensional tasks than on extensional ones, whereas humans perform better on extensional tasks than on intensional ones. This contrast suggests that while LLMs can manipulate compositional semantics formally, they struggle to achieve human-like referential understanding because their training lacks referential grounding. The results underscore the role of grounding in achieving human-like compositional semantic competence.
Abstract
Do neural models, such as Large Language Models, genuinely acquire compositional abilities for interpretation of natural language? When we talk about semantic interpretation, we can distinguish two complementary aspects: establishing what an expression refers to in the world (which we call the Extensional task) and representing its sense in a structured way (which we call the Intensional task). We evaluate LLMs and humans on both tasks in the setting of the Personal Relation Task (Paperno 2022) in which, given a universe of people and their relationships with each other, one is asked to interpret a noun phrase such as "Amber's parent's friend". Here, for the Intensional task, the answer is the formula "friend(parent(amber))", and for the Extensional task, the person. We find that humans and LLMs show opposite strengths: humans perform better on Extensional than Intensional tasks, and LLMs vice versa. Our methodology brings greater nuance to the understanding of compositional abilities in modern machine learning models. Our results support the notion that the lack of referential grounding in LLM training is a crucial missing component in mimicking human-like language understanding.
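To make the two tasks concrete, here is a minimal Python sketch of how the phrase "Amber's parent's friend" could be interpreted both ways. The toy universe, the names in it, and the helper functions `intensional` and `extensional` are hypothetical illustrations, not the paper's data or code. The sketch also assumes every relation is functional (each person has exactly one parent and one friend), which the abstract does not state.

```python
# Hypothetical toy universe: relation name -> {person: related person}.
# Names and links are invented for illustration, not taken from the paper.
UNIVERSE = {
    "parent": {"amber": "bob", "bob": "carol"},
    "friend": {"amber": "dana", "bob": "erin", "carol": "amber"},
}


def parse(phrase: str) -> tuple[str, list[str]]:
    """Split a possessive chain into its head name and the relations applied to it."""
    head, *relations = phrase.split("'s ")
    return head.strip().lower(), [r.strip().lower() for r in relations]


def intensional(phrase: str) -> str:
    """Intensional task: build the structured formula, innermost term first."""
    formula, relations = parse(phrase)
    for relation in relations:
        formula = f"{relation}({formula})"
    return formula


def extensional(phrase: str, universe: dict = UNIVERSE) -> str:
    """Extensional task: follow each relation through the universe to a person."""
    referent, relations = parse(phrase)
    for relation in relations:
        referent = universe[relation][referent]
    return referent


print(intensional("Amber's parent's friend"))  # friend(parent(amber))
print(extensional("Amber's parent's friend"))  # erin
```

The intensional answer depends only on the phrase, while the extensional answer also requires looking up facts in the universe. That lookup is the step the abstract associates with referential grounding.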
Problem

Research questions and friction points this paper is trying to address.

compositionality
language models
referential grounding
semantic interpretation
personal relation task
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional semantics
referential grounding
Large Language Models
extensional vs intensional tasks
Personal Relation Task