Language Models Can Resolve Reference Compositionally, But It's Not Their Native Strength: The Case of the Personal Relation Task

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study investigates whether large language models (LLMs) possess genuine compositional semantic abilities. It focuses on how their performance differs between extensional tasks (identifying what an expression refers to) and intensional tasks (constructing a structured representation of its meaning). The work uses the Personal Relation Task, in which a noun phrase such as "Amber's parent's friend" must be mapped either to a formula like *friend(parent(amber))* or to the person it denotes. With this setup it systematically compares humans and state-of-the-art LLMs on both task types. Humans perform better on extensional than on intensional tasks, while LLMs show the reverse pattern. This contrast suggests that although LLMs can manipulate compositional structure formally, they fall short of human-like reference resolution. The authors attribute this gap to the lack of referential grounding in LLM training and argue that grounding is a crucial missing component of human-like language understanding.

📝 Abstract

Do neural models, such as Large Language Models, genuinely acquire compositional abilities for interpretation of natural language? When we talk about semantic interpretation, we can distinguish two complementary aspects: establishing what an expression refers to in the world (which we call the Extensional task) and representing its sense in a structured way (which we call the Intensional task). We evaluate LLMs and humans on both tasks in the setting of the Personal Relation Task (Paperno 2022) in which, given a universe of people and their relationships with each other, one is asked to interpret a noun phrase such as "Amber's parent's friend". Here, for the Intensional task, the answer is the formula "friend(parent(amber))", and for the Extensional task, the person. We find that humans and LLMs show opposite strengths: humans perform better on Extensional than Intensional tasks, and LLMs vice versa. Our methodology brings greater nuance to the understanding of compositional abilities in modern machine learning models. Our results support the notion that the lack of referential grounding in LLM training is a crucial missing component in mimicking human-like language understanding.
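To make the two tasks concrete, the following Python sketch assumes a tiny hypothetical universe in which each relation maps a person to exactly one other person. The names `bruno`, `clara`, and `dana`, the `UNIVERSE` dictionary, and the helper functions are illustrative inventions, not the authors' materials or evaluation code. `intensional` turns a possessive noun phrase into a nested formula (the Intensional answer), and `extensional` resolves the phrase to a person (the Extensional answer) by evaluating that formula innermost relation first.

```python
from typing import Dict

# Hypothetical toy universe (an assumption for illustration): each relation
# maps a person to exactly one other person.
Universe = Dict[str, Dict[str, str]]

UNIVERSE: Universe = {
    "parent": {"amber": "bruno", "bruno": "clara"},
    "friend": {"amber": "clara", "bruno": "dana"},
}


def intensional(noun_phrase: str) -> str:
    """Map e.g. "Amber's parent's friend" to "friend(parent(amber))"."""
    words = noun_phrase.split()
    # The first word is the possessor's name, e.g. "Amber's" -> "amber".
    formula = words[0].removesuffix("'s").lower()
    # Each later word is a relation noun; wrap the formula built so far.
    for word in words[1:]:
        relation = word.removesuffix("'s").lower()
        formula = f"{relation}({formula})"
    return formula


def evaluate(formula: str, universe: Universe) -> str:
    """Compute the referent of a formula, innermost relation first."""
    if "(" not in formula:
        return formula  # a bare name denotes that person
    relation, rest = formula.split("(", 1)
    argument = rest[:-1]  # drop the matching closing parenthesis
    return universe[relation][evaluate(argument, universe)]


def extensional(noun_phrase: str, universe: Universe) -> str:
    """Resolve a noun phrase to a person by evaluating its formula."""
    return evaluate(intensional(noun_phrase), universe)


if __name__ == "__main__":
    phrase = "Amber's parent's friend"
    print(intensional(phrase))            # friend(parent(amber))
    print(extensional(phrase, UNIVERSE))  # dana
```

In this toy universe, Amber's parent is `bruno` and Bruno's friend is `dana`, so the phrase resolves to `"dana"`. Routing the extensional answer through the formula is only one compositional strategy; the study evaluates each task separately, so this sketch says nothing about how humans or LLMs actually arrive at their answers.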

Problem


compositionality

language models

referential grounding

semantic interpretation

personal relation task

Innovation


compositional semantics

referential grounding

Large Language Models