Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

📅 2026-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether language models distinguish light-verb constructions (e.g., “make a decision”) from full lexical uses of the same verb (e.g., “make a cake”) in English. To this end, we introduce a large-scale, controlled dataset of minimally varying sentence series in which the same context contains the same verb in both its light-verb and full-verb use, and we analyze model representations in two probing experiments. The results indicate that language models differentiate between the two uses even in minimal contexts and exhibit separable patterns across object types. The dataset, generation code, and materials are publicly released, and the framework is designed to extend to broader contexts, additional verbs, and other languages.
📝 Abstract
Frequent English verbs such as 'have' and 'make' can function either as collocates in light-verb constructions or as full lexical predicates, as in 'make a decision' vs. 'make a cake'. Whether language models represent this distinction remains unclear. We introduce a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses. Two probing experiments show that language models differentiate between these uses even in minimal contexts and exhibit separable patterns across object types. We release the dataset, generation code, and materials as a reusable resource. The framework supports extensions to broader contexts, additional verbs, and other languages.
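To make the idea of a minimally varying sentence pair concrete, here is a small illustrative sketch. It is not the authors' released generation code: the object lists, the fixed subject "She", and the function name `build_minimal_pairs` are all assumptions chosen for the example. Only the contrast 'make a decision' vs. 'make a cake' comes from the abstract.

```python
from itertools import product

# Hypothetical object lists for illustration; the dataset's actual items may differ.
LIGHT_OBJECTS = {"make": ["a decision", "a promise"], "have": ["a look", "a chat"]}
FULL_OBJECTS = {"make": ["a cake", "a table"], "have": ["a car", "a dog"]}


def build_minimal_pairs(verb_forms, subject="She"):
    """Pair every light-verb object with every full-verb object in one shared frame.

    verb_forms maps a verb lemma to the inflected form used in the sentence frame.
    Within each pair, the two sentences differ only in the object noun phrase.
    """
    pairs = []
    for lemma, inflected in verb_forms.items():
        for light_obj, full_obj in product(LIGHT_OBJECTS[lemma], FULL_OBJECTS[lemma]):
            pairs.append({
                "verb": lemma,
                "light": f"{subject} {inflected} {light_obj}.",
                "full": f"{subject} {inflected} {full_obj}.",
            })
    return pairs


pairs = build_minimal_pairs({"make": "made", "have": "had"})
for pair in pairs[:2]:
    print(pair["light"], "|", pair["full"])
```

Holding the subject and verb form constant means that any difference a probe detects between the two sentences of a pair can be attributed to the object, which is the property a minimal-pair design relies on.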
Problem

Research questions and friction points this paper is trying to address.

light verb
full verb
phraseological competence
language models
lexical ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

light-verb constructions
minimal-pair dataset
probing
phraseological competence
language models