Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

📅 2026-06-03

🤖 AI Summary

This study asks whether language models distinguish light-verb uses of frequent English verbs (e.g., “make a decision”) from full lexical uses of the same verbs (e.g., “make a cake”). It introduces a large-scale, controlled dataset of minimally varying English sentence series in which the same context hosts the same verb in both uses, and it analyzes model behavior with two probing experiments. The results show that language models differentiate the two uses even in minimal contexts and exhibit separable patterns across object types. The dataset, generation code, and materials are publicly released, and the framework supports extension to broader contexts, additional verbs, and other languages.

📝 Abstract

Frequent English verbs such as 'have' and 'make' can function either as collocates in light-verb constructions or as full lexical predicates, as in 'make a decision' vs. 'make a cake'. Whether language models represent this distinction remains unclear. We introduce a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses. Two probing experiments show that language models differentiate between these uses even in minimal contexts and exhibit separable patterns across object types. We release the dataset, generation code, and materials as a reusable resource. The framework supports extensions to broader contexts, additional verbs, and other languages.
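
To make the idea of "minimally varying sentence series" concrete, here is a minimal sketch of how such pairs could be generated from templates. It is not the authors' released generation code: the `MinimalPair` class, the `build_pairs` function, and the context templates are hypothetical names and data chosen for illustration. Only the 'make a decision' / 'make a cake' contrast comes from the abstract.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MinimalPair:
    """Two sentences that differ only in the object of the same verb."""
    verb: str
    context: str
    light: str  # sentence with the light-verb use
    full: str   # sentence with the full-verb use


# The 'make' objects are taken from the abstract; everything else here
# (structure, template wording) is an assumption for illustration.
OBJECTS = {
    "make": {"light": ["a decision"], "full": ["a cake"]},
}

# Hypothetical context templates; the real dataset's contexts may differ.
CONTEXTS = [
    "She will {verb} {obj} tomorrow.",
    "They {verb} {obj} every week.",
]


def build_pairs(objects, contexts):
    """Cross every context with every (light object, full object) combination."""
    pairs = []
    for verb, uses in objects.items():
        object_combos = product(uses["light"], uses["full"])
        for ctx, (light_obj, full_obj) in product(contexts, object_combos):
            pairs.append(
                MinimalPair(
                    verb=verb,
                    context=ctx,
                    light=ctx.format(verb=verb, obj=light_obj),
                    full=ctx.format(verb=verb, obj=full_obj),
                )
            )
    return pairs


pairs = build_pairs(OBJECTS, CONTEXTS)
for pair in pairs:
    print(pair.light, "|", pair.full)
```

Holding the context and verb fixed while swapping only the object means that any difference a probe detects between the two sentences can be attributed to the light-verb versus full-verb reading, not to surrounding material.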

Problem

Research questions and friction points this paper is trying to address.

light verb

full verb

phraseological competence

language models

lexical ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

light-verb constructions

minimal-pair dataset

probing