Rethinking the Idiomaticity Decomposability Hypothesis: Evidence from Distributional Learning

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether idiom decomposability genuinely determines syntactic flexibility, or whether distributional experience, such as familiarity and predictability, plays the larger role. It uses contextualised language models as controlled distributional learners, derives a decomposability measure from the models' internal representations, and relates that measure to human judgments, syntactic flexibility, and surprisal, while also tracking how idiom representations evolve during pretraining. Model-derived decomposability correlates only weakly with human judgments and shows a small but consistent negative relationship with syntactic flexibility. The stabilisation of idiom representations is not explained by frequency alone: surprisal, decomposability, and frequency all contribute, with decomposability showing the strongest training-dependent effect. These findings challenge the traditional decomposability hypothesis.
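Surprisal, the predictability measure named in the summary, is conventionally defined as the negative log probability a model assigns to a token in context. The sketch below illustrates that standard definition only. The function names and the probabilities are hypothetical, and nothing here reproduces the paper's actual pipeline or data.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of one token in bits: -log2 p(token | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def mean_surprisal(token_probs: list[float]) -> float:
    """Average per-token surprisal over an expression (assumed aggregation)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return sum(surprisal_bits(p) for p in token_probs) / len(token_probs)


# Hypothetical per-token probabilities for a three-token idiom.
# In practice these would come from a language model's output distribution.
idiom_token_probs = [0.5, 0.25, 0.125]
print(mean_surprisal(idiom_token_probs))
```

Averaging over tokens is one assumed way to get an expression-level score; summing is an equally common choice and gives length-sensitive values.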

📝 Abstract

Idioms can be analysed in terms of their decomposability, the extent to which constituent meanings contribute to the figurative whole. Decomposability is thought to predict syntactic flexibility. Usage-based accounts instead attribute idiom behaviour to distributional experience, such as speaker familiarity and predictability. We examine these views using contextualised language models as controlled distributional learners. We propose a model-internal measure of decomposability and relate it to human ratings, syntactic flexibility, and predictability while tracking idiom learning during pretraining. Model-derived decomposability correlates weakly with human judgments and shows a small but consistent negative relationship with syntactic flexibility. Pretraining analyses show that stabilisation of idiom representations in models is not explained by frequency alone. Instead, surprisal, decomposability, and frequency all contribute, with decomposability showing the strongest training-dependent effect.
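The abstract does not spell out how the model-internal decomposability measure is computed. As an illustration of the general idea only, the sketch below scores an idiom by the cosine similarity between a vector for the whole expression and the average of its constituent vectors. This proxy, the function names, and the toy vectors are all assumptions for illustration and should not be read as the authors' measure.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of a non-empty list of vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def decomposability_proxy(idiom_vec: list[float],
                          constituent_vecs: list[list[float]]) -> float:
    """Hypothetical proxy: similarity of the idiom's representation to the
    average of its constituents' representations. Higher values mean the
    parts look more like the whole."""
    return cosine(idiom_vec, mean_vector(constituent_vecs))


# Toy 2-d vectors standing in for contextual embeddings (made-up values).
whole = [1.0, 0.0]
parts_aligned = [[1.0, 0.0], [1.0, 0.0]]
parts_orthogonal = [[0.0, 1.0], [0.0, 1.0]]
print(decomposability_proxy(whole, parts_aligned))
print(decomposability_proxy(whole, parts_orthogonal))
```

With real models, the vectors would be hidden states extracted at a chosen layer, and repeating the computation across pretraining checkpoints would give one assumed way to follow how a score like this changes during training.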

Problem

Research questions and friction points this paper is trying to address.

idiomaticity

decomposability

syntactic flexibility

distributional learning

predictability

Innovation

Methods, ideas, or system contributions that make the work stand out.

decomposability

distributional learning

contextualised language models