AI Summary
This study addresses the challenge of modeling semantic shifts between literal and figurative interpretations of idioms in multilingual natural language processing, particularly under low-resource conditions and in authentic contexts where current systems perform poorly. The authors introduce MIDI, a multilingual idiom dataset spanning 18 languages across high-, medium-, and low-resource settings, uniquely incorporating both sentence-level and dialogue-level context and annotated by native speakers to enable joint modeling and evaluation of both idiom usages. Through contextual embedding analysis, hidden-layer intervention studies, and benchmarking with multilingual large language models, the research reveals a significant performance drop in idiom understanding for low-resource languages and consistently greater difficulty in recognizing literal meanings compared to figurative ones. Although dialogue context provides some improvement, it remains insufficient to bridge the resource gap, highlighting fundamental limitations in current models' memory and reasoning capabilities.
Abstract
Idiomatic expressions pose a major challenge for multilingual NLP because their meanings shift between figurative and literal usage, often requiring context for accurate interpretation. Prior work has focused on high-resource languages and typically evaluates isolated idiom-meaning questions, overlooking realistic discourse. We introduce MIDI, a multilingual idiom dataset spanning 3 high-, 3 medium-, and 12 low-resource languages, curated by native speakers. Unlike previous datasets, MIDI provides idioms embedded in both sentence-level and conversational contexts, capturing both literal and figurative readings. Benchmarking state-of-the-art models shows that idiom comprehension degrades in low-resource languages and that, in all resource tiers, literal interpretations are substantially harder than figurative ones. Conversational context improves performance but does not eliminate these disparities. Through controlled tests and interventions on hidden representations, we further separate memorization from reasoning, exposing core limitations of current models.