When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities

2026-04-12
AI Summary
Current large language models often struggle with culturally and metaphorically dense idioms, frequently misinterpreting them through literal semantics while overlooking their intended meanings. To address this limitation, this work introduces Mediom, the first high-quality multimodal idiom corpus covering Hindi, Bengali, and Thai, alongside HIDE, a novel prompt-based framework for idiom interpretation. HIDE integrates multilingual large language models with vision-language models and employs iterative refinement through error-feedback retrieval and diagnostic prompting to enhance non-literal reasoning. Experimental results demonstrate that HIDE substantially mitigates systematic deficiencies in cross-cultural idiom comprehension exhibited by existing models when evaluated on the Mediom benchmark.

๐Ÿ“ Abstract
Idiomatic reasoning, deeply intertwined with metaphor and culture, remains a blind spot for contemporary language models, whose progress skews toward surface-level lexical and semantic cues. Consider the Bengali idiom আঙ্গুর ফল টক (angur fol tok, "grapes are sour"): it encodes denial-driven rationalization, yet naive models latch onto the literal fox-and-grape imagery. Addressing this oversight, we present "Mediom," a multilingual, multimodal idiom corpus of 3,533 Hindi, Bengali, and Thai idioms, each paired with gold-standard explanations, cross-lingual translations, and carefully aligned text-image representations. We benchmark both large language models (textual reasoning) and vision-language models (figurative disambiguation) on Mediom, exposing systematic failures in metaphor comprehension. To mitigate these gaps, we propose "HIDE," a Hinting-based Idiom Explanation framework that leverages error-feedback retrieval and targeted diagnostic cues for iterative reasoning refinement. Together, Mediom and HIDE provide a rigorous test bed and a hint-guided methodology for culturally grounded, multimodal idiom understanding in next-generation AI systems.
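The abstract describes HIDE only at a high level, so the following is a hypothetical sketch of what "error-feedback retrieval" plus "iterative refinement" could look like, not the authors' implementation. Every name here (`ErrorCase`, `retrieve_hints`, `explain_with_hints`, the toy model and the toy literalness check) is an assumption introduced for illustration, and the token-overlap retriever stands in for whatever retrieval method the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ErrorCase:
    """A previously observed mistake and a diagnostic hint (hypothetical schema)."""
    idiom: str
    wrong_reading: str
    hint: str


def retrieve_hints(idiom: str, error_bank: List[ErrorCase], k: int = 2) -> List[str]:
    """Toy retriever: rank stored error cases by token overlap with the idiom."""
    query = set(idiom.lower().split())
    scored = [(len(query & set(c.idiom.lower().split())), c) for c in error_bank]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c.hint for score, c in scored[:k] if score > 0]


def explain_with_hints(
    idiom: str,
    model: Callable[[str, List[str]], str],
    is_literal: Callable[[str], bool],
    error_bank: List[ErrorCase],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Re-query the model with retrieved hints while its answer still looks literal."""
    hints: List[str] = []
    answer = model(idiom, hints)
    for _ in range(max_rounds):
        if not is_literal(answer):
            break  # the answer is already figurative, stop refining
        new_hints = [h for h in retrieve_hints(idiom, error_bank) if h not in hints]
        if not new_hints:
            break  # nothing further to feed back
        hints.extend(new_hints)
        answer = model(idiom, hints)
    return answer, hints


# Stand-ins for a real language model and a real literalness check (assumptions).
def toy_model(idiom: str, hints: List[str]) -> str:
    if hints:
        return "dismissing something as undesirable because it is out of reach"
    return "the grapes taste sour"


def toy_is_literal(answer: str) -> bool:
    return "taste" in answer


bank = [
    ErrorCase(
        idiom="grapes are sour",
        wrong_reading="the grapes taste sour",
        hint="Think of what the speaker failed to obtain.",
    )
]
answer, used_hints = explain_with_hints("grapes are sour", toy_model, toy_is_literal, bank)
print(answer)
print(used_hints)
```

In this sketch the loop stops as soon as the answer no longer looks literal or no new hints can be retrieved, which keeps the number of model calls bounded by `max_rounds + 1`. A real system would replace the toy callables with an actual model and a learned or rule-based diagnostic.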
Problem

Research questions and friction points this paper is trying to address.

idiomatic reasoning
metaphor comprehension
multimodal understanding
cultural grounding
non-literal meaning
Innovation

Methods, ideas, or system contributions that make the work stand out.

idiom understanding
multimodal reasoning
cross-lingual transfer
metaphor comprehension
hint-based explanation
Sarmistha Das
Indian Institute of Technology Patna
ML, DL, NLP, FinTech
Shreyas Guha
Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
Suvrayan Bandyopadhyay
Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
Salisa Phosit
School of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
Professor, School of Information Technology, King Mongkut's Institute of Technology Ladkrabang
Machine Learning, Pattern Recognition, Artificial Intelligence
Sriparna Saha
Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India