When Similar Means Different: Evaluating LLMs on Arabic–Hebrew Cognates

📅 2026-06-11

🤖 AI Summary

This study examines how well large language models (LLMs) handle Arabic and Hebrew word pairs that look or sound alike but may differ in meaning: true cognates, false friends, and loanwords. The authors present SemCog Bench, a curated benchmark of 1,858 Arabic–Hebrew word pairs with sentence-level annotations. They evaluate both open-source and commercial LLMs on cognate identification and semantic disambiguation across four input representations: original script, diacritized text, romanization, and phonetic transcription. Models perform well on true cognates, but accuracy drops sharply on false friends and loanwords, and sentence-level context yields only modest improvements. The findings point to a key limitation of current LLMs: reliance on surface-form similarity rather than on meaning.

📝 Abstract

Arabic and Hebrew, as closely related Semitic languages, share a substantial lexicon of true cognates, misleading false friends, and modern loanwords. This overlap poses a challenge for cross-lingual semantic understanding in large language models (LLMs). To evaluate this capability, we introduce SemCog Bench, a curated benchmark of 1,858 Arabic–Hebrew word pairs with sentence-level annotations for cognate identification and semantic disambiguation. We evaluate open-source and commercial LLMs across multiple input representations (raw, diacritized, Romanized, and phonetic) and reveal a critical gap in cross-lingual reasoning. While models achieve high accuracy on true cognates, performance drops sharply on false friends and loanwords, reflecting a strong reliance on surface-form similarity. Furthermore, sentence-level context yields only modest improvements, suggesting that contextual cues alone are insufficient to overcome misleading form-based signals. These findings reveal a fundamental limitation of current LLMs in resolving cross-lingual form–meaning conflicts and establish SemCog Bench as a rigorous benchmark for multilingual semantic reasoning. Our code and data are publicly available.
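
The comparison described in the abstract amounts to breaking accuracy down by pair category and input representation. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. It assumes a three-way task in which the model predicts the pair type; the record layout, the label names, and the toy predictions are all hypothetical and do not come from SemCog Bench.

```python
from collections import defaultdict

# Hypothetical record layout: (gold_category, representation, predicted_category).
# The rows below are made-up toy data, not results from the paper.
records = [
    ("true_cognate", "raw", "true_cognate"),
    ("true_cognate", "romanized", "true_cognate"),
    ("false_friend", "raw", "true_cognate"),
    ("false_friend", "romanized", "false_friend"),
    ("loanword", "raw", "true_cognate"),
    ("loanword", "romanized", "loanword"),
]


def accuracy_by_group(rows):
    """Return {(gold_category, representation): accuracy} for the given rows."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gold, representation, predicted in rows:
        key = (gold, representation)
        totals[key] += 1
        hits[key] += int(gold == predicted)
    return {key: hits[key] / totals[key] for key in totals}


scores = accuracy_by_group(records)
for (category, representation), acc in sorted(scores.items()):
    print(f"{category:13s} {representation:10s} {acc:.2f}")
```

Reporting one score per (category, representation) cell, rather than a single pooled accuracy, is what lets a gap between true cognates and false friends or loanwords show up at all.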

Problem

Research questions and friction points this paper is trying to address.

cross-lingual semantic understanding

cognates

false friends

Arabic–Hebrew

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

SemCog Bench

cross-lingual semantic reasoning

false friends