Commonsense Reasoning in Arab Culture

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing commonsense reasoning evaluations for Arabic large language models (LLMs) rely heavily on machine-translated English datasets, which lack cultural depth, may introduce Anglocentric bias, and fail to capture the regional diversity of the Arab world. Method: The paper introduces ArabCR, a commonsense reasoning benchmark in Modern Standard Arabic built from scratch without machine translation. It spans 13 Arab countries, 12 daily life domains, and 54 fine-grained subtopics, with questions written and validated by native speakers for their respective countries. Contribution/Results: Zero-shot evaluation shows that open-weight language models with up to 32B parameters struggle to comprehend diverse Arab cultures, with performance varying across regions. The benchmark offers a basis for culture-aware evaluation of Arabic language models.

📝 Abstract
Despite progress in Arabic large language models, such as Jais and AceGPT, their evaluation on commonsense reasoning has largely relied on machine-translated datasets, which lack cultural depth and may introduce Anglocentric biases. Commonsense reasoning is shaped by geographical and cultural contexts, and existing English datasets fail to capture the diversity of the Arab world. To address this, we introduce a commonsense reasoning dataset in Modern Standard Arabic (MSA) that covers the cultures of 13 countries across the Gulf, Levant, North Africa, and the Nile Valley. The dataset was built from scratch by engaging native speakers to write and validate culturally relevant questions for their respective countries. It spans 12 daily life domains with 54 fine-grained subtopics, reflecting various aspects of social norms, traditions, and everyday experiences. Zero-shot evaluations show that open-weight language models with up to 32B parameters struggle to comprehend diverse Arab cultures, with performance varying across regions. These findings highlight the need for more culturally aware models and datasets tailored to the Arabic-speaking world.
Problem

Research questions and friction points this paper is trying to address.

Addressing Anglocentric biases in Arabic commonsense reasoning datasets
Creating a culturally relevant Arabic commonsense reasoning dataset
Evaluating language models' comprehension of diverse Arab cultures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Native speaker-created Arabic commonsense dataset
Covers 13 Arab countries' cultural nuances
Zero-shot evaluation reveals cultural comprehension gaps
A. Sadallah
Department of Natural Language Processing, MBZUAI
Junior Cedric Tonga
Department of Natural Language Processing, MBZUAI
Khalid Almubarak
SDAIA
Saeed Almheiri
Department of Natural Language Processing, MBZUAI
Farah Atif
Department of Natural Language Processing, MBZUAI
Chatrine Qwaider
Researcher
Natural language processing, Computational linguistics, Artificial Intelligence, Data mining
Karima Kadaoui
PhD Student in NLP, Mohamed Bin Zayed University of Artificial Intelligence
Sara Shatnawi
Al-Balqa Applied University
Yaser Alesh
Khalifa University
Fajri Koto
Assistant Professor (tenure-track), MBZUAI
Computational Linguistics, Natural Language Processing, Multilingual NLP, Human-centered NLP