Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?

📅 2025-02-28

🤖 AI Summary

This work addresses the challenge of machine-translating Arabizi, a nonstandard Latin-script transliteration of Arabic dialects widely used on social media, into Modern Standard Arabic (MSA) and English. We present the first systematic evaluation of mainstream large language models (LLMs) on this task across under-studied Arabic dialects. Our methodology introduces a multidimensional evaluation framework that integrates human assessment with automated metrics (BLEU/chrF) and employs both zero-shot and few-shot prompting strategies. We identify cultural metaphor, phonemic mapping ambiguity, and orthographic irregularity as key bottlenecks hindering model comprehension. Results show superior LLM performance on Gulf and Levantine dialects, higher translation quality into English than into MSA, and consistent degradation due to Arabizi’s cultural embeddedness. This study fills a critical gap in systematic, cross-dialect, bilingual Arabizi translation evaluation and provides methodological insights and empirical benchmarks for low-resource dialectal NLP.
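
The summary names chrF as one of the automatic metrics. As a rough illustration of what a character n-gram metric measures, the Python sketch below computes a simplified chrF-style score. It is not the paper's evaluation code and not a reference implementation; the function names and the defaults (n-grams up to length 6, beta of 2, whitespace removed before counting) are assumptions chosen to resemble common chrF settings.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces (an assumed simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]; higher means more character overlap."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain any n-gram of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Character-level overlap is often considered more forgiving than word-level overlap for morphologically rich languages such as Arabic, which is one plausible reason to report it alongside BLEU.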

📝 Abstract

In this era of rapid technological advancement, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that uses Latin characters and numerals to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation because it lacks formal structure and carries deeply embedded cultural nuances. This case study arises from a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models' performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.
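
To make the "Latin characters and numerals" point concrete, the Python sketch below expands a few digit conventions that are commonly seen in Arabizi. The mapping is an illustrative assumption rather than anything taken from the paper: usage varies by region and writer, and the helper name `expand_digits` is hypothetical. Latin letters are deliberately left untouched, because they map to Arabic letters far less predictably.

```python
# Commonly cited digit conventions (assumed, not exhaustive; they vary by dialect).
DIGIT_TO_ARABIC = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # ayn
    "5": "خ",  # kha
    "7": "ح",  # ha (pharyngeal h)
}


def expand_digits(token: str) -> str:
    """Replace known Arabizi digits with Arabic letters; leave other characters as-is."""
    return "".join(DIGIT_TO_ARABIC.get(ch, ch) for ch in token)


if __name__ == "__main__":
    for word in ["7abibi", "3arabi", "hello"]:
        print(word, "->", expand_digits(word))
```

A lookup table like this covers only the most regular part of the script. Vowels are frequently omitted or spelled inconsistently, and the same Latin letter can stand for several Arabic sounds, so it cannot recover a full Arabic spelling on its own.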

Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' ability to decode and translate Arabizi

Focuses on multiple Arabic dialects rarely studied before

Investigates translation performance into Modern Standard Arabic and English

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates LLMs for Arabizi translation

Focuses on multiple Arabic dialects

Combines human and automatic evaluation metrics
