🤖 AI Summary
This work addresses the longstanding neglect of Arabic dialects in natural language processing, a gap rooted in their lack of standardization and high variability. Treating Arabic as a pluricentric language, the study proposes a unified generative framework that jointly models Modern Standard Arabic (MSA) alongside five major dialects—Moroccan, Egyptian, Palestinian, Syrian, and Saudi—and enables bidirectional translation among these varieties and English. Built on a large language model architecture, the system uses instruction tuning and multi-task learning to balance translation fidelity, code-switching handling, and multi-dialect generation. The resulting model expands the coverage and practical utility of Arabic NLP, and both the model and code are publicly released to support further research and applications.
📝 Abstract
Arabic dialects have long been under-represented in Natural Language Processing (NLP) research due to their lack of standardization and high variability, which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues for addressing this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in the Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.