A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic

📅 2025-07-07
🤖 AI Summary
This work addresses the automatic transliteration of Judeo-Arabic, which is written in Hebrew script, into standard Arabic script. The task is complicated by ambiguous character mappings, inconsistent orthography, and frequent code-switching into Hebrew and Aramaic. We propose a two-step approach: a simple character-level mapping produces an initial transliteration, and a post-correction step then fixes the grammatical and orthographic errors that the mapping leaves behind. We also present the first benchmark evaluation of large language models (LLMs) on this task, and publicly release both the benchmark data and our models. Experiments show that the transliterated output lets off-the-shelf Arabic NLP tools perform morphosyntactic tagging and machine translation on historical Judeo-Arabic texts, which is not feasible on the original Hebrew-script text. This facilitates the digitization and computational linguistic analysis of Judeo-Arabic heritage documents.

📝 Abstract
Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script, by Jewish writers for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew and Aramaic. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: simple character-level mapping followed by post-correction to address grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts.
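To make the first step concrete, here is a minimal Python sketch of a deterministic character-level mapping. It is not the paper's implementation. The letter tables are a hypothetical, simplified convention chosen for illustration; in particular, the sketch assumes that a geresh (U+05F3), an ASCII apostrophe, or a combining dot above written right after ג, ד, ט, כ, צ or ת selects an alternative Arabic letter (غ, ذ, ظ, خ, ض, ث). The names `BASE`, `DOTTED`, `DIACRITICS` and `char_map` are illustrative.

```python
# Minimal sketch of step 1 (character-level mapping), NOT the paper's implementation.
# The letter tables below are a hypothetical, simplified convention for illustration;
# real Judeo-Arabic orthography varies by period, region and scribe.

# Base correspondences: one Arabic letter per Hebrew letter (final forms included).
BASE = {
    "א": "ا", "ב": "ب", "ג": "ج", "ד": "د", "ה": "ه", "ו": "و",
    "ז": "ز", "ח": "ح", "ט": "ط", "י": "ي", "כ": "ك", "ך": "ك",
    "ל": "ل", "מ": "م", "ם": "م", "נ": "ن", "ן": "ن", "ס": "س",
    "ע": "ع", "פ": "ف", "ף": "ف", "צ": "ص", "ץ": "ص", "ק": "ق",
    "ר": "ر", "ש": "ش", "ת": "ت",
}

# Assumed convention: a diacritic right after one of these letters selects an
# alternative Arabic letter (e.g. ج -> غ, د -> ذ, ت -> ث).
DOTTED = {
    "ג": "غ", "ד": "ذ", "ט": "ظ", "כ": "خ", "ך": "خ",
    "צ": "ض", "ץ": "ض", "ת": "ث",
}

# Marks treated as that diacritic: Hebrew geresh, ASCII apostrophe, combining dot above.
DIACRITICS = {"\u05F3", "'", "\u0307"}


def char_map(text: str) -> str:
    """Map Hebrew-script Judeo-Arabic to Arabic script, letter by letter.

    Characters not in the tables (spaces, punctuation, other marks) pass through.
    No context is used, so ambiguities are left for a later post-correction step.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in DOTTED and nxt in DIACRITICS:
            out.append(DOTTED[ch])  # alternative letter; consume the diacritic too
            i += 2
        else:
            out.append(BASE.get(ch, ch))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(char_map("כתאב"))   # كتاب
    print(char_map("מדינה"))  # مدينه (standard spelling: مدينة)
```

The second example shows why a mapping alone is not enough: it yields مدينه, whereas standard Arabic spells the word مدينة, because a context-free table cannot tell a word-final ה standing for ة from one standing for ه. Errors of this kind are what the post-correction step targets.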
Problem

Research questions and friction points this paper is trying to address.

Transliterate Judeo-Arabic from Hebrew script into Arabic script
Address ambiguous letter mappings and orthographic errors
Enable Arabic NLP tools for Judeo-Arabic text processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step transliteration: character-level mapping followed by post-correction
First benchmark evaluation of LLMs on this task
Enables the use of off-the-shelf Arabic NLP tools on Judeo-Arabic texts
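A benchmark of this kind needs a way to score a system's Arabic-script output against a reference transliteration. The page does not say which metric the paper uses; as an assumption, the sketch below uses character error rate (CER), one common choice for transliteration: the Levenshtein edit distance between hypothesis and reference, summed over the corpus and divided by the total reference length. The function names are illustrative.

```python
# Assumption: CER is shown as an example metric; the paper's metric may differ.


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(hypotheses: list[str], references: list[str]) -> float:
    """Corpus-level character error rate: total edits / total reference characters."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be aligned")
    edits = sum(levenshtein(h, r) for h, r in zip(hypotheses, references))
    total = sum(len(r) for r in references)
    return edits / total if total else 0.0


if __name__ == "__main__":
    # One wrong final letter (ه instead of ة) in a 5-character reference.
    print(cer(["مدينه"], ["مدينة"]))  # 0.2
```

Summing edits and reference lengths over the whole corpus, rather than averaging per-line rates, keeps short lines from dominating the score.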