QuranMorph: Morphologically Annotated Quranic Corpus

📅 2025-06-22
🤖 AI Summary
To address the scarcity of high-quality morphologically annotated corpora for the Qur’an, this study builds a manually curated, fine-grained morphological corpus of 77,429 word tokens, annotated by three expert linguists. Lemmatization draws on lemmas from the Qabas lexicographic database, and part-of-speech tagging uses the SAMA/Qabas tagset of 40 fine-grained tags. The key contribution is that the Qabas lemmas and the fine-grained tagset allow the corpus to be inter-linked with many other linguistic resources; Qabas itself is linked with 110 lexicons and corpora. The corpus is open-source and publicly available as part of the SinaLab resources.

📝 Abstract
We present the QuranMorph corpus, a morphologically annotated corpus for the Quran (77,429 tokens). Each token in QuranMorph was manually lemmatized and tagged with its part of speech by three expert linguists. The lemmatization process used lemmas from Qabas, an Arabic lexicographic database linked with 110 lexicons and corpora of 2 million tokens. The part-of-speech tagging was performed using the fine-grained SAMA/Qabas tagset, which encompasses 40 tags. As shown in this paper, this rich lemmatization and fine-grained POS tagset enabled the QuranMorph corpus to be inter-linked with many linguistic resources. The corpus is open-source and publicly available as part of the SinaLab resources at https://sina.birzeit.edu/quran.
Problem

Research questions and friction points this paper is trying to address.

Create a morphologically annotated Quranic corpus
Manually lemmatize and tag Quranic tokens
Inter-link corpus with linguistic resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Manual lemmatization by expert linguists
Use of lemmas from the Qabas Arabic lexicographic database
Fine-grained SAMA/Qabas POS tagging
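
The inter-linking idea can be illustrated with a minimal Python sketch: if every token carries a lemma identifier, any other resource keyed by the same identifiers can be joined to the corpus. Everything here is an assumption made for illustration. The record fields, the position format, the lemma identifiers (`L001`, ...), the tag labels, and the toy lexicon are hypothetical and do not reflect the actual QuranMorph file format, Qabas identifiers, or the SAMA/Qabas tag names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token. All field names are hypothetical."""
    position: str  # hypothetical locator, e.g. "1:1:1"
    surface: str   # placeholder for the word form
    lemma_id: str  # hypothetical lemma identifier shared across resources
    pos: str       # placeholder tag label, not a real SAMA/Qabas tag


def index_by_lemma(tokens):
    """Group tokens by their lemma identifier."""
    index = defaultdict(list)
    for tok in tokens:
        index[tok.lemma_id].append(tok)
    return dict(index)


def link_to_resource(index, resource):
    """Pair each lemma's tokens with the matching entry of another resource.

    Only lemma identifiers present in both the corpus index and the
    resource are kept.
    """
    return {
        lemma_id: (toks, resource[lemma_id])
        for lemma_id, toks in index.items()
        if lemma_id in resource
    }


# Toy data, invented for this sketch only.
tokens = [
    Token("1:1:1", "w1", "L001", "NOUN"),
    Token("1:1:2", "w2", "L002", "VERB"),
    Token("1:2:1", "w3", "L001", "NOUN"),
]
lexicon = {"L001": "entry for L001", "L003": "entry for L003"}

index = index_by_lemma(tokens)
links = link_to_resource(index, lexicon)
print(sorted(links))
```

In this toy run only `L001` appears in both the corpus and the lexicon, so it is the only linked lemma. The point of the design is that a shared lemma identifier acts as the join key, which is why lemmatizing against a common lexicographic database makes a corpus easier to connect to other resources.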