A UD Treebank for Bohairic Coptic

📅 2025-04-25
🤖 AI Summary
Problem: Bohairic Coptic, the dominant ecclesiastical language of Egypt from the late Byzantine to the pre-Mamluk period, lacks syntactically annotated resources, which hinders computational and historical linguistic research. Method: The authors construct and release the first Universal Dependencies (UD v2.12) treebank for Bohairic Coptic, comprising over 1,200 sentences drawn from biblical, hagiographic, and ascetic texts, with linguistically grounded dependency annotations. They run parsing experiments that contrast Bohairic with Sahidic, training models both jointly on the two dialects and separately on each. Results: The analysis reveals systematic syntactic divergences between the two dialects; joint modeling yields negligible gains, which underscores Bohairic's grammatical distinctiveness and the need for dialect-specific parsers. The treebank fills a gap in structural resources for Late Coptic and gives a basis for further cross-dialectal joint and transfer parsing, serving as infrastructure for historical language processing and Oriental Christian textual scholarship.

📝 Abstract
Despite recent advances in digital resources for other Coptic dialects, especially Sahidic, Bohairic Coptic remains critically under-resourced, even though it was the main Coptic dialect of pre-Mamluk, late Byzantine Egypt and is the contemporary language of the Coptic Church. This paper presents and evaluates the first syntactically annotated corpus of Bohairic Coptic, sampling data from a range of works, including Biblical text, saints' lives and Christian ascetic writing. We also explore some of the main differences we observe compared to the existing UD treebank of Sahidic Coptic, the classical dialect of the language, and conduct joint and cross-dialect parsing experiments, which show that Bohairic is a related but distinct variety from the more often studied Sahidic.
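Dependency parsing experiments of this kind are commonly scored with unlabeled and labeled attachment scores (UAS and LAS) over CoNLL-U files, the format UD treebanks are distributed in. The sketch below shows how such a comparison could be computed. It is an illustration only: the page does not say which metrics or tooling the authors used, and the toy sentence, the placeholder forms `w1` to `w3`, and the helper names `row`, `read_conllu`, and `attachment_scores` are all invented for this example. It assumes gold and predicted files share the same tokenization.

```python
from typing import List, Tuple

# (id, form, head, deprel) for one syntactic word
Token = Tuple[int, str, int, str]


def row(tok_id: str, form: str, head: str, deprel: str) -> str:
    """Build one 10-column CoNLL-U line (hypothetical helper for the toy data)."""
    return "\t".join([tok_id, form, "_", "_", "_", "_", head, deprel, "_", "_"])


def read_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into sentences of (id, form, head, deprel) tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1");
        # only plain integer IDs carry a head and a relation.
        if not cols[0].isdigit():
            continue
        current.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS), assuming identical tokenization in gold and pred."""
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        for g, p in zip(g_sent, p_sent):
            total += 1
            if g[2] == p[2]:          # head matches
                uas_hits += 1
                if g[3] == p[3]:      # head and relation both match
                    las_hits += 1
    return uas_hits / total, las_hits / total


# Invented toy data: one sentence with a multiword-token line.
GOLD = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "w1w2", "_", "_"),
    row("1", "w1", "2", "nsubj"),
    row("2", "w2", "0", "root"),
    row("3", "w3", "2", "obj"),
]) + "\n"

PRED = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "w1w2", "_", "_"),
    row("1", "w1", "2", "obj"),    # right head, wrong relation
    row("2", "w2", "0", "root"),   # fully correct
    row("3", "w3", "1", "obj"),    # wrong head
]) + "\n"

uas, las = attachment_scores(read_conllu(GOLD), read_conllu(PRED))
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In a joint versus separate comparison, the same scoring would be run on the Bohairic test set once for a parser trained on Bohairic alone and once for a parser trained on both dialects, so that any difference reflects only the training data.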
Problem

Research questions and friction points this paper is trying to address.

Lack of syntactic resources for Bohairic Coptic
First annotated corpus for Bohairic dialect
Comparison with Sahidic Coptic treebank
Innovation

Methods, ideas, or system contributions that make the work stand out.

First syntactic corpus for Bohairic Coptic
Cross-dialect parsing with Sahidic Coptic
Analyzes Bohairic's distinct linguistic features
Amir Zeldes
Associate Professor of Computational Linguistics, Georgetown University
corpus linguistics, computational linguistics, NLP, discourse, digital humanities
Nina Speransky
Hebrew University of Jerusalem
Nicholas Wagner
Duke University
Caroline T. Schroeder
University of Oklahoma