🤖 AI Summary
Bohairic Coptic, the main Coptic dialect of late Byzantine, pre-Mamluk Egypt and the contemporary language of the Coptic Church, lacks syntactically annotated resources, which hinders computational and historical linguistic research. Method: We construct and release the first Universal Dependencies (UD v2.12) treebank for Bohairic Coptic, comprising over 1,200 sentences drawn from biblical, hagiographic, and ascetic texts, with linguistically grounded dependency annotations. We conduct cross-dialectal parsing experiments contrasting Bohairic with Sahidic, using both joint and separate modeling strategies. Results: Our analysis reveals systematic syntactic divergences between the two dialects; joint modeling yields negligible gains, underscoring Bohairic’s grammatical distinctiveness and the need for dialect-specific parsers. The treebank fills a gap in structural resources for Late Coptic and provides a basis for further joint and cross-dialect parsing work, historical language processing, and scholarship on Oriental Christian texts.
📝 Abstract
Despite recent advances in digital resources for other Coptic dialects, especially Sahidic, Bohairic Coptic remains critically under-resourced, even though it was the main Coptic dialect of pre-Mamluk, late Byzantine Egypt and is the contemporary language of the Coptic Church. This paper presents and evaluates the first syntactically annotated corpus of Bohairic Coptic, sampling data from a range of works, including Biblical text, saints' lives, and Christian ascetic writing. We also explore some of the main differences we observe compared to the existing UD treebank of Sahidic Coptic, the classical dialect of the language, and conduct joint and cross-dialect parsing experiments, which reveal the unique nature of Bohairic as a variety related to, but distinct from, the more often studied Sahidic.