🤖 AI Summary
Scarce annotated resources and high entry barriers hinder pedagogical applications of historical linguistics, particularly for Old English (OE) dependency parsing. Method: We construct a pedagogically oriented sample OE Universal Dependencies (UD) treebank based on the UD Cairo sentences, combining LLM prompting and searches in authentic OE data with annotation by novice students. Twenty sentences illustrating a range of syntactic constructions were each assigned to multiple students with limited prior exposure to UD, and their annotations were compared and adjudicated; preliminary parsing experiments were run using Modern English training data. Contributions/Results: (1) Current LLM outputs in OE do not reflect authentic syntax, but post-editing can mitigate this; (2) beginner annotators cannot complete the task perfectly on their own, yet taken together they produce good results and learn from the experience; (3) parsers trained on Modern English perform poorly on OE, but parsing on annotated features (lemma, hyperlemma, gloss) improves performance.
📝 Abstract
In this paper, we present a sample treebank for Old English based on the UD Cairo sentences, collected and annotated as part of a classroom curriculum in Historical Linguistics. To collect the data, a sample of 20 sentences illustrating a range of syntactic constructions in the world's languages, we employ a combination of LLM prompting and searches in authentic Old English data. For annotation, we assign each sentence to multiple students with limited prior exposure to UD, then compare and adjudicate their annotations. Our results suggest that while current LLM outputs in Old English do not reflect authentic syntax, this can be mitigated by post-editing, and that although beginner annotators do not possess enough background to complete the task perfectly, taken together they can produce good results and learn from the experience. We also conduct preliminary parsing experiments using Modern English training data, and find that although performance on Old English is poor, parsing on annotated features (lemma, hyperlemma, gloss) leads to improved performance.
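The abstract does not say how student annotations are compared, so the following is only a minimal sketch of one common option: per-token agreement on heads (unlabeled) and on head plus relation (labeled) between two annotators of the same sentence. The function name `agreement` and the toy annotations are hypothetical and are not taken from the paper's data.

```python
from typing import List, Tuple

# One dependency arc per token: (head index, dependency relation).
# Head index 0 marks the root, following the UD convention.
Arc = Tuple[int, str]


def agreement(a: List[Arc], b: List[Arc]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) agreement between two annotations.

    Unlabeled agreement counts tokens whose heads match; labeled
    agreement additionally requires the relation to match.
    Hypothetical helper, not the authors' procedure.
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and token-aligned")
    n = len(a)
    head_matches = sum(1 for (ha, _), (hb, _) in zip(a, b) if ha == hb)
    full_matches = sum(1 for x, y in zip(a, b) if x == y)
    return head_matches / n, full_matches / n


# Invented three-token example: the annotators agree on every head
# but disagree on the relation of the second token.
annotator_1 = [(2, "det"), (3, "nsubj"), (0, "root")]
annotator_2 = [(2, "det"), (3, "obj"), (0, "root")]

unlabeled, labeled = agreement(annotator_1, annotator_2)
print(f"unlabeled={unlabeled:.2f} labeled={labeled:.2f}")
```

Tokens where the two annotations differ are natural candidates for adjudication; a chance-corrected measure could be substituted if raw agreement is too coarse.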