TalkTag: Fine-Grained Morphosyntactic Error Annotation for Transcribed Speech

📅 2026-06-01

🤖 AI Summary

Fine-grained morphosyntactic error annotation is crucial for clinical and developmental language research, yet traditional approaches depend on expert linguists, are time-consuming, and do not scale. To address this, the work proposes TalkTag, a lightweight tool built on a fine-tuned large language model that automatically annotates morphosyntactic errors in spoken-language transcripts in CHAT format. The system was developed under conditions of extreme data scarcity using children's narrative data and follows the CHAT annotation framework. The evaluation shows that it produces encouragingly precise annotations and identifies cases where the ambiguity of naturalistic language makes automated tagging genuinely difficult. The result is a scalable, practical aid for linguistic research.

📝 Abstract

Fine-grained morphosyntactic error annotation is important in clinical and developmental language research, yet it is labour-intensive, expert-dependent, and difficult to scale. We present TalkTag, a lightweight LLM-based tool fine-tuned to automate CHAT-style error annotation in spoken-language transcripts. Developed under conditions of extreme data scarcity using children's narrative data, the system shows the feasibility of linguistic analysis in low-resource settings. Our evaluation demonstrates that TalkTag produces encouragingly precise annotation while effectively identifying instances where linguistic ambiguity makes automated tagging genuinely complex. In summary, TalkTag provides a scalable and practically viable alternative to manual morphosyntactic error annotation.
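To make "CHAT-style error annotation" concrete: on a CHAT main tier, an erroneous word can be followed by a [: target] replacement giving the intended form and a [* code] marker classifying the error. The sketch below is not part of TalkTag. It simply pulls such markers out of a main-tier line, which could be one way to line up a model's tags against a human annotator's. The regular expression, the hypothetical names ErrorMark and extract_errors, and the specific error codes in the examples are illustrative assumptions; the CHAT manual remains the authority on the notation and the code inventory.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed pattern for one error-marked word on a CHAT main tier:
#   word [: target] [* code]   (the [: target] replacement is optional)
ERROR_RE = re.compile(
    r"(?P<word>[^\s\[\]]+)"                     # the word form as produced
    r"(?:\s+\[:\s*(?P<target>[^\]]+?)\s*\])?"   # optional [: target] replacement
    r"\s+\[\*\s*(?P<code>[^\]]*?)\s*\]"         # [* code] error marker (code may be empty)
)


@dataclass
class ErrorMark:
    word: str              # form the speaker produced
    target: Optional[str]  # intended form, if a [: ...] replacement is present
    code: str              # text inside [* ...]; empty string for a bare [*]


def extract_errors(line: str) -> List[ErrorMark]:
    """Return every error-marked word on a CHAT main-tier line such as '*CHI:\\t...'."""
    # Strip the speaker prefix ("*CHI:"); split only at the first colon,
    # so colons inside error codes (e.g. "m:+ed") are left intact.
    if line.startswith("*") and ":" in line:
        line = line.split(":", 1)[1]
    return [
        ErrorMark(m.group("word"), m.group("target"), m.group("code"))
        for m in ERROR_RE.finditer(line)
    ]
```

For example, `extract_errors("*CHI:\tthe boy goed [: went] [* m:+ed] home .")` returns `[ErrorMark(word='goed', target='went', code='m:+ed')]`. A regex is used here only because the markers are flat, bracketed spans; a full CHAT toolchain would also need to handle retracings, scoped annotations, and other constructs this sketch ignores.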

Problem

Research questions and friction points this paper is trying to address.

morphosyntactic error annotation

transcribed speech

fine-grained annotation

language research

scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

morphosyntactic error annotation

large language model

low-resource NLP