🤖 AI Summary
This study identifies linguistic markers in oral narratives of Afrikaans- and isiXhosa-speaking children predictive of later literacy difficulties, supporting early intervention in multilingual contexts. Method: Leveraging a South African bilingual narrative corpus, we extracted features—including lexical diversity, mean utterance length, articulation rate, and part-of-speech distributions (particularly verbs and auxiliaries)—and applied lightweight machine learning classifiers. Contribution/Results: Lexical diversity and utterance length emerged as robust cross-linguistic developmental indicators; conversely, higher frequencies of specific verbs and auxiliaries significantly correlated with lower intervention need, revealing both language-specific and shared predictive patterns. This is the first systematic validation of narrative structure–literacy associations in major Southern African indigenous languages. The resulting interpretable, transferable computational framework enables scalable, resource-efficient multilingual early language assessment in low-resource settings.
📝 Abstract
Oral narrative skills are strong predictors of later literacy development. This study examines the features of oral narratives from children who were identified by experts as requiring intervention. Using simple machine learning methods, we analyse recorded stories from four- and five-year-old Afrikaans- and isiXhosa-speaking children. Consistent with prior research, we identify lexical diversity (unique words) and length-based features (mean utterance length) as indicators of typical development, but features like articulation rate prove less informative. Despite cross-linguistic variation in part-of-speech patterns, the use of specific verbs and auxiliaries associated with goal-directed storytelling is correlated with a reduced likelihood of requiring intervention. Our analysis of two linguistically distinct languages reveals both language-specific and shared predictors of narrative proficiency, with implications for early assessment in multilingual contexts.