AI Summary
Current language model evaluations focus predominantly on comprehension and overlook the facilitative role of child-directed speech (CDS) in language production. Drawing on usage-based theories of language acquisition and inspired by construction grammar, this study proposes a generative evaluation framework built around a novel construction-based fill-in-the-blank task. It systematically compares Llama-architecture models trained on CDS, BabyLM, and FineWeb-edu corpora on both comprehension and production. Models trained on CDS generate grammatically appropriate completions early in training and concentrate probability mass on lexically suitable words, whereas models trained on web-scale data perform better on comprehension tasks but develop production abilities markedly later. These findings challenge comprehension-centric evaluation paradigms and highlight the distinct contribution of CDS to the development of grammatical generation skills.
Abstract
Recent studies suggest that child-directed speech (CDS) is not conducive to language learning in BabyLMs. However, current evaluations focus predominantly on comprehension rather than production. Production is central to usage-based theories of language acquisition, which argue that CDS facilitates early language use through constructional "frames" (frequent lexical patterns with open slots). Inspired by these theories, we introduce a novel generation-based evaluation in the form of a frame-completion task, and compare Llama models trained on CDS, the BabyLM corpus, and web-crawl data (FineWeb-edu) on comprehension benchmarks and on our new evaluation. Our results reveal a clear dissociation between models' comprehension and production capabilities: while FineWeb-trained models excel at minimal pairs, CDS-trained models produce grammatical completions substantially earlier in training and concentrate probability mass on appropriate slot-fillers. These findings show that comprehension benchmarks underestimate what CDS affords to BabyLMs.
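To make "concentrating probability mass on appropriate slot-fillers" concrete, the following minimal sketch computes the share of a next-word distribution that falls on a set of acceptable fillers for one frame. It is an illustration only, not the authors' implementation: the frame, the toy vocabulary, the logits, and the names `softmax` and `slot_filler_mass` are all hypothetical, and in practice the logits would come from the language model at the slot position.

```python
import math


def softmax(logits):
    """Convert a {word: logit} mapping into a {word: probability} mapping."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - peak) for word, value in logits.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def slot_filler_mass(logits, appropriate):
    """Return the total probability assigned to words in `appropriate`."""
    probs = softmax(logits)
    return sum(p for word, p in probs.items() if word in appropriate)


# Hypothetical frame "Where's the ___?" with made-up logits over a toy vocabulary.
toy_logits = {"ball": 2.0, "doggy": 1.5, "running": -1.0, "of": -2.0}

# Assumed set of acceptable fillers for this frame (illustrative only).
appropriate_fillers = {"ball", "doggy"}

mass = slot_filler_mass(toy_logits, appropriate_fillers)
print(f"probability mass on appropriate fillers: {mass:.3f}")
```

Tracking a quantity like this across training checkpoints is one way to compare how early different models start preferring suitable fillers. How the set of appropriate fillers is defined is a separate design decision that this sketch leaves open.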