UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions

📅 2025-04-28

🤖 AI Summary

Problem: Existing dependency annotations of CHILDES data are heterogeneous and limited in scale, which hinders standardized dependency parsing and research on child language acquisition. Method: We propose a “gold + silver” annotation paradigm: (1) harmonizing and validating over 48k previously annotated gold-standard sentences from 11 children and their caregivers under the Universal Dependencies (UD) v2 framework, covering both child and caregiver utterances; and (2) automatically generating 1M silver-standard annotations via rule-based and model-based approaches. We further perform transcription alignment, noise cleaning, and dependency consistency verification across speakers and corpora. Contribution/Results: This work delivers the first officially released UD treebank derived from CHILDES, annotated under consistent and unified UD v2 guidelines. It provides data for child language parsing, acquisition modeling, and cross-lingual dependency transfer research.

📝 Abstract

CHILDES is a widely used resource of transcribed child and child-directed speech. This paper introduces UD-English-CHILDES, the first officially released Universal Dependencies (UD) treebank derived from previously dependency-annotated CHILDES data with consistent and unified annotation guidelines. Our corpus harmonizes annotations from 11 children and their caregivers, totaling over 48k sentences. We validate existing gold-standard annotations under the UD v2 framework and provide an additional 1M silver-standard sentences, offering a consistent resource for computational and linguistic research.
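UD treebanks are distributed in the CoNLL-U format, where each token occupies one line of ten tab-separated columns. As a minimal sketch of what a tree in such a resource looks like and how it could be read, the Python below parses one sentence and lists its dependency edges. The sample utterance and its annotation are hypothetical and written for illustration only; they are not taken from UD-English-CHILDES, and the helper names `parse_conllu` and `dependency_edges` are assumptions of this sketch.

```python
# Hypothetical sample utterance in CoNLL-U format (not taken from the corpus).
SAMPLE = """\
# text = you want the ball
1\tyou\tyou\tPRON\t_\t_\t2\tnsubj\t_\t_
2\twant\twant\tVERB\t_\t_\t0\troot\t_\t_
3\tthe\tthe\tDET\t_\t_\t4\tdet\t_\t_
4\tball\tball\tNOUN\t_\t_\t2\tobj\t_\t_
"""

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"]) if token["head"].isdigit() else None
        tokens.append(token)
    if tokens:
        sentences.append(tokens)
    return sentences


def dependency_edges(sentence):
    """Return (head form, relation, dependent form) triples for one sentence."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    forms[0] = "ROOT"  # head 0 marks the root of the tree
    return [(forms.get(tok["head"]), tok["deprel"], tok["form"])
            for tok in sentence]


if __name__ == "__main__":
    for head, rel, dep in dependency_edges(parse_conllu(SAMPLE)[0]):
        print(f"{head} --{rel}--> {dep}")
```

The same reader would apply to gold and silver files alike, since both follow the same column layout; only the provenance of the annotation differs.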

Problem

Research questions and friction points this paper is trying to address.

Creating the first officially released UD treebank from CHILDES data

Harmonizing annotations for child-caregiver interactions

Providing gold and silver standard UD resources

Innovation

Methods, ideas, or system contributions that make the work stand out.

First officially released UD treebank derived from CHILDES data

Harmonized annotations from 11 children and their caregivers (over 48k sentences)

Gold-standard annotations validated under UD v2, plus 1M silver-standard sentences
