Paediatric-HGNN: A Hybrid Heterogeneous Graph Neural Network for Detecting Disfluency in Children's Speech via Multiscale Acoustic Fusion

📅 2026-06-06

🤖 AI Summary

This study addresses the challenge of automatically distinguishing pathological stuttering in children from typical developmental disfluency, a task made difficult by the acoustic similarity of the two and by the high variability of children’s speech. The authors propose Paediatric-HGNN, a heterogeneous graph neural network for paediatric speech that builds a heterogeneous graph linking lexical units (word nodes) to frame-level acoustic segments (frame nodes). The model uses a Context-aware Part-whole Interaction Network (CaPIN) with a multi-scale fusion strategy to model linguistic and acoustic relationships hierarchically, with the aim of capturing developmental “searching” behaviour in children’s speech. Trained on the UCLASS and FluencyBank corpora, the approach reaches a weighted accuracy of 82.4% and an F1-score of 0.386 on the typical-disfluency class. The authors position it as a more robust and interpretable alternative to conventional one-dimensional signal modelling.
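The two reported figures measure different things, which is one way a high weighted accuracy can sit alongside a low per-class F1. The sketch below is illustrative only: the source does not define "weighted accuracy", so it is assumed here to be per-class recall weighted by class support (which equals overall accuracy). The label names (`fluent`, `stutter`, `typical`) and the toy predictions are invented for the example and are not the paper's data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_accuracy(y_true, y_pred):
    """ASSUMED definition: per-class recall weighted by class support."""
    support = Counter(y_true)
    total = len(y_true)
    acc = 0.0
    for label, n in support.items():
        correct = sum(
            1 for t, p in zip(y_true, y_pred) if t == label and p == label
        )
        acc += (n / total) * (correct / n)
    return acc


# Toy labels, invented for illustration (not results from the paper).
y_true = ["fluent"] * 4 + ["stutter"] * 3 + ["typical"] * 3
y_pred = ["fluent"] * 4 + ["stutter", "stutter", "typical"] + [
    "typical", "fluent", "stutter"
]

print(round(weighted_accuracy(y_true, y_pred), 3))   # 0.7
print(per_class_f1(y_true, y_pred, "typical"))       # 0.4
```

In this toy case the majority class is classified perfectly, so the support-weighted figure stays high while the rarer `typical` class scores much lower. Whether the same effect explains the paper's numbers cannot be determined from the summary alone.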

📝 Abstract

Automated stuttering detection (ASD) systems struggle with paediatric speech due to high acoustic variability in developing voices and the subtle distinction between pathological stuttering and typical developmental disfluencies. We introduce Paediatric-HGNN, a framework using a Context-aware Part-whole Interaction Network (CaPIN) tailored for paediatric data. Instead of conventional 1D signal modelling, our approach builds a heterogeneous graph capturing hierarchical relationships between lexical units (word nodes) and fine-grained acoustic segments (frame nodes). Trained on curated paediatric corpora (UCLASS and FluencyBank), Paediatric-HGNN achieves 82.4% weighted accuracy and a Typical Disfluency F1-score of 0.386. Modelling hierarchical lexical-acoustic interactions captures developmental "searching" behaviour, offering a more robust and interpretable tool for early clinical intervention.
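To make the word-node and frame-node structure concrete, here is a minimal sketch of how such a heterogeneous graph could be assembled from a word-level time alignment. It is not the authors' implementation. The `HeteroGraph` class, the `build_graph` function, the relation names (`next`, `part_of`), the 20 ms frame hop, and the example alignment are all assumptions made for illustration. The abstract specifies only that word nodes and frame nodes are related hierarchically.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Hypothetical container: typed node lists plus typed edge lists."""
    nodes: dict = field(default_factory=lambda: {"word": [], "frame": []})
    # Edges are keyed by (source type, relation, target type).
    edges: dict = field(default_factory=dict)

    def add_edge(self, src_type, relation, dst_type, src, dst):
        key = (src_type, relation, dst_type)
        self.edges.setdefault(key, []).append((src, dst))


def build_graph(words, n_frames, hop_ms=20):
    """Build a word/frame graph.

    words: list of (token, start_ms, end_ms) from a forced alignment.
    n_frames: number of acoustic frames; frame t starts at t * hop_ms.
    """
    g = HeteroGraph()
    g.nodes["word"] = [token for token, _, _ in words]
    g.nodes["frame"] = list(range(n_frames))

    # Sequential edges within each node type (assumed design choice).
    for i in range(len(words) - 1):
        g.add_edge("word", "next", "word", i, i + 1)
    for t in range(n_frames - 1):
        g.add_edge("frame", "next", "frame", t, t + 1)

    # Part-whole edges: a frame belongs to the word whose span
    # contains the frame's start time.
    for w_idx, (_, start_ms, end_ms) in enumerate(words):
        for t in range(n_frames):
            if start_ms <= t * hop_ms < end_ms:
                g.add_edge("frame", "part_of", "word", t, w_idx)
    return g


# Invented alignment: a short word followed by a repeated-onset word.
example_words = [("I", 0, 100), ("w-w-want", 100, 400)]
graph = build_graph(example_words, n_frames=20)
print(len(graph.edges[("frame", "part_of", "word")]))  # 20
```

A graph neural network would then pass messages along each edge type separately, so that frame-level acoustic evidence can inform word-level decisions and the reverse. Integer milliseconds are used for the alignment to avoid floating-point boundary effects when assigning frames to words.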

Problem

Research questions and friction points this paper is trying to address.

stuttering detection

paediatric speech

developmental disfluency

acoustic variability

speech disfluency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous Graph Neural Network

Multiscale Acoustic Fusion

Context-aware Part-whole Interaction