🤖 AI Summary
Automated classification of pediatric lung sounds in children under six years old remains challenging due to substantial acoustic differences from adult recordings, stemming from physiological development, and a severe scarcity of annotated data. To address this, we propose a novel multi-stage hybrid CNN-Transformer model that combines CNNs' local feature extraction capability with Transformers' long-range temporal modeling capacity for respiratory sound analysis. Using STFT-based spectrograms as input, the model jointly models full-length recordings and individual respiratory events across stages. We further introduce a class-wise focal loss to mitigate extreme class imbalance. On binary and multi-class event classification, our method achieves overall scores of 0.9039 and 0.8448, respectively, surpassing state-of-the-art methods by 3.81% and 5.94%. This advance makes intelligent pediatric respiratory disease screening considerably more practical in resource-constrained clinical settings.
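The hybrid design described above can be illustrated with a minimal NumPy sketch. This is a toy illustration of the general CNN-plus-self-attention pattern, not the paper's actual architecture: the kernel count, widths, and stride here are arbitrary assumptions. A convolutional stage extracts local features from a 1-D input, and a single-head self-attention stage then lets every time step aggregate context from the whole sequence.

```python
import numpy as np

def conv_stage(x, kernels, stride=2):
    """Toy CNN stage: valid 1-D convolutions, ReLU, strided subsampling.
    Returns a sequence of local feature vectors, shape (steps, n_kernels)."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)[::stride]
    return np.maximum(windows @ kernels.T, 0.0)

def attention_stage(h):
    """Toy Transformer stage: single-head scaled dot-product self-attention,
    so every time step can attend to every other (long-range context)."""
    d = h.shape[1]
    scores = h @ h.T / np.sqrt(d)                        # (steps, steps)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                    # row-wise softmax
    return w @ h                                         # context-mixed features

# Toy usage on a random 1-D signal (stand-in for one spectrogram row)
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
kernels = rng.standard_normal((8, 9))   # 8 hypothetical filters of width 9
features = attention_stage(conv_stage(x, kernels))
# (256 - 9 + 1) windows, every 2nd kept -> features has shape (124, 8)
```

A real implementation would stack several convolutional blocks, use learned query/key/value projections, and operate on 2-D time-frequency images, but the division of labor is the same: convolutions for local patterns, attention for long-range temporal structure.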
📝 Abstract
Automated analysis of lung sound auscultation is essential for monitoring respiratory health, especially in regions facing a shortage of skilled healthcare workers. While respiratory sound classification has been widely studied in adults, its application in pediatric populations, particularly in children aged <6 years, remains underexplored. The developmental changes in pediatric lungs considerably alter the acoustic properties of respiratory sounds, necessitating specialized classification approaches tailored to this age group. To address this, we propose a multi-stage hybrid CNN-Transformer framework that combines CNN-extracted features with an attention-based architecture to classify pediatric respiratory diseases using scalogram images from both full recordings and individual breath events. By employing class-wise focal loss to address data imbalance, our model achieved an overall score of 0.9039 in binary event classification and 0.8448 in multiclass event classification. At the recording level, the model attained scores of 0.720 for ternary and 0.571 for multiclass classification. The event-level scores outperform the previous best models by 3.81% and 5.94%, respectively. This approach offers a promising solution for scalable pediatric respiratory disease diagnosis, especially in resource-limited settings.
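The class-wise focal loss used to counter data imbalance can be sketched as follows. This is a minimal NumPy version of the standard focal-loss formulation with a per-class weight; the weight vector `alpha` and focusing parameter `gamma` shown here are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def class_wise_focal_loss(probs, labels, alpha, gamma=2.0, eps=1e-12):
    """Mean focal loss with a per-class weight alpha[c].

    probs:  (n, n_classes) softmax outputs
    labels: (n,) integer true-class indices
    alpha:  (n_classes,) class weights, typically larger for rare classes
    gamma:  focusing parameter; the (1 - p_t)**gamma factor shrinks the
            contribution of examples the model already predicts confidently
    """
    pt = probs[np.arange(len(labels)), labels]   # probability of true class
    pt = np.clip(pt, eps, 1.0)
    return float(np.mean(-alpha[labels] * (1.0 - pt) ** gamma * np.log(pt)))

# With gamma = 0 and uniform alpha this reduces to ordinary cross-entropy;
# raising gamma down-weights easy, well-classified examples.
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
labels = np.array([0, 1])
ce = class_wise_focal_loss(probs, labels, np.ones(2), gamma=0.0)
fl = class_wise_focal_loss(probs, labels, np.ones(2), gamma=2.0)
```

Setting `alpha` inversely proportional to per-class frequency is the usual way to make rare pediatric sound classes contribute more to the gradient than the dominant normal class.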