🤖 AI Summary
This study addresses the clinical challenge of distinguishing pathological subharmonic phonation (for example, low-frequency periodic perturbations induced by vocal fold lesions) from normal phonation. It proposes an end-to-end automatic detection method based on a fully convolutional neural network (FCN), described as the first application of FCNs to subharmonic period analysis. Unlike conventional models that rely on sequential modeling (e.g., RNNs or attention mechanisms), the approach uses global receptive fields to model glottal cycle variability across an entire utterance, so the network implicitly learns the time–frequency features associated with subharmonic-induced periodic disturbances. Trained on synthesized voice signals, the model achieves 98.2% classification accuracy on synthetic data and shows encouraging results on real sustained vowel recordings, along with areas for future improvement. Its core contribution is a purely convolutional design in place of recurrent or attention-based architectures, which suggests potential for clinical use.
📝 Abstract
Many voice disorders induce subharmonic phonation, but voice signal analysis currently lacks a technique to reliably detect the presence of subharmonics. Distinguishing subharmonic phonation from normal phonation is challenging because both are nearly periodic phenomena. Subharmonic phonation adds cyclical variations to the normal glottal cycles; hence, estimating the subharmonic period requires a holistic analysis of the signal. Deep learning is an effective solution to this type of complex problem. This paper describes fully convolutional neural networks that are trained with synthesized subharmonic voice signals to classify the subharmonic period. Evaluation on synthetic signals shows over 98% classification accuracy, and an assessment of sustained vowel recordings demonstrates encouraging outcomes as well as areas for future improvement.
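
To make the idea of "cyclical variations added to the normal glottal cycles" concrete, the following is a minimal sketch, not the paper's synthesizer. It assumes a toy signal model: a sum of harmonics whose amplitude is reduced on one glottal cycle out of every `m`. The function names (`synth_voice`, `dft_mag`) and all parameter values are hypothetical and chosen only for illustration. With `m = 1` the signal repeats every glottal cycle; with `m = 2` it repeats only every second cycle, which puts energy at half the fundamental frequency.

```python
import math


def synth_voice(period=80, m=1, depth=0.3, n_cycles=40, n_harm=5):
    """Toy voiced signal (illustrative assumption, not the paper's model).

    period   : samples per glottal cycle
    m        : subharmonic period in cycles (1 = normal phonation)
    depth    : amplitude reduction applied to one cycle out of every m
    n_cycles : number of glottal cycles to generate
    n_harm   : number of harmonics in each cycle
    """
    samples = []
    for i in range(period * n_cycles):
        cycle, pos = divmod(i, period)  # cycle index, position inside the cycle
        # Attenuate every m-th cycle; with m == 1 all cycles are identical.
        gain = 1.0 - depth if (m > 1 and cycle % m == 0) else 1.0
        phase = 2.0 * math.pi * pos / period
        s = sum(math.sin(h * phase) / h for h in range(1, n_harm + 1))
        samples.append(gain * s)
    return samples


def dft_mag(x, freq):
    """Normalized DFT magnitude of x at `freq`, given in cycles per sample."""
    re = sum(v * math.cos(2.0 * math.pi * freq * i) for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq * i) for i, v in enumerate(x))
    return math.hypot(re, im) / len(x)


if __name__ == "__main__":
    half_f0 = 1.0 / (2 * 80)  # half the fundamental, in cycles per sample
    print("normal      :", dft_mag(synth_voice(m=1), half_f0))
    print("subharmonic :", dft_mag(synth_voice(m=2), half_f0))
```

In this toy model the normal signal has essentially no energy at half the fundamental, while the period-2 signal does. Both waveforms remain nearly periodic from one cycle to the next, which is why a local, cycle-by-cycle view is not enough and the abstract calls for a holistic analysis of the signal.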