🤖 AI Summary
Primary healthcare settings need low-cost, high-accuracy automated heart sound diagnosis, yet labeled data are limited and auscultatory signals are often of low quality. This study therefore tackles cardiac abnormality classification from small-scale, noisy phonocardiogram (PCG) recordings. We propose two novel deep learning architectures: (1) a multi-branch deep convolutional network (MBDCN) inspired by human auditory perception; and (2) an LSTM-CNN hybrid network (LSCN) for joint time-frequency modeling, which uniquely employs power spectral density as a 1D multi-scale convolutional input and enables end-to-end spatiotemporal–spectral representation learning. Evaluated on both public and proprietary PCG datasets, LSCN achieves a classification accuracy of 96.2%, significantly outperforming approaches based on conventional handcrafted features (e.g., MFCCs, wavelet coefficients). The approach addresses the feature extraction and generalization bottlenecks inherent in small-sample scenarios and offers a deployable intelligent auscultation solution suitable for resource-constrained environments.
📝 Abstract
This paper presents a fast and cost-effective method for diagnosing cardiac abnormalities with high accuracy and reliability using low-cost systems in clinics. The primary limitation of automatic diagnosis of cardiac diseases is the scarcity of correctly labeled samples, which can be expensive to prepare. To address this issue, two methods are proposed in this work. The first method is a Multi-Branch Deep Convolutional Neural Network (MBDCN) architecture inspired by human auditory processing, specifically designed to optimize feature extraction by employing convolutional filters of various sizes and taking the power spectrum of the audio signal as input. The second method, called the Long Short-Term Memory-Convolutional Neural (LSCN) model, additionally includes Long Short-Term Memory (LSTM) network blocks in the architecture to improve feature extraction in the time domain. Combining multiple parallel branches of one-dimensional convolutional layers with LSTM blocks helps achieve superior results in audio signal processing tasks. The experimental results demonstrate the superiority of the proposed methods over state-of-the-art techniques. The overall classification accuracy of heart sounds with the LSCN network is more than 96%. Its performance is notably better than that of common feature extraction methods such as Mel Frequency Cepstral Coefficients (MFCC) and the wavelet transform. The proposed method therefore shows promising results in the automatic analysis of heart sounds and has potential applications in the diagnosis and early detection of cardiovascular diseases.
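To make the multi-branch idea more concrete, the following is a minimal sketch in plain Python of parallel one-dimensional convolution branches applied to a power spectrum. It is not the authors' implementation: the function names, the kernel sizes (3, 7, 15), the averaging kernels used in place of learned weights, the global max pooling, and the toy sine-wave input are all assumptions made for illustration. The LSTM blocks of the LSCN model are omitted.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum |X[k]|^2 / N via a naive DFT (fine for short frames)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the window is centered.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(x))]


def branch(x, kernel):
    """One branch: convolution, ReLU, then global max pooling to a single value."""
    activated = [max(0.0, v) for v in conv1d_same(x, kernel)]
    return max(activated)


def multi_branch_features(spectrum, kernel_sizes=(3, 7, 15)):
    """Concatenate the outputs of parallel branches with different filter widths.

    Averaging kernels stand in for learned weights (an assumption for this sketch).
    """
    kernels = [[1.0 / size] * size for size in kernel_sizes]
    return [branch(spectrum, kernel) for kernel in kernels]


# Toy input (hypothetical): a pure tone occupying DFT bin 5 of a 64-sample frame.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = power_spectrum(frame)
features = multi_branch_features(spectrum)
print(len(spectrum), features)
```

In this sketch each branch sees the same spectrum through a different filter width, so narrow filters respond most strongly to sharp spectral peaks while wide filters summarize broader frequency bands. In a trained network the kernels would be learned, and each branch would typically produce many feature channels instead of a single pooled value.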