🤖 AI Summary
This study addresses the challenge of accurately classifying systolic heart murmurs, which vary widely in intensity, pitch, and quality. To improve feature stability and classification performance, the authors propose a novel method that integrates a multi-resolution complex Gabor dictionary with a Vision Transformer. Multiple murmur segments from the same recording are projected onto a shared dictionary, and all segments are constrained to use the same basis functions. This makes the learned representations more robust to murmur variability. Multi-resolution time–frequency features are extracted via complex orthogonal matching pursuit and then fused by a hybrid architecture in which convolutional neural networks tokenize the features for a Vision Transformer. Evaluated on the CirCor DigiScope dataset, the method achieves a classification accuracy of 95.96% across four types of systolic murmurs, which the authors present as evidence of improved robustness and overall performance.
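The following is a minimal NumPy sketch of the shared-dictionary feature extraction summarized above, built on our own assumptions. The Gaussian window, the half-window time hop, the frequency grid, the toy sizes, the function names (`gabor_dictionary`, `shared_complex_omp`, `to_tf_matrices`), and the use of weight magnitudes as features are all hypothetical and do not reflect the authors' settings. To force every segment onto the same basis functions, the sketch uses a simultaneous-OMP selection rule (summed absolute correlation across segments). The source does not state which selection rule the authors use.

```python
import numpy as np


def gabor_dictionary(n, scales, n_freq):
    """Hypothetical multi-resolution complex Gabor dictionary.

    For each window width s in `scales`, atoms are unit-norm Gaussian-windowed
    complex exponentials on a time grid with hop s // 2 and at `n_freq`
    normalized frequencies in [0, 0.5). Returns the (n, n_atoms) matrix and,
    per scale, the (n_freq, n_shift) grid used later for reshaping.
    """
    t = np.arange(n)
    atoms, shapes = [], []
    for s in scales:
        shifts = np.arange(0, n, s // 2)
        for f in np.arange(n_freq) / (2 * n_freq):
            for u in shifts:
                g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.exp(2j * np.pi * f * t)
                atoms.append(g / np.linalg.norm(g))
        shapes.append((n_freq, len(shifts)))
    return np.stack(atoms, axis=1), shapes


def shared_complex_omp(X, D, n_atoms):
    """Complex OMP in which all segments (columns of X) share one support.

    Each step adds the atom whose summed |correlation| with the segment
    residuals is largest, then refits every segment's complex weights by
    least squares on the common support.
    """
    Xc = X.astype(complex)
    R, support = Xc.copy(), []
    for _ in range(n_atoms):
        scores = np.abs(D.conj().T @ R).sum(axis=1)
        if support:
            scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, support], Xc, rcond=None)
        R = Xc - D[:, support] @ coef
    W = np.zeros((D.shape[1], X.shape[1]), dtype=complex)
    W[support] = coef
    return W, support


def to_tf_matrices(w, shapes):
    """Split one segment's weight vector by scale; reshape |w| to (freq x time)."""
    mats, start = [], 0
    for nf, ns in shapes:
        mats.append(np.abs(w[start:start + nf * ns]).reshape(nf, ns))
        start += nf * ns
    return mats


rng = np.random.default_rng(0)
D, shapes = gabor_dictionary(n=128, scales=[16, 64], n_freq=8)
# Three synthetic real-valued "segments" of one recording (not CirCor data).
X = np.real(D[:, [5, 130]] @ rng.normal(size=(2, 3))) + 0.01 * rng.normal(size=(128, 3))
W, support = shared_complex_omp(X, D, n_atoms=6)
mats = to_tf_matrices(W[:, 0], shapes)  # feature matrices of the first segment
print(support, [m.shape for m in mats])
```

Because the support is chosen jointly, every segment's weight vector is nonzero at the same dictionary indices. The per-scale matrices of different segments are therefore directly comparable, and only the weight values differ.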
📝 Abstract
Systolic murmurs are extra heart sounds that occur during the contraction phase of the cardiac cycle and often indicate heart abnormalities caused by turbulent blood flow. Because their intensity, pitch, and quality vary, accurate diagnosis of cardiac disorders requires their precise identification. This study presents an automatic classification system for systolic murmurs that consists of a feature extraction module followed by a classification model. The feature extraction module uses complex orthogonal matching pursuit to project one or more murmur segments onto a redundant dictionary of multiresolution complex Gabor basis functions (GBFs). The resulting projection weights are split and reshaped into time–frequency feature matrices of different resolutions. Processing multiple segments of a single recording with a shared dictionary mitigates murmur variability. The weights are learned separately for each segment, but all segments are constrained to correspond to the same set of basis functions in the dictionary, which yields consistent time–frequency feature matrices. The classification model is based on a vision transformer that accepts multiple input matrices of different resolutions, each of which is passed through a convolutional neural network for patch tokenization. All embedding tokens are then concatenated into a single matrix and fed to an encoder layer consisting of multihead attention, residual connections, and a convolutional network with a kernel size of one. Integrating multiresolution feature extraction with transformer-based classification improves the accuracy and reliability of heart murmur identification. An experimental analysis of four types of systolic murmurs from the CirCor DigiScope dataset demonstrates the effectiveness of the system, which achieves a classification accuracy of 95.96%.
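The classification model described in the abstract can be sketched as a single NumPy forward pass with random, untrained weights. Several details are our own assumptions and do not come from the abstract. Each CNN tokenizer is one non-overlapping strided convolution followed by ReLU. The model width, head count, patch sizes, and feedforward width are arbitrary. Layer normalization and positional embeddings are omitted. Classification uses mean pooling and a linear head over four classes. All names (`patch_tokens`, `encoder_layer`, `classify`, `D_MODEL`, ...) are hypothetical. The kernel-size-one convolution is written as a plain matrix product, because a 1-D convolution with kernel size one applies the same linear map to every token independently.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_HEADS, N_CLASSES = 32, 4, 4  # hypothetical width/heads; 4 murmur classes


def weights(*shape):
    """Random untrained weights (illustration only)."""
    return rng.normal(scale=0.1, size=shape)


def patch_tokens(mat, patch, w):
    """Tokenize one TF matrix with a non-overlapping conv (kernel = stride = patch).

    w has shape (patch * patch, D_MODEL); returns (n_patches, D_MODEL) tokens.
    """
    f, t = mat.shape
    f, t = f - f % patch, t - t % patch  # drop ragged edges
    p = mat[:f, :t].reshape(f // patch, patch, t // patch, patch)
    p = p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    return np.maximum(p @ w, 0.0)  # ReLU after the convolution


def multihead_attention(x, wq, wk, wv, wo):
    n, d = x.shape
    dh = d // N_HEADS
    # Project and split into heads: (heads, tokens, head_dim).
    q, k, v = ((x @ w).reshape(n, N_HEADS, dh).transpose(1, 0, 2) for w in (wq, wk, wv))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d) @ wo


def encoder_layer(x, params):
    x = x + multihead_attention(x, *params["attn"])  # residual connection 1
    w1, w2 = params["ffn"]  # two kernel-size-1 convs == per-token linear maps
    return x + np.maximum(x @ w1, 0.0) @ w2  # residual connection 2


def classify(mats, patches, params):
    tokens = [patch_tokens(m, p, w) for m, p, w in zip(mats, patches, params["embed"])]
    x = np.concatenate(tokens, axis=0)  # one token matrix across all resolutions
    x = encoder_layer(x, params)
    logits = x.mean(axis=0) @ params["head"]
    e = np.exp(logits - logits.max())
    return e / e.sum()  # class probabilities


patches = [4, 2]  # hypothetical patch size per resolution
params = {
    "embed": [weights(p * p, D_MODEL) for p in patches],
    "attn": [weights(D_MODEL, D_MODEL) for _ in range(4)],
    "ffn": (weights(D_MODEL, 2 * D_MODEL), weights(2 * D_MODEL, D_MODEL)),
    "head": weights(D_MODEL, N_CLASSES),
}
mats = [rng.random((8, 16)), rng.random((8, 4))]  # stand-ins for two TF matrices
probs = classify(mats, patches, params)
print(probs)
```

Matrices of different resolutions can yield different numbers of tokens. Each tokenizer, however, maps its patches to the same width `D_MODEL`, so all tokens can be concatenated along the token axis and attended to jointly.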