Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

📅 2026-06-09

🤖 AI Summary

This study addresses the inconsistent dimensions of time-frequency representations of lung sounds, which arise because respiratory cycles vary in length. The authors propose an adaptive-length windowing strategy that fixes the temporal dimension of both MFCC matrices and log-Mel spectrograms, and they tune the spectral and temporal resolution of each representation. On top of this unified representation, CNNs extract sub-phase features, and three fusion approaches are compared: direct concatenation, a GRU, and a GRU with attention. The MFCC-based models achieve the best F1-scores, 0.877 at the respiratory-cycle level and 0.855 at the subject level, outperforming models based on log-Mel spectrograms and the VAR model. More complex fusion strategies did not improve performance, and data augmentation, including mixup, reduced it overall, which underscores the importance of authentic data.
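As a rough illustration of the idea of adaptive-length windowing, the sketch below splits a respiratory cycle of arbitrary length into a fixed number of overlapping frames by letting the window length scale with the cycle length. This is only one plausible reading of the technique: the function name `adaptive_frames`, the roughly 50% overlap, and the Hann taper are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def adaptive_frames(signal, n_frames=64):
    """Split a 1-D signal into exactly `n_frames` overlapping frames.

    Assumption (not from the paper): the window length is chosen so that
    neighbouring frames overlap by roughly 50% and together span the cycle.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n < n_frames + 1:
        raise ValueError("signal too short for the requested number of frames")
    # Window length grows with the cycle length, so the frame count stays fixed.
    win = int(np.ceil(2 * n / (n_frames + 1)))
    # Evenly spaced frame start positions covering the whole cycle.
    starts = np.round(np.linspace(0, n - win, n_frames)).astype(int)
    idx = starts[:, None] + np.arange(win)[None, :]
    # Taper each frame before any spectral analysis.
    return x[idx] * np.hanning(win)

short_cycle = adaptive_frames(np.random.randn(3000))
long_cycle = adaptive_frames(np.random.randn(7000))
print(short_cycle.shape, long_cycle.shape)
```

Both cycles yield 64 frames even though their window lengths differ. A downstream MFCC or log-Mel computation would then map every frame to a fixed number of coefficients, giving a 2D input of constant size without trimming or zero-padding.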

📝 Abstract

This study compares the performance of the VAR model with that of mel-frequency cepstral coefficient (MFCC) matrices and log-mel spectrograms, using deep learning, in differentiating asthma and COPD. In pulmonary sound classification, spectrogram-based representations have inconsistent temporal dimensions because respiratory cycle durations vary. In addition to conventional trimming/zero-padding, adaptive-length windowing is presented to fix the temporal dimension of these representations. The spectral and temporal dimensions of the representations were optimized by testing a range of parameters. Different convolutional neural network (CNN) architectures were employed to extract features from the two-dimensional representations obtained over the sub-phases. The extracted sub-phase features were then fused using several strategies: direct concatenation, a gated recurrent unit (GRU) network, and a GRU with an attention mechanism. Model performance was assessed through respiratory cycle-based evaluation and through subject-based evaluation, in which each subject contributes multiple respiratory cycles. Several data augmentation techniques were also studied to cope with the limited data size. The best cycle-based F1-score (0.877) was obtained using MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation, followed by direct feature concatenation. The best subject-based F1-score (0.855) was obtained using MFCC matrices with thirteen coefficients and 256-point time resolution per full-cycle representation. Both results were obtained with adaptive-length windowing. Augmentation degraded model performance overall, with mixup performing best among the augmentation methods tested. MFCC outperformed the log-mel spectrogram and the VAR model in differentiating asthma and COPD. Sophisticated fusion strategies did not improve the diagnosis. Augmentation did not help, underscoring the importance of authentic data in pulmonary sound studies.
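The difference between cycle-based and subject-based evaluation can be sketched as follows. The abstract does not state how cycle predictions are combined per subject, so the majority vote used here is an assumption, as are the helper names `subject_level_labels` and `f1_binary`, the toy labels, and the choice of a plain binary F1-score.

```python
import numpy as np
from collections import defaultdict

def subject_level_labels(cycle_preds, subject_ids):
    """Aggregate per-cycle binary predictions into one label per subject.

    Assumption: simple majority vote; ties go to class 1 (arbitrary choice).
    """
    votes = defaultdict(list)
    for pred, sid in zip(cycle_preds, subject_ids):
        votes[sid].append(int(pred))
    return {sid: int(np.mean(v) >= 0.5) for sid, v in votes.items()}

def f1_binary(y_true, y_pred):
    """F1-score for the positive class (label 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy data: eight respiratory cycles from three hypothetical subjects.
cycle_preds = [1, 1, 0, 0, 0, 1, 1, 0]
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3"]
true_subject = {"s1": 1, "s2": 0, "s3": 0}

pred_subject = subject_level_labels(cycle_preds, subject_ids)
order = sorted(true_subject)
subject_f1 = f1_binary([true_subject[s] for s in order],
                       [pred_subject[s] for s in order])
print(pred_subject, round(subject_f1, 3))
```

Because one subject contributes several cycles, the two scores can diverge: a few misclassified cycles may leave the subject-level label unchanged, while a tie or a narrow vote can flip it.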

Problem

Research questions and friction points this paper is trying to address.

asthma-COPD differential diagnosis

pulmonary sound classification

2D input representation

temporal dimension inconsistency

sub-phase fusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive-length windowing

MFCC optimization

sub-phase feature fusion
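The simplest of the fusion strategies named in the abstract, direct concatenation, can be sketched as below. The number of sub-phases (4) and the feature size (3) are hypothetical values chosen for the example, and `fuse_by_concatenation` is an illustrative helper rather than the authors' code.

```python
import numpy as np

def fuse_by_concatenation(subphase_features):
    """Concatenate per-sub-phase feature vectors into one vector per cycle.

    Expects an array of shape (batch, n_subphases, feature_dim) and returns
    an array of shape (batch, n_subphases * feature_dim), keeping the
    sub-phases in their original order.
    """
    feats = np.asarray(subphase_features)
    if feats.ndim != 3:
        raise ValueError("expected shape (batch, n_subphases, feature_dim)")
    batch, n_sub, dim = feats.shape
    return feats.reshape(batch, n_sub * dim)

# Two cycles, four hypothetical sub-phases, three CNN features per sub-phase.
feats = np.arange(2 * 4 * 3).reshape(2, 4, 3)
fused = fuse_by_concatenation(feats)
print(fused.shape)
```

A GRU-based fusion would instead treat the same `(batch, n_subphases, feature_dim)` array as a short sequence and pass its final or attention-weighted state to the classifier. According to the abstract, these more elaborate options did not improve the diagnosis.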