C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations in respiratory sound classification caused by small-scale real-world datasets, high noise levels, and class imbalance. Existing augmentation techniques often distort pathological characteristics, while generative models struggle to ensure high fidelity and class controllability at the same time. To overcome these challenges, the authors propose the C2GA framework, which employs a conditional vector-quantized variational autoencoder (VQ-VAE) to construct a discrete latent space that disentangles local acoustic features from global class prototypes. A Transformer-based autoregressive prior then generates class-consistent token sequences, which are combined with the corresponding prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation. According to the authors, this is the first approach to achieve semantically controllable generation of high-quality respiratory sounds. It is reported to preserve pathological features under data scarcity, enhance class consistency, and improve the robustness and generalization of downstream classifiers, which the authors present as promising for realistic clinical scenarios.
📝 Abstract
Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics. Meanwhile, existing Variational Autoencoder (VAE)- or Generative Adversarial Network (GAN)-based generative approaches often suffer from limited sample fidelity and insufficient controllability over class semantics, particularly under conditions of scarce supervision.
Methods: To overcome these limitations, we propose C2GA, a class-controllable generative augmentation framework. C2GA first constructs a semantically rich discrete latent space using a conditional Vector-Quantized Variational Autoencoder (VQ-VAE), in which local acoustic tokens are explicitly decoupled from global class prototypes. Subsequently, a Transformer-based autoregressive prior is trained to generate label-consistent token sequences. These generated tokens are then fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation.
Conclusion: These results indicate that C2GA provides an effective and semantically reliable augmentation strategy for respiratory sound analysis. By enabling controllable and high-quality data generation, the proposed framework offers a promising solution for improving the robustness and generalization of respiratory sound classification in realistic clinical scenarios.
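The three stages named in the Methods text (vector quantization, label-conditioned autoregressive token generation, and fusion with a class prototype) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sizes, the random codebook and prototypes, the additive fusion rule, and the `toy_prior` stand-in for the Transformer prior are all assumptions made for illustration, and the encoder and Mel-spectrogram decoder are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes; the abstract does not state any of these.
NUM_CLASSES = 4
CODEBOOK_SIZE = 16
LATENT_DIM = 8
SEQ_LEN = 10

# Hypothetical "learned" parameters, randomly initialised here.
codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))
class_prototypes = rng.normal(size=(NUM_CLASSES, LATENT_DIM))


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent and every code.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def toy_prior(label, prefix):
    """Stand-in for the Transformer prior: fixed, class-dependent logits.

    A real prior would also condition on `prefix`; this toy ignores it.
    """
    logits = np.zeros(CODEBOOK_SIZE)
    logits[label::NUM_CLASSES] = 2.0  # favour a class-specific subset of codes
    return logits


def sample_tokens(label, seq_len, next_token_logits, rng):
    """Sample a token sequence one step at a time, conditioned on a label."""
    tokens = []
    for _ in range(seq_len):
        logits = next_token_logits(label, tokens)
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return np.array(tokens)


def fuse(tokens, label, codebook, class_prototypes):
    """Look up token embeddings and add the global class prototype.

    Addition is an assumed fusion rule; the abstract only says "fused".
    """
    return codebook[tokens] + class_prototypes[label]


# Encoding side: quantize (random) encoder outputs into discrete tokens.
latents = rng.normal(size=(SEQ_LEN, LATENT_DIM))
real_tokens = quantize(latents, codebook)

# Generation side: sample tokens for class 2 and build the decoder input.
generated = sample_tokens(2, SEQ_LEN, toy_prior, rng)
decoder_input = fuse(generated, 2, codebook, class_prototypes)
print(real_tokens.shape, generated.shape, decoder_input.shape)
```

Keeping the class information in a separate prototype vector means the token sequence only has to carry local acoustic detail, which is one way to read the decoupling described in the abstract.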
Problem

Research questions and friction points this paper is trying to address.

respiratory sound classification
class imbalance
data augmentation
generative models
limited supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

class-controllable generation
VQ-VAE
Transformer-based autoregressive prior
semantic disentanglement
respiratory sound augmentation
Ziqi Ma
School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
Mengyu Han
School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
Anteng Cai
School of AI and Advanced Computing (AIAC), XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University, Taicang, Suzhou, 215400, China
Zhanchong Liu
School of AI and Advanced Computing (AIAC), XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University, Taicang, Suzhou, 215400, China
Bowen Feng
School of AI and Advanced Computing (AIAC), XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University, Taicang, Suzhou, 215400, China
Hang Yu
Shanghai University
Graph Learning, Streaming Learning, NLP, Computer Vision, Multi-agent
Sheng Hu
Postdoc Research Associate, Chemical Engineering, University of Tennessee, Knoxville
Catalysis, Laser ablation in liquid, Energy conversion and storage, Organic photovoltaics, Polymer synthesis and processing