C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three obstacles to respiratory sound classification: small real-world datasets, high noise levels, and class imbalance. Conventional augmentation techniques can distort pathological characteristics, while existing generative models struggle to deliver high fidelity and class controllability at the same time. The authors propose C2GA, a framework that uses a conditional vector-quantized variational autoencoder (VQ-VAE) to build a discrete latent space in which local acoustic tokens are decoupled from global class prototypes. A Transformer-based autoregressive prior then generates label-consistent token sequences, which are fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation. According to the authors, this enables controllable, high-quality generation of respiratory sounds and offers a promising way to improve the robustness and generalization of downstream classifiers in realistic clinical scenarios.
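
The quantization and prototype-fusion steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook values, the class names (`crackle`, `wheeze`), the two-dimensional latents, and the choice of additive fusion are all assumptions made for illustration, since the summary does not specify them.

```python
def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    Nearest is measured by squared Euclidean distance, the usual VQ-VAE rule.
    """
    tokens = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def fuse_with_prototype(tokens, codebook, prototype):
    """Look up each token's embedding and add the global class prototype.

    Additive fusion is an assumption; the paper may combine them differently.
    """
    return [[c + p for c, p in zip(codebook[t], prototype)] for t in tokens]


# Toy values, chosen only so the result is easy to check by hand.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
prototypes = {"crackle": [0.5, 0.0], "wheeze": [0.0, 0.5]}  # hypothetical classes
latents = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]  # stand-in encoder outputs

tokens = quantize(latents, codebook)
fused = fuse_with_prototype(tokens, codebook, prototypes["wheeze"])
print(tokens)  # [0, 1, 2]
print(fused)   # [[0.0, 0.5], [1.0, 1.5], [-1.0, 1.5]]
```

In this sketch the tokens carry only local information, and the class enters solely through the prototype, which is the sense in which the two are decoupled. In a full model the fused sequence would be passed to a decoder that outputs a Mel-spectrogram.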

📝 Abstract

Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics. Meanwhile, existing Variational Autoencoder (VAE)- or Generative Adversarial Network (GAN)-based generative approaches often suffer from limited sample fidelity and insufficient controllability over class semantics, particularly under conditions of scarce supervision.

Methods: To overcome these limitations, we propose C2GA, a class-controllable generative augmentation framework. C2GA first constructs a semantically rich discrete latent space using a conditional Vector-Quantized Variational Autoencoder (VQ-VAE), in which local acoustic tokens are explicitly decoupled from global class prototypes. Subsequently, a Transformer-based autoregressive prior is trained to generate label-consistent token sequences. These generated tokens are then fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation.

Conclusion: These results indicate that C2GA provides an effective and semantically reliable augmentation strategy for respiratory sound analysis. By enabling controllable and high-quality data generation, the proposed framework offers a promising solution for improving the robustness and generalization of respiratory sound classification in realistic clinical scenarios.
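
To make the label-conditioned autoregressive step in the Methods concrete, the sketch below shows the generic sampling loop such a prior implies: each new token is drawn from a distribution conditioned on the class label and the tokens generated so far. The function `toy_prior_logits` is a hypothetical stand-in for a trained Transformer, and the vocabulary size, sequence length, and label encoding are assumptions rather than values from the paper.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_prior_logits(label, prefix, vocab_size):
    """Hypothetical stand-in for a trained Transformer prior.

    Returns next-token logits that depend on the class label and on the
    previous token, so different labels favor different sequences.
    """
    prev = prefix[-1] if prefix else label
    favored = (prev + label + 1) % vocab_size
    return [2.0 if tok == favored else 0.0 for tok in range(vocab_size)]


def sample_tokens(label, length, vocab_size, rng, prior=toy_prior_logits):
    """Draw a token sequence one step at a time, conditioned on the label."""
    tokens = []
    for _ in range(length):
        probs = softmax(prior(label, tokens, vocab_size))
        next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens


rng = random.Random(0)  # fixed seed so the sketch is reproducible
seq = sample_tokens(label=1, length=8, vocab_size=16, rng=rng)
print(seq)
```

Swapping `toy_prior_logits` for a real model's forward pass leaves the loop unchanged. The sampled tokens index the VQ-VAE codebook and are then fused with the class prototype before decoding.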

Problem

Research questions and friction points this paper is trying to address.

respiratory sound classification

class imbalance

data augmentation

generative models

limited supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

class-controllable generation

VQ-VAE

Transformer-based autoregressive prior