Comparative study on noise-augmented training and its effect on adversarial robustness in ASR systems

📅 2024-09-03
🏛️ Computer Speech and Language
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether noise-augmented training can enhance the adversarial robustness of automatic speech recognition (ASR) systems while preserving speech intelligibility. We conduct systematic experiments comparing multiple noise sources (white Gaussian noise, babble, music), injection strategies (time-domain vs. frequency-domain), and noise intensities under adversarial attacks guided by projected gradient descent (PGD) and CTC loss, evaluated on Conformer and fine-tuned Whisper models. We propose the first quantitative model linking noise-augmentation strategies to adversarial robustness and introduce a "robustness–intelligibility" trade-off evaluation framework that moves beyond conventional accuracy-only metrics. On LibriSpeech and VoxCeleb, our approach improves adversarial accuracy by an average of 12.7% with less than 1.5% word error rate (WER) degradation. Results show that frequency-domain noise injection significantly outperforms time-domain injection, revealing key mechanisms and practical limits of robustness enhancement.
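The summary contrasts time-domain and frequency-domain noise injection at controlled intensities. A minimal sketch of what that distinction can mean in practice is given below; the function names, the SNR-based scaling, and the magnitude-only spectral mix are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def noise_scale(speech, noise, snr_db):
    # Scale factor so the added noise hits a target signal-to-noise ratio (dB).
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    return np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

def inject_time(speech, noise, snr_db):
    # Time-domain injection: add the scaled noise waveform directly.
    return speech + noise_scale(speech, noise, snr_db) * noise

def inject_freq(speech, noise, snr_db, frame=512):
    # Frequency-domain injection: add noise *magnitudes* per frame while
    # keeping the speech phase, then invert the spectrum.
    scale = noise_scale(speech, noise, snr_db)
    out = np.zeros_like(speech)
    for start in range(0, len(speech) - frame + 1, frame):
        s = np.fft.rfft(speech[start:start + frame])
        n = np.fft.rfft(noise[start:start + frame])
        mag = np.abs(s) + scale * np.abs(n)
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(s)), n=frame)
    return out  # samples past the last full frame are left as zeros
```

Unlike the time-domain mix, the frequency-domain variant discards the noise phase, so the perturbation the model sees during training has a different structure even at the same nominal SNR.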

Problem

Research questions and friction points this paper is trying to address.

Investigating noise-augmented training effects on ASR adversarial robustness
Comparing robustness across four ASR architectures under different augmentations
Evaluating model resilience against white-box and black-box adversarial attacks
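The white-box attack the paper evaluates against (PGD guided by CTC loss, per the summary) can be sketched as follows. The toy shapes, step sizes, and the assumption that `model(x)` returns CTC log-probabilities of shape `(time, batch, classes)` are all illustrative, not the paper's setup:

```python
import torch

def pgd_attack(model, x, targets, eps=0.01, alpha=0.002, steps=10):
    """Untargeted PGD on the waveform, constrained to an L-inf ball of radius eps.

    Ascends the CTC loss so the model's transcription of x degrades while the
    perturbation stays small.
    """
    ctc = torch.nn.CTCLoss(blank=0)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        log_probs = model(x + delta)          # (time, batch, classes)
        T, B = log_probs.shape[0], log_probs.shape[1]
        input_lens = torch.full((B,), T, dtype=torch.long)
        target_lens = torch.full((B,), targets.shape[1], dtype=torch.long)
        loss = ctc(log_probs, targets, input_lens, target_lens)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient *ascent* on the loss
            delta.clamp_(-eps, eps)             # project back into the L-inf ball
        delta.grad.zero_()
    return (x + delta).detach()
```

The `eps` budget controls how perceptible the adversarial perturbation is; robustness is then measured as accuracy on these perturbed inputs.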
Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise-augmented training improves ASR adversarial robustness
Comparative analysis of four ASR architectures under augmentation
Background noise and speed variations enhance attack resistance
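The speed-variation augmentation mentioned above is commonly implemented as simple resampling of the waveform. A minimal sketch follows; linear interpolation is an illustrative choice here (production pipelines typically use a band-limited resampler), and the function name is hypothetical:

```python
import numpy as np

def speed_perturb(wave, factor):
    # Resample the waveform by `factor`: factor > 1 plays faster (shorter
    # output), factor < 1 plays slower. Pitch shifts along with speed.
    n_out = max(1, int(round(len(wave) / factor)))
    src_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src_idx, np.arange(len(wave)), wave)
```

Training on copies at, e.g., factors 0.9, 1.0, and 1.1 exposes the model to tempo variation, which the Innovation list credits with improving attack resistance alongside background noise.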