🤖 AI Summary
This work exposes the severe vulnerability of speaker verification and anti-spoofing systems to AI-generated speech in black-box settings, demonstrated through the Spectral Masking and Interpolation Attack (SMIA), which strategically manipulates inaudible frequency bands. SMIA is a psychoacoustics-inspired black-box adversarial attack that crafts imperceptible perturbations in high-frequency regions beyond human auditory perception via spectral masking and frequency-domain interpolation, enabling efficient gradient-free optimization. The approach achieves strong imperceptibility, cross-model transferability, and robustness under realistic deployment conditions. Extensive experiments show attack success rates of at least 82% against joint speaker verification and anti-spoofing systems, at least 97.5% against speaker verification alone, and 100% evasion of anti-spoofing modules across multiple state-of-the-art models and realistic scenario simulations, revealing fundamental limitations of current defenses in dynamic adversarial environments.
📝 Abstract
Voice Authentication Systems (VAS) use unique vocal characteristics for verification and are increasingly integrated into high-security sectors such as banking and healthcare. Despite advances driven by deep learning, these systems face severe vulnerabilities from sophisticated threats like deepfakes and adversarial attacks. The emergence of realistic voice cloning complicates detection, as systems struggle to distinguish authentic from synthetic audio. While anti-spoofing countermeasures (CMs) exist to mitigate these risks, many rely on static detection models that can be bypassed by novel adversarial methods, leaving a critical security gap. To demonstrate this vulnerability, we propose the Spectral Masking and Interpolation Attack (SMIA), a novel method that strategically manipulates inaudible frequency regions of AI-generated audio. By altering the voice in zones imperceptible to the human ear, SMIA creates adversarial samples that sound authentic while deceiving CMs. We conducted a comprehensive evaluation of our attack against state-of-the-art (SOTA) models across multiple tasks under simulated real-world conditions. SMIA achieved a strong attack success rate (ASR) of at least 82% against combined VAS/CM systems, at least 97.5% against standalone speaker verification systems, and 100% against countermeasures. These findings conclusively demonstrate that current security postures are insufficient against adaptive adversarial attacks. This work highlights the urgent need for a paradigm shift toward next-generation defenses that employ dynamic, context-aware frameworks capable of evolving with the threat landscape.
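The masking-and-interpolation idea can be sketched as a gradient-free search that perturbs only spectral bins above an audibility cutoff. The function below is an illustrative approximation, not the paper's implementation: `score_fn` (the black-box model score to maximize), `cutoff_hz`, and the anchor-based interpolation scheme are all assumptions made for the sketch.

```python
import numpy as np

def smia_style_perturb(audio, sr, score_fn, cutoff_hz=16000.0,
                       n_iters=50, eps=1e-3, n_anchors=8, seed=0):
    """Gradient-free sketch: perturb only spectral bins above cutoff_hz.

    A candidate perturbation is defined at a few anchor bins and linearly
    interpolated across the masked (assumed-inaudible) band; it is kept
    only if the black-box score improves. All parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    n = len(audio)
    spec = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    band = np.flatnonzero(freqs >= cutoff_hz)   # spectral mask: high band only

    best_spec = spec.copy()
    best_score = score_fn(np.fft.irfft(best_spec, n=n))
    for _ in range(n_iters):
        # noise at sparse anchors, interpolated over the whole masked band
        anchor_pos = np.linspace(0, len(band) - 1, n_anchors)
        anchor_val = rng.normal(0.0, eps, n_anchors)
        noise = np.interp(np.arange(len(band)), anchor_pos, anchor_val)
        cand = best_spec.copy()
        cand[band] += noise                     # low frequencies untouched
        x = np.fft.irfft(cand, n=n)
        s = score_fn(x)
        if s > best_score:                      # greedy black-box acceptance
            best_spec, best_score = cand, s
    return np.fft.irfft(best_spec, n=n)
```

Because the mask leaves every bin below `cutoff_hz` unmodified, the audible content of the signal is preserved by construction, while the search needs only score queries to the target model, consistent with the black-box, gradient-free setting described above.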