Physics-Guided Deepfake Detection for Voice Authentication Systems

📅 2025-12-04
🤖 AI Summary
Edge-based speaker authentication systems face dual threats from deepfake audio attacks targeting the data plane and model poisoning attacks compromising the control plane in federated learning. Method: We propose a physics-guided and uncertainty-aware joint defense framework that integrates vocal tract dynamics modeling with self-supervised multimodal representation learning, implemented as a Bayesian deep learning detection architecture for simultaneous mitigation of data-plane deepfakes and control-plane model poisoning. Contribution/Results: This work pioneers the tight integration of interpretable acoustic physical priors with Bayesian uncertainty estimation, significantly enhancing robustness and explainability against novel adversarial attacks. Experiments demonstrate high detection accuracy (>98.2%) and strong poisoning resistance—reducing poisoning success rate to <3.1%—under complex adversarial conditions. To our knowledge, this is the first multimodal solution for edge speaker authentication that jointly ensures physical interpretability, statistical reliability, and distributed security.
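The summary describes fusing interpretable physics-based acoustic features with self-supervised representations and scoring them with a Bayesian ensemble. The paper does not publish its implementation here, so the following is a hypothetical minimal sketch of that pipeline shape: `physics_features`, `ssl_embedding`, and the random linear heads are illustrative stand-ins, not the authors' actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def physics_features(audio):
    # Hypothetical stand-in for vocal-tract-dynamics features
    # (e.g., formant trajectories); here, simple spectral statistics.
    spectrum = np.abs(np.fft.rfft(audio))
    return np.array([spectrum.mean(), spectrum.std(), spectrum.max()])

def ssl_embedding(audio, dim=8):
    # Placeholder for a self-supervised representation; a fixed
    # random projection keeps the sketch self-contained.
    proj = np.random.default_rng(42).normal(size=(dim, audio.size))
    return np.tanh(proj @ audio)

def bayesian_ensemble_predict(audio, n_members=5):
    # Fuse physics features with the learned embedding, then average
    # the sigmoid outputs of several randomly initialised linear heads;
    # the spread across members serves as an uncertainty estimate.
    x = np.concatenate([physics_features(audio), ssl_embedding(audio)])
    probs = []
    for seed in range(n_members):
        w = np.random.default_rng(seed).normal(size=x.size)
        probs.append(1.0 / (1.0 + np.exp(-(w @ x))))
    probs = np.array(probs)
    return probs.mean(), probs.std()  # (deepfake score, uncertainty)

audio = rng.normal(size=256)
score, uncertainty = bayesian_ensemble_predict(audio)
```

In a real system the ensemble members would be trained networks (or Monte Carlo samples from a Bayesian posterior), but the structure — fused features in, score plus uncertainty out — is what the summary describes.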

📝 Abstract
Voice authentication systems deployed at the network edge face dual threats: a) sophisticated deepfake synthesis attacks and b) control-plane poisoning in distributed federated learning protocols. We present a framework coupling physics-guided deepfake detection with uncertainty-aware edge learning. The framework fuses interpretable physics features modeling vocal tract dynamics with representations from a self-supervised learning module. These representations are processed by a multi-modal ensemble architecture, followed by a Bayesian ensemble that provides uncertainty estimates. Incorporating physics-based characteristics and per-sample uncertainty estimates allows the proposed framework to remain robust to both advanced deepfake attacks and sophisticated control-plane poisoning, addressing the complete threat model for networked voice authentication.
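The abstract's Bayesian uncertainty estimates only help authentication if they feed a decision rule. One plausible (and entirely hypothetical, not from the paper) rule is to reject high-scoring samples and defer uncertain ones to a fallback check; the thresholds below are illustrative assumptions.

```python
def authenticate(score, uncertainty, reject_thresh=0.5, unc_thresh=0.15):
    # Defer when the model is too uncertain to trust its own decision,
    # otherwise reject likely deepfakes and accept the rest.
    if uncertainty > unc_thresh:
        return "defer"   # e.g., request a second factor
    return "reject" if score > reject_thresh else "accept"

# Illustrative calls with (deepfake score, uncertainty) pairs:
authenticate(0.2, 0.05)  # confident, low score -> "accept"
authenticate(0.9, 0.05)  # confident, high score -> "reject"
authenticate(0.4, 0.30)  # uncertain -> "defer"
```

The "defer" branch is what distinguishes an uncertainty-aware detector from a plain classifier: out-of-distribution attacks tend to raise predictive uncertainty even when the raw score looks benign.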
Problem

Research questions and friction points this paper is trying to address.

Detecting deepfake synthesis attacks in voice authentication
Preventing control-plane poisoning in federated learning protocols
Enhancing robustness with physics-guided and uncertainty-aware methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-guided deepfake detection with vocal tract modeling
Multi-modal ensemble architecture for feature fusion
Bayesian ensemble for uncertainty-aware edge learning
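The control-plane side of the threat model — poisoning of federated model updates — is typically countered with robust aggregation. The paper does not detail its aggregation rule here, so as a hypothetical illustration, the coordinate-wise median below shows how a minority of poisoned client updates can be bounded; it is a standard robust-aggregation baseline, not necessarily the authors' method.

```python
import numpy as np

def robust_aggregate(client_updates):
    # Coordinate-wise median: a classic robust aggregation rule whose
    # output is unaffected by a minority of arbitrarily large
    # (poisoned) client updates, unlike the plain mean.
    return np.median(np.stack(client_updates), axis=0)

honest = [np.full(4, v) for v in (0.9, 1.0, 1.1)]
poisoned = [np.full(4, 50.0)]  # adversarial update with a huge norm
agg = robust_aggregate(honest + poisoned)
# The aggregate stays near the honest values (~1.05 per coordinate),
# whereas a plain mean would be dragged to ~13.25 by the attacker.
```

An uncertainty-aware variant could additionally down-weight clients whose updates the server's Bayesian model flags as anomalous, which is the kind of joint data-plane/control-plane defense the summary claims.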