🤖 AI Summary
This work proposes a low-cost, dual-modal biometric authentication system that uses only a standard camera and microphone. It addresses the high cost and limited practicality of conventional identity verification systems that rely on specialized hardware. The approach is a two-stage cascade. The first stage screens users with face recognition, using MTCNN for face detection and a pruned VGG-16 network for classification. The second stage runs a CNN-based speaker verification model, but only against the identity matched in the first stage, which the authors argue reduces computation and improves robustness. Experimental results show a face recognition accuracy of 95.1% on a five-subject dataset, and a voice verification accuracy of 98.9% with an equal error rate (EER) of 3.456% on LibriSpeech test-clean. These results suggest the system can improve both security and usability in settings without dedicated authentication hardware.
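The cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the face stage outputs per-identity probabilities over the enrolled group and that the voice stage outputs a fixed-length speaker embedding compared by cosine similarity. Both thresholds are placeholder values, and the names `authenticate`, `AuthResult`, `face_threshold`, and `voice_threshold` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class AuthResult:
    accepted: bool
    identity: Optional[str]  # matched identity, or None if face stage rejected
    reason: str


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate(
    face_probs: Dict[str, float],
    voice_embedding: Sequence[float],
    enrolled_voices: Dict[str, Sequence[float]],
    face_threshold: float = 0.8,   # hypothetical placeholder value
    voice_threshold: float = 0.7,  # hypothetical placeholder value
) -> AuthResult:
    # Stage 1: closed-set face identification over the enrolled group.
    identity = max(face_probs, key=face_probs.get)
    if face_probs[identity] < face_threshold:
        return AuthResult(False, None, "face not recognized")

    # Stage 2: 1:1 speaker verification against the matched identity only.
    score = cosine_similarity(voice_embedding, enrolled_voices[identity])
    if score < voice_threshold:
        return AuthResult(False, identity, "voice mismatch")
    return AuthResult(True, identity, "accepted")


if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    print(authenticate({"alice": 0.93, "bob": 0.07}, [0.9, 0.1], enrolled))
```

Because only `enrolled_voices[identity]` is scored, the voice stage performs a single 1:1 comparison regardless of how many users are enrolled, which is the computational saving the cascade is meant to provide.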
📝 Abstract
We present a cost-effective two-step authentication system that combines face identification and speaker verification using only the camera and microphone found on common devices. The pipeline first performs face recognition to identify a candidate user from a small enrolled group, then performs voice verification only against the matched identity to reduce computation and improve robustness. For face recognition, a pruned VGG-16-based classifier is trained on an augmented dataset of 924 images from five subjects, with faces localized by MTCNN; it achieves 95.1% accuracy. For voice recognition, a CNN speaker-verification model trained on LibriSpeech (train-other-360) attains 98.9% accuracy and a 3.456% EER on the test-clean split. Source code and trained models are available at https://github.com/NCUE-EE-AIAL/Two-step-Authentication-Multi-biometric-System.
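The abstract reports an EER but does not state how it was computed. A common estimate sweeps a decision threshold over the verification scores and takes the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The sketch below is a minimal illustration of that approach, assuming higher scores mean "same speaker" and considering only thresholds equal to observed scores. It is not necessarily the authors' procedure, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def far_frr(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """FAR: impostor trials accepted; FRR: genuine trials rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Return (EER estimate, threshold) where |FAR - FRR| is smallest.

    Only observed scores are tried as thresholds, so the result is an
    approximation of the true crossing point of the FAR and FRR curves.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap = float("inf")
    best = (1.0, thresholds[0])
    for t in thresholds:
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best = ((far + frr) / 2, t)  # average the two rates at the crossing
    return best


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]    # same-speaker trials (toy data)
    impostor_scores = [0.1, 0.2, 0.3, 0.75]  # different-speaker trials (toy data)
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With the toy scores above, one genuine trial falls below and one impostor trial falls above the threshold 0.7, so FAR and FRR are both 0.25 there and the estimated EER is 25%. Finer threshold grids or interpolation between neighboring points give smoother estimates on large trial lists.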