🤖 AI Summary
This work proposes CASO-PAD, a lightweight face anti-spoofing method that uses a single RGB frame to defend against presentation attacks such as print, replay, and mask spoofing. Built on MobileNetV3, it integrates a content-adaptive spatial operator (involution) that generates position-specific, channel-shared kernels conditioned on the input, improving sensitivity to localized spoof cues at minimal computational cost (3.6M parameters; 0.64 GFLOPs at 256×256 input). The model is trained end-to-end with binary cross-entropy loss, and ablations indicate that placing the operator near the network head with moderate group sharing gives the best balance of accuracy and efficiency. On Replay-Attack, Replay-Mobile, ROSE-Youtu, and OULU-NPU it reaches test accuracies of 100%, 100%, 98.9%, and 99.7%, AUCs of 1.00, 1.00, 0.9995, and 0.9999, and HTERs of 0.00%, 0.00%, 0.82%, and 0.44%, respectively. On the large-scale SiW-Mv2 Protocol-1, it attains 95.45% accuracy, 3.11% HTER, and 3.13% EER.
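For readers unfamiliar with the reported metrics, the following is a minimal sketch of how HTER and EER are commonly computed from per-sample liveness scores. It is not the authors' evaluation code. The function names, the convention that label 1 means bona fide and 0 means attack, and the convention that higher scores mean "more likely live" are all assumptions made for illustration.

```python
def far_frr(scores, labels, threshold):
    """Return (FAR, FRR) at a threshold.

    Assumed conventions (illustrative, not from the paper):
    label 1 = bona fide, label 0 = attack, higher score = more likely live.
    """
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    lives = [s for s, y in zip(scores, labels) if y == 1]
    # FAR: fraction of attacks accepted as live.
    far = sum(s >= threshold for s in attacks) / len(attacks)
    # FRR: fraction of bona fide samples rejected.
    frr = sum(s < threshold for s in lives) / len(lives)
    return far, frr


def hter(scores, labels, threshold):
    """Half Total Error Rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(scores, labels, threshold)
    return 0.5 * (far + frr)


def eer(scores, labels):
    """Approximate Equal Error Rate by scanning the observed scores as thresholds.

    Returns (error_rate, threshold) at the point where |FAR - FRR| is smallest.
    """
    def gap(t):
        far, frr = far_frr(scores, labels, t)
        return abs(far - frr)

    best = min(sorted(set(scores)), key=gap)
    far, frr = far_frr(scores, labels, best)
    return 0.5 * (far + frr), best


# Hypothetical toy scores, not data from the paper.
demo_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
demo_labels = [1, 1, 1, 0, 0, 0]
print(hter(demo_scores, demo_labels, threshold=0.5))
print(eer(demo_scores, demo_labels))
```

In common FacePAD practice the threshold is chosen on a development set (often at its EER point) and HTER is then reported on the test set at that fixed threshold; whether this paper follows exactly that procedure is not stated in the summary.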
📝 Abstract
Face presentation attack detection (FacePAD) is critical for securing facial authentication against print, replay, and mask-based spoofing. This paper proposes CASO-PAD, an RGB-only, single-frame model that enhances MobileNetV3 with content-adaptive spatial operators (involution) to better capture localized spoof cues. Unlike spatially shared convolution kernels, the proposed operator generates location-specific, channel-shared kernels conditioned on the input, improving spatial selectivity with minimal overhead. CASO-PAD remains lightweight (3.6M parameters; 0.64 GFLOPs at $256\times256$) and is trained end-to-end using a standard binary cross-entropy objective. Extensive experiments on Replay-Attack, Replay-Mobile, ROSE-Youtu, and OULU-NPU demonstrate strong performance, achieving 100/100/98.9/99.7\% test accuracy, AUC of 1.00/1.00/0.9995/0.9999, and HTER of 0.00/0.00/0.82/0.44\%, respectively. On the large-scale SiW-Mv2 Protocol-1 benchmark, CASO-PAD further attains 95.45\% accuracy with 3.11\% HTER and 3.13\% EER, indicating improved robustness under diverse real-world attacks. Ablation studies show that placing the adaptive operator near the network head and using moderate group sharing yields the best accuracy--efficiency balance. Overall, CASO-PAD provides a practical pathway for robust, on-device FacePAD with mobile-class compute and without auxiliary sensors or temporal stacks.
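To make the contrast with ordinary convolution concrete, here is a minimal NumPy sketch of an involution-style operator: a small bottleneck produces a separate K×K kernel at every pixel, and that kernel is shared across all channels within a group. This is an illustration under stated assumptions, not the CASO-PAD implementation. The function name, the weight shapes, the ReLU bottleneck without normalization, and the zero padding are all choices made here for brevity.

```python
import numpy as np


def involution2d(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Involution-style operator on one feature map (illustrative sketch).

    x:        (C, H, W) input features
    w_reduce: (C_r, C) weights of a 1x1 reduction (hypothetical shape)
    w_span:   (groups * K * K, C_r) weights of a 1x1 expansion
    Assumes an odd kernel_size and C divisible by groups.
    """
    c, h, w = x.shape
    k = kernel_size
    pad = k // 2

    # Kernel generation: each pixel's feature vector yields its own kernels,
    # so the weights are conditioned on the input and vary by location.
    hidden = np.maximum(np.einsum("rc,chw->rhw", w_reduce, x), 0.0)
    kernels = np.einsum("sr,rhw->shw", w_span, hidden)   # (G*K*K, H, W)
    kernels = kernels.reshape(groups, 1, k * k, h, w)    # shared within a group

    # Gather the zero-padded KxK neighbourhood of every pixel.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    patches = np.stack(
        [xp[:, dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)],
        axis=1,
    )                                                    # (C, K*K, H, W)
    patches = patches.reshape(groups, c // groups, k * k, h, w)

    # Weighted sum over the neighbourhood with the per-pixel kernels.
    out = (kernels * patches).sum(axis=2)                # (G, C/G, H, W)
    return out.reshape(c, h, w)


# Hypothetical sizes, chosen only to exercise the function.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 5, 6))
y = involution2d(
    features,
    w_reduce=rng.standard_normal((2, 4)),
    w_span=rng.standard_normal((2 * 9, 2)),
    kernel_size=3,
    groups=2,
)
print(y.shape)
```

In this sketch the `groups` argument sets how many distinct kernels are generated per pixel: `groups=1` shares one kernel across all channels, while larger values cost more kernel-generation weights in exchange for more per-channel flexibility. That is the trade-off behind the "moderate group sharing" finding in the ablations.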