🤖 AI Summary
Masked face detection and recognition suffer from scarce real-world labeled data and domain distribution shift. To address these challenges, we propose a two-stage generative data augmentation framework: first, geometry-guided controllable warping of the mask region; second, unpaired GAN-based translation for high-fidelity masked-face image synthesis. A novel non-mask preservation loss and stochastic noise injection jointly ensure generation diversity, structural consistency, and training stability. To our knowledge, this cascaded paradigm is the first in the domain to integrate rule-based warping with unpaired image translation. Extensive experiments demonstrate significant improvements over single-stage augmentation methods: +3.2% mAP on the WIDER FACE and MAFA detection benchmarks and +4.7% recognition accuracy. The synthesized images exhibit high visual fidelity and effectively mitigate the scarcity of occluded face samples.
📝 Abstract
Data scarcity and distribution shift pose major challenges for masked face detection and recognition. We propose a two-stage generative data augmentation framework that combines rule-based mask warping with unpaired GAN-based image-to-image translation, enabling the generation of realistic masked-face samples beyond purely synthetic transformations. Compared to rule-based warping alone, the proposed approach yields consistent qualitative improvements and complements existing GAN-based masked-face generation methods such as IAMGAN. A non-mask preservation loss and stochastic noise injection stabilize training and enhance sample diversity. Experimental observations highlight the effectiveness of the proposed components and suggest directions for future work in data-centric augmentation for face recognition.
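The paper does not give the exact form of the non-mask preservation loss, but a reasonable reading is an L1 penalty that forbids the generator from altering pixels outside the mask region, leaving it free inside. A minimal NumPy sketch under that assumption (function name and normalization are illustrative, not the authors' definition):

```python
import numpy as np

def non_mask_preservation_loss(generated, original, mask):
    """Hypothetical sketch: mean L1 difference over non-mask pixels only.

    generated, original: float arrays of shape (H, W, C).
    mask: binary array of shape (H, W); 1 marks the mask region.
    """
    outside = (1.0 - mask)[..., None]               # weight non-mask pixels
    diff = np.abs(generated - original) * outside   # zero cost inside mask
    denom = outside.sum() * generated.shape[-1] + 1e-8
    return diff.sum() / denom

# Edits confined to the mask region incur zero loss:
orig = np.zeros((4, 4, 3))
gen = orig.copy()
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
gen[1:3, 1:3, :] = 1.0          # change only masked pixels
print(non_mask_preservation_loss(gen, orig, mask))  # → 0.0
```

In a training loop this term would be added to the adversarial objective with a weighting coefficient, so the GAN only synthesizes content where the geometric warp placed the mask.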