MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation

📅 2025-08-12
🤖 AI Summary
This paper addresses zero-shot face morphing attack detection (MAD), proposing a purely zero-shot method that requires no fine-tuning and no access to attack samples. The core idea is to exploit the prior knowledge encoded in multimodal foundation models (e.g., CLIP) by designing and aggregating a set of semantically complementary text prompts, termed a prompt ensemble, to sharpen the boundary between bona fide faces and diverse morphing attacks in the image–text embedding space. According to the summary, this is the first work to systematically introduce pure zero-shot learning into MAD, eliminating reliance on attack-type priors or model adaptation. Experiments across multiple public benchmarks are reported to show that the proposed method outperforms existing zero-shot baselines and generalizes well under cross-attack-type and cross-dataset settings.

📝 Abstract
Face Morphing Attack Detection (MAD) is a critical challenge in face recognition security, where attackers can fool systems by interpolating the identity information of two or more individuals into a single face image, resulting in samples that can be verified as belonging to multiple identities by face recognition systems. While multimodal foundation models (FMs) like CLIP offer strong zero-shot capabilities by jointly modeling images and text, most prior works on FMs for biometric recognition have relied on fine-tuning for specific downstream tasks, neglecting their potential for direct, generalizable deployment. This work explores a pure zero-shot approach to MAD by leveraging CLIP without any additional training or fine-tuning, focusing instead on the design and aggregation of multiple textual prompts per class. By aggregating the embeddings of diverse prompts, we better align the model's internal representations with the MAD task, capturing richer and more varied cues indicative of bona-fide or attack samples. Our results show that prompt aggregation substantially improves zero-shot detection performance, demonstrating the effectiveness of exploiting foundation models' built-in multimodal knowledge through efficient prompt engineering.
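
The prompt aggregation described in the abstract can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by CLIP's text and image encoders, and it uses plain Python lists in place of real model outputs. The abstract does not state which aggregation operator is used, so averaging the L2-normalized prompt embeddings is an assumption here, as are the function names, the toy vectors, and the logit scale of 100 (roughly the temperature CLIP applies to its similarities).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def aggregate_prompts(prompt_embeddings):
    """Build one class prototype from several prompt embeddings.

    Assumption: the aggregate is the mean of the normalized embeddings,
    renormalized to unit length. The paper may use a different operator.
    """
    normalized = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_scores(image_embedding, class_prompt_embeddings, logit_scale=100.0):
    """Return softmax probabilities over classes for a single image embedding."""
    image = l2_normalize(image_embedding)
    names = list(class_prompt_embeddings)
    logits = []
    for name in names:
        prototype = aggregate_prompts(class_prompt_embeddings[name])
        cosine = sum(a * b for a, b in zip(image, prototype))
        logits.append(logit_scale * cosine)
    # Subtract the maximum logit for a numerically stable softmax.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Hypothetical 3-dimensional embeddings standing in for CLIP outputs.
toy_prompts = {
    "bona fide": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "morphing attack": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
toy_image = [0.2, 0.8, 0.1]

scores = zero_shot_scores(toy_image, toy_prompts)
print(scores)
```

In this toy setup the image vector lies closer to the aggregated "morphing attack" prototype, so that class receives the higher probability. With a real model, each inner list would be replaced by the text embedding of one prompt for that class, and the image vector by the image encoder's output for the face under test.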

Problem

Research questions and friction points this paper is trying to address.

Detect face morphing attacks without training data
Leverage the CLIP model for zero-shot morphing attack detection
Improve detection via a multi-prompt aggregation strategy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages CLIP without fine-tuning
Aggregates multiple textual prompts
Aligns model representations with MAD