Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Perturbation-based explanations of deepfake detectors suffer from two weaknesses: the perturbed samples are often semantically inconsistent with the input image, and the resulting attribution regions can be significantly misaligned with the actual manipulation regions. To address both, this work introduces adversarial generation into the perturbation-based explanation framework for the first time. Specifically, it employs Natural Evolution Strategies (NES) to generate semantically-preserved "fake-to-real" adversarial examples near the decision boundary, and uses them to construct more accurate perturbation masks that replace conventional random masking. This strategy improves the attribution fidelity of LIME, SHAP, SOBOL, and RISE when explaining a state-of-the-art deepfake detector. Quantitative evaluation on FaceForensics++ shows mostly consistent gains across the evaluation metrics, and qualitative analysis confirms that the resulting saliency maps localize the genuine manipulation regions more tightly, enhancing both credibility and interpretability.
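
The core generation step can be pictured as a black-box NES attack. Below is a minimal sketch, assuming the detector is exposed as a hypothetical `detector_prob_real(img) -> float` callable and using illustrative hyperparameters; the paper's exact NES variant and constraints may differ.

```python
# Minimal sketch of NES-based "fake-to-real" adversarial generation.
# detector_prob_real(img) -> float (probability that img is real) is a
# hypothetical wrapper; the function name, hyperparameters, and the
# L_inf constraint are illustrative assumptions, not from the paper.
import numpy as np

def nes_fake_to_real(x, detector_prob_real, sigma=0.01, lr=0.005,
                     n_samples=50, eps=0.05, steps=200):
    """Estimate the detector's gradient with NES and ascend the 'real'
    probability until the decision on the deepfake input x flips."""
    x = np.clip(x.astype(np.float32), 0.0, 1.0)
    x_adv = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x_adv)
        for _ in range(n_samples // 2):
            delta = np.random.randn(*x_adv.shape).astype(np.float32)
            # Antithetic pairs (+delta, -delta) reduce estimator variance.
            p_plus = detector_prob_real(np.clip(x_adv + sigma * delta, 0, 1))
            p_minus = detector_prob_real(np.clip(x_adv - sigma * delta, 0, 1))
            grad += (p_plus - p_minus) * delta
        grad /= n_samples * sigma
        # Sign ascent on the 'real' score, projected onto a small L_inf
        # ball around x so the sample stays semantically close to the
        # original image, i.e. near the decision boundary.
        x_adv = np.clip(x_adv + lr * np.sign(grad), x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
        if detector_prob_real(x_adv) > 0.5:  # decision flipped to 'real'
            return x_adv
    return x_adv
```

Constraining the search to a small L_inf ball is one way to keep the adversarial sample semantically consistent with the input, so it crosses the decision boundary without drifting far into the "real" class.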

📝 Abstract
In this paper, we introduce the idea of using adversarially-generated samples of input images that a detector classified as deepfakes, in order to form perturbation masks for inferring the importance of different input features and producing visual explanations. We generate these samples based on Natural Evolution Strategies, aiming to flip the original deepfake detector's decision and have these samples classified as real. We apply this idea to four perturbation-based explanation methods (LIME, SHAP, SOBOL, and RISE) and evaluate the performance of the resulting modified methods using a SOTA deepfake detection model, a benchmarking dataset (FaceForensics++), and a corresponding explanation evaluation framework. Our quantitative assessments document the mostly positive contribution of the proposed perturbation approach to the performance of the explanation methods. Our qualitative analysis shows the capacity of the modified explanation methods to demarcate the manipulated image regions more accurately, and thus to provide more useful explanations.
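
To illustrate how such "fake-to-real" samples might replace conventional random masking, here is a hedged RISE-style sketch in which occluded pixels are filled from the adversarial image instead of a constant baseline. The callable `detector_prob_fake` and the blending scheme are assumptions for illustration; the paper's actual integration into LIME, SHAP, SOBOL, and RISE may differ.

```python
# Hedged sketch: RISE-style saliency where dropped pixels come from the
# adversarial sample x_adv rather than being blacked out.
# detector_prob_fake(img) -> float is a hypothetical wrapper.
import numpy as np

def rise_with_adversarial_fill(x, x_adv, detector_prob_fake,
                               n_masks=500, grid=7, p_keep=0.5):
    h, w = x.shape[:2]
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # Coarse binary mask upsampled by pixel repetition (full RISE
        # also smooths and shifts masks; omitted to keep this short).
        coarse = (np.random.rand(grid, grid) < p_keep).astype(float)
        cell = (h // grid + 1, w // grid + 1)
        mask = np.kron(coarse, np.ones(cell))[:h, :w]
        m = mask[..., None]
        # Kept pixels come from the original; dropped pixels are taken
        # from the adversarial sample instead of a constant baseline.
        score = detector_prob_fake(m * x + (1.0 - m) * x_adv)
        # Importance accumulates on regions whose presence keeps the
        # 'fake' score high.
        saliency += score * mask
    return saliency / n_masks
```

Filling dropped regions with content the detector already accepts as real avoids the out-of-distribution artifacts that black or blurred patches introduce, which is precisely the semantic-inconsistency problem the paper targets.
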
Problem

Research questions and friction points this paper is trying to address.

Enhancing deepfake detector explanations
Using adversarial samples for feature importance
Improving accuracy in manipulated region identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarially-generated samples enhance explanations
Natural Evolution Strategies flip detector decisions
Modified perturbation methods improve deepfake detector explanations