🤖 AI Summary
This work reveals that real-world post-processing techniques—such as super-resolution—severely degrade the performance of mainstream deepfake detectors, with AUC dropping by up to 78% and accuracy approaching 50% (i.e., random guessing). To bridge the gap between academic evaluation and industrial deployment, we introduce and publicly release the first real-world faceswap benchmark dataset, curated from online face-swapping platforms. Methodologically, we trace the failure mechanism through real-world data collection and annotation, qualitative and quantitative analysis across multiple dimensions, explicit modeling of post-processing effects, and a cross-model robustness evaluation framework. Our key contributions are: (1) the first empirical demonstration that post-processing is a primary cause of detection failure; (2) the first publicly available faceswap dataset reflecting practical deployment conditions; and (3) a reproducible, realistic evaluation paradigm for deepfake detection.
📝 Abstract
Deepfakes, particularly those involving faceswap-based manipulations, have sparked significant societal concern due to their increasing realism and potential for misuse. Despite rapid advancements in generative models, detection methods have not kept pace, creating a critical gap in defense strategies. This disparity is further amplified by the disconnect between academic research and real-world applications, which often prioritize different objectives and evaluation criteria. In this study, we take a pivotal step toward bridging this gap by presenting a novel observation: the post-processing step of super-resolution, commonly employed in real-world scenarios, substantially undermines the effectiveness of existing deepfake detection methods. To substantiate this claim, we introduce and publish the first real-world faceswap dataset, collected from popular online faceswap platforms. We then qualitatively evaluate the performance of state-of-the-art deepfake detectors on real-world deepfakes, revealing that their accuracy approaches the level of random guessing. Furthermore, we quantitatively demonstrate the significant performance degradation caused by common post-processing techniques. By addressing this overlooked challenge, our study underscores a critical avenue for enhancing the robustness and practical applicability of deepfake detection methods in real-world settings.
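The core finding—that a smoothing post-process such as super-resolution can collapse a detector's AUC—can be illustrated with a toy, self-contained sketch. This is not the paper's dataset or any actual detector: the "detector" below is a simple high-frequency-artifact score, super-resolution is approximated by a box blur, and all images and amplitudes are synthetic assumptions chosen only to make the mechanism visible.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, W = 200, 32, 32

def box_blur(img):
    # 5-point box blur: a crude stand-in for the smoothing effect of
    # super-resolution / enhancement post-processing
    return (img
            + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
            + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1)) / 5.0

def detector_score(img):
    # toy "detector": mean absolute neighbor difference, a proxy for the
    # high-frequency generator artifacts many real detectors rely on
    return (np.abs(np.diff(img, axis=0)).mean()
            + np.abs(np.diff(img, axis=1)).mean())

def auc(real_scores, fake_scores):
    # Mann-Whitney AUC: probability a random fake outscores a random real
    r = np.asarray(real_scores)[:, None]
    f = np.asarray(fake_scores)[None, :]
    return float((f > r).mean() + 0.5 * (f == r).mean())

def make_image(artifact_amp):
    base = 0.2 * box_blur(box_blur(rng.normal(size=(H, W))))  # smooth content
    return base + artifact_amp * rng.normal(size=(H, W))      # hf "artifacts"

reals = [make_image(rng.uniform(0.0, 0.3)) for _ in range(N)]  # mild texture
fakes = [make_image(rng.uniform(0.5, 0.8)) for _ in range(N)]  # strong artifacts

auc_raw = auc([detector_score(x) for x in reals],
              [detector_score(x) for x in fakes])

# post-process only the fakes, mirroring what online platforms do before release
post = [box_blur(box_blur(x)) for x in fakes]
auc_post = auc([detector_score(x) for x in reals],
               [detector_score(x) for x in post])

print(f"AUC before post-processing: {auc_raw:.2f}")
print(f"AUC after  post-processing: {auc_post:.2f}")
```

On raw fakes the artifact score separates the classes almost perfectly; after the smoothing step suppresses the high-frequency residue, the same score loses most of its discriminative power—the qualitative behavior the study reports for real detectors under real post-processing.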