🤖 AI Summary
To address insufficient generalization to unseen manipulations and cross-dataset scenarios in deepfake image detection, this paper proposes SpecXNet, a dual-domain convolutional neural network that jointly models local texture anomalies in the spatial domain and global periodic inconsistencies in the frequency domain. Methodologically, it introduces a novel Dual-Domain Feature Coupler and a Dual Fourier Attention module to enable content-adaptive cross-domain feature fusion; the end-to-end architecture integrates the Fast Fourier Transform (FFT), depthwise separable convolutions, and an enhanced XceptionNet backbone. Evaluated on multiple benchmark datasets, the approach achieves state-of-the-art (SOTA) performance, demonstrating notable robustness under cross-dataset evaluation and previously unseen manipulation types, while maintaining real-time inference capability.
📝 Abstract
The increasing realism of content generated by GANs and diffusion models has made deepfake detection significantly more challenging. Existing approaches often focus solely on spatial or frequency-domain features, limiting their generalization to unseen manipulations. We propose the Spectral Cross-Attentional Network (SpecXNet), a dual-domain architecture for robust deepfake detection. The core **Dual-Domain Feature Coupler (DDFC)** decomposes features into a local spatial branch for capturing texture-level anomalies and a global spectral branch that employs the Fast Fourier Transform to model periodic inconsistencies. This dual-domain formulation allows SpecXNet to jointly exploit localized detail and global structural coherence, which are critical for distinguishing authentic from manipulated images. We also introduce the **Dual Fourier Attention (DFA)** module, which dynamically fuses spatial and spectral features in a content-aware manner. Built atop a modified XceptionNet backbone, we embed the DDFC and DFA modules within a separable convolution block. Extensive experiments on multiple deepfake benchmarks show that SpecXNet achieves state-of-the-art accuracy, particularly under cross-dataset and unseen manipulation scenarios, while maintaining real-time feasibility. Our results highlight the effectiveness of unified spatial-spectral learning for robust and generalizable deepfake detection. To ensure reproducibility, we release the full code on [**GitHub**](https://github.com/inzamamulDU/SpecXNet).
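To make the dual-domain idea concrete, the toy sketch below contrasts a local spatial branch (a high-frequency residual that surfaces texture anomalies) with a global spectral branch (a log-magnitude FFT that exposes periodic artifacts such as those left by generator upsampling), then mixes them with a fixed weight. All function names and the fixed-weight fusion are illustrative assumptions; the paper's actual DDFC and DFA modules are learned, attention-based layers inside the network, not this hand-crafted pipeline.

```python
import numpy as np

def spectral_branch(feat):
    """Toy global spectral branch (illustrative, not the paper's DDFC).
    The log-magnitude spectrum emphasizes periodic inconsistencies."""
    spec = np.fft.fft2(feat, axes=(-2, -1))
    return np.log1p(np.abs(spec))

def spatial_branch(feat):
    """Toy local spatial branch: residual of a 3x3 mean filter,
    highlighting texture-level (high-frequency) anomalies."""
    h, w = feat.shape
    pad = np.pad(feat, ((1, 1), (1, 1)), mode="edge")
    smooth = sum(
        pad[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return feat - smooth

def dual_domain_fuse(feat, alpha=0.5):
    """Fixed-weight stand-in for the content-aware DFA fusion."""
    return alpha * spatial_branch(feat) + (1 - alpha) * spectral_branch(feat)

x = np.random.rand(32, 32)          # stand-in for one feature-map channel
fused = dual_domain_fuse(x)
print(fused.shape)                  # (32, 32)
```

In the actual architecture the two branches operate on intermediate feature maps inside a modified XceptionNet block, and the fusion weight is predicted per location from the content rather than fixed.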