🤖 AI Summary
This paper addresses the challenges of authenticity verification and training-data provenance attribution for media synthesized by generative adversarial networks (GANs). Methodologically, it proposes an interpretable deepfake forensics framework that jointly exploits frequency-domain features (DCT/FFT), local feature descriptors (SIFT), and color-distribution statistics (RGB/YUV histograms), integrated with ensemble classifiers such as Random Forest, to achieve both binary forgery detection and multi-source training-set attribution (e.g., CelebA, FFHQ). Crucially, it establishes a novel linkage between spectral artifacts and implicit regularization traces induced during GAN training, while embedding legal-accountability logic to enhance judicial interpretability. Evaluated on five state-of-the-art GAN architectures, including StyleGAN, the framework achieves 98–99% binary classification accuracy and high multi-source attribution precision, with frequency-domain features contributing over 76% to overall discriminative performance.
📝 Abstract
Synthetic media generated by Generative Adversarial Networks (GANs) pose significant challenges in verifying authenticity and tracing dataset origins, raising critical concerns in copyright enforcement, privacy protection, and legal compliance. This paper introduces a novel forensic framework for identifying the training dataset (e.g., CelebA or FFHQ) of GAN-generated images through interpretable feature analysis. By integrating spectral transforms (Fourier/DCT), color-distribution metrics, and local feature descriptors (SIFT), our pipeline extracts discriminative statistical signatures embedded in synthetic outputs. Supervised classifiers (Random Forest, SVM, XGBoost) achieve 98–99% accuracy in binary classification (real vs. synthetic) and multi-class dataset attribution across diverse GAN architectures (StyleGAN, AttGAN, GDWCT, StarGAN, and StyleGAN2). Experimental results highlight the dominance of frequency-domain features (DCT/FFT) in capturing dataset-specific artifacts, such as upsampling patterns and spectral irregularities, while color histograms reveal implicit regularization strategies in GAN training. We further examine legal and ethical implications, showing how dataset attribution can address copyright infringement, unauthorized use of personal data, and regulatory compliance under frameworks like GDPR and California's AB 602. Our framework advances accountability and governance in generative modeling, with applications in digital forensics, content moderation, and intellectual property litigation.
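The feature-plus-classifier pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the DCT block partition, histogram binning, and the toy "checkerboard" stand-in for GAN upsampling artifacts are all hypothetical choices, and SIFT descriptors are omitted to avoid an OpenCV dependency.

```python
import numpy as np
from scipy.fft import dct
from sklearn.ensemble import RandomForestClassifier

def extract_features(img, n_bins=16):
    """img: HxWx3 uint8 array -> 1-D feature vector of DCT energies + color histograms."""
    gray = img.mean(axis=2)
    # 2D orthonormal DCT: apply the 1-D transform along both axes
    spec = np.abs(dct(dct(gray, axis=0, norm="ortho"), axis=1, norm="ortho"))
    h, w = spec.shape
    # Log-energy in low-, mid-, and high-frequency blocks (assumed partition)
    blocks = [spec[: h // 4, : w // 4],
              spec[h // 4 : h // 2, w // 4 : w // 2],
              spec[h // 2 :, w // 2 :]]
    spec_feats = [np.log1p(b.mean()) for b in blocks]
    # Normalized per-channel color histograms
    hist_feats = []
    for c in range(3):
        hist, _ = np.histogram(img[..., c], bins=n_bins, range=(0, 255), density=True)
        hist_feats.extend(hist)
    return np.array(spec_feats + hist_feats)

# Toy demo on synthetic data: the "fake" class carries a high-frequency
# checkerboard artifact, a crude stand-in for generator upsampling patterns.
rng = np.random.default_rng(0)
X, y = [], []
for i in range(200):
    base = rng.normal(128, 20, size=(32, 32, 3))
    if i % 2:  # fake class: add checkerboard artifact
        base += (np.indices((32, 32)).sum(axis=0) % 2 * 25)[..., None]
    X.append(extract_features(np.clip(base, 0, 255).astype(np.uint8)))
    y.append(i % 2)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
print(f"held-out accuracy: {acc:.2f}")
```

On this toy task the high-frequency DCT energy alone separates the classes, which mirrors the paper's finding that frequency-domain features dominate; swapping the Random Forest for an SVM or XGBoost, as the abstract lists, requires changing only the classifier line.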