When Deepfake Detection Meets Graph Neural Network: A Unified and Lightweight Learning Framework

📅 2025-08-07
🤖 AI Summary
Existing video forgery detection methods rely on isolated cues (spatial, temporal, or spectral), which leads to poor generalization and parameter-heavy models. To address this, we propose a lightweight graph neural network framework that, for the first time, unifies spatial, spectral, and temporal inconsistency modeling in the graph domain. Our approach constructs a structured graph representation of each video and jointly learns spectral filtering and temporal differencing operations within the graph architecture, enabling end-to-end inference without reliance on large pretrained models. Extensive experiments on multiple benchmarks demonstrate state-of-the-art performance in both in-domain and cross-domain settings, along with robustness against unseen manipulations. Notably, our method uses up to 42.4× fewer parameters than prior works, which improves computational efficiency for real-world deployment.

📝 Abstract
The proliferation of generative video models has made detecting AI-generated and manipulated videos an urgent challenge. Existing detection approaches often fail to generalize across diverse manipulation types due to their reliance on isolated spatial, temporal, or spectral information, and typically require large models to perform well. This paper introduces SSTGNN, a lightweight Spatial-Spectral-Temporal Graph Neural Network framework that represents videos as structured graphs, enabling joint reasoning over spatial inconsistencies, temporal artifacts, and spectral distortions. SSTGNN incorporates learnable spectral filters and temporal differential modeling into a graph-based architecture, capturing subtle manipulation traces more effectively. Extensive experiments on diverse benchmark datasets demonstrate that SSTGNN not only achieves superior performance in both in-domain and cross-domain settings, but also offers strong robustness against unseen manipulations. Remarkably, SSTGNN accomplishes these results with up to 42.4$\times$ fewer parameters than state-of-the-art models, making it highly lightweight and scalable for real-world deployment.
Problem

Research questions and friction points this paper is trying to address.

Detect AI-generated videos across diverse manipulation types
Overcome limitations of isolated spatial, temporal, or spectral analysis
Reduce model size while maintaining detection performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Spatial-Spectral-Temporal Graph Neural Network
Joint reasoning over spatial-temporal-spectral inconsistencies
Learnable spectral filters and temporal differential modeling
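
To make the last two items concrete, here is a minimal NumPy sketch of temporal differencing followed by a polynomial spectral filter on a graph of frame patches. It is an illustration under stated assumptions, not the SSTGNN implementation: the 4-neighbour patch grid, the polynomial form of the filter, the fixed coefficients in `theta` (which a real model would learn), and all function names are hypothetical choices that do not come from the paper.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Adjacency matrix linking each patch node to its 4 grid neighbours (assumed layout)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:          # right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:          # bottom neighbour
                A[i, i + cols] = A[i + cols, i] = 1.0
    return A

def normalized_laplacian(A):
    """Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_filter(L, X, theta):
    """Polynomial graph filter: sum_k theta[k] * L^k @ X."""
    X = np.asarray(X, dtype=float)
    out = np.zeros_like(X)
    Z = X.copy()                      # holds L^k @ X, starting at k = 0
    for t in theta:
        out += t * Z
        Z = L @ Z                     # broadcasts over a leading time axis
    return out

def temporal_difference(frames):
    """Frame-to-frame feature change: (T, N, F) -> (T - 1, N, F)."""
    return frames[1:] - frames[:-1]

# Toy input: 4 frames, a 3x3 patch grid (9 nodes), 8 features per node.
rng = np.random.default_rng(0)
video = rng.normal(size=(4, 9, 8))
L = normalized_laplacian(grid_adjacency(3, 3))
theta = np.array([1.0, -0.5, 0.25])   # stand-in for learnable coefficients
filtered = spectral_filter(L, temporal_difference(video), theta)
print(filtered.shape)                 # (3, 9, 8)
```

In this sketch the temporal difference highlights frame-to-frame changes at each node, and the filter then reweights those changes by graph frequency. How SSTGNN actually builds its graph and parameterizes its filters is not specified in this summary.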
Haoyu Liu
Nanyang Technological University, Singapore
Chaoyu Gong
NTU
Mengke He
Nanyang Technological University, Singapore
Jiate Li
University of Southern California
Kai Han
The University of Hong Kong, Hong Kong SAR
Siqiang Luo
Assistant Professor of Nanyang Technological University
database, graph data management, key-value stores