Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deepfake detection models often exhibit poor generalization to unseen manipulation methods. Method: This paper proposes a generalizable detection framework grounded in kinematic inconsistency. Its core innovation is the introduction of “facial motion bases,” wherein an autoencoder decomposes facial landmark trajectories to explicitly model and manipulate natural biomechanical dependencies across facial regions; anomalous motions are then injected into real videos via facial warping, generating synthetic forgeries with physiologically implausible dynamics. This approach moves beyond conventional reliance on inter-frame flickering artifacts, compelling detectors to learn more robust motion-consistency features. Contribution/Results: The method achieves state-of-the-art generalization performance across multiple benchmarks, significantly improving detection accuracy on previously unseen deepfake generation techniques.

📝 Abstract
Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipulated with hand-crafted artifacts to extract more generalizable clues. While effective for static images, extending this to the video domain is an open issue. Existing methods model temporal artifacts as frame-to-frame instabilities, overlooking a key vulnerability: the violation of natural motion dependencies between different facial regions. In this paper, we propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies. We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements and introduce these artifacts into pristine videos via face morphing. A network trained on our data learns to spot these sophisticated biomechanical flaws, achieving state-of-the-art generalization results on several popular benchmarks.
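The abstract's core step, decomposing landmark trajectories into motion bases and perturbing one basis to break natural motion correlations, can be sketched in a minimal linear form. This is an illustrative stand-in only: the paper trains an autoencoder, while here PCA (via SVD) plays the role of the encoder/decoder, and all data and variable names are synthetic.

```python
import numpy as np

# Illustrative sketch: linear "motion bases" via PCA as a stand-in for
# the paper's learned autoencoder (assumption, not the actual method).
rng = np.random.default_rng(0)

T, L = 50, 68                          # frames, landmarks (x,y flattened)
t = np.linspace(0, 2 * np.pi, T)
basis_a = rng.standard_normal(2 * L)   # two naturally correlated motions
basis_b = rng.standard_normal(2 * L)
X = np.outer(np.sin(t), basis_a) + np.outer(np.cos(2 * t), basis_b)
X += 0.01 * rng.standard_normal((T, 2 * L))   # synthetic trajectories

# "Encode": project centred trajectories onto principal motion bases
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
coeffs = U * S                         # per-frame basis coefficients

# Break kinematic consistency: amplify one basis independently,
# decoupling it from the motions it normally co-varies with
coeffs_anom = coeffs.copy()
coeffs_anom[:, 0] *= 1.8

# "Decode": reconstruct an anomalous landmark trajectory
X_anom = coeffs_anom @ Vt + mean

recon_err = np.abs(coeffs @ Vt + mean - X).max()  # ~0: faithful decode
anom_err = np.abs(X_anom - X).max()               # large: injected anomaly
```

The unperturbed coefficients reconstruct the original trajectory almost exactly, while the perturbed ones yield landmark motion whose components no longer move together, the kind of kinematic inconsistency the detector is trained to spot.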
Problem

Research questions and friction points this paper is trying to address.

Deepfake detectors generalize poorly to manipulation methods unseen during training
Synthetic-artifact training on pristine images is effective for static detection but does not extend directly to video
Existing video methods model temporal artifacts as frame-to-frame instabilities, overlooking violations of natural motion dependencies between facial regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic video generation with kinematic inconsistencies
Autoencoder decomposes landmarks into manipulable motion bases
Training data introduces biomechanical flaws via face morphing
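The last step, injecting the anomalous landmark motion back into a pristine frame, relies on face morphing. A crude backward warp driven by an inverse-distance-weighted displacement field gives the flavor; the actual pipeline would use proper morphing (e.g. piecewise-affine or thin-plate-spline warps), so `warp_frame` and its inputs are hypothetical.

```python
import numpy as np

def warp_frame(frame, landmarks, landmarks_anom):
    """Illustrative backward warp: move pixels so that `landmarks`
    end up at `landmarks_anom`. Not the paper's morphing method."""
    H, W = frame.shape[:2]
    disp = landmarks_anom - landmarks          # (N, 2) desired motion
    ys, xs = np.mgrid[0:H, 0:W]
    grid = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(float)

    # Inverse-distance weighting spreads sparse landmark displacements
    # into a dense per-pixel displacement field
    d = np.linalg.norm(grid[:, None, :] - landmarks[None, :, :], axis=-1)
    w = 1.0 / (d + 1e-6) ** 2
    w /= w.sum(axis=1, keepdims=True)
    field = w @ disp                           # (H*W, 2)

    # Backward warp: sample the source at (target - displacement)
    src = np.clip(np.rint(grid - field), [0, 0], [W - 1, H - 1]).astype(int)
    return frame[src[:, 1], src[:, 0]].reshape(frame.shape)

# Toy grayscale "frame" with two landmarks; shift only the first one,
# breaking the correlation between the two regions' motion
frame = np.arange(100, dtype=float).reshape(10, 10)
lm = np.array([[3.0, 3.0], [7.0, 7.0]])
lm_anom = lm + np.array([[1.0, 0.0], [0.0, 0.0]])
out = warp_frame(frame, lm, lm_anom)
```

Only the region around the perturbed landmark moves, leaving a video whose facial regions no longer move in a biomechanically consistent way.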
Alejandro Cobo
Universidad Politécnica de Madrid, Campus de Montegancedo s/n, Boadilla del Monte, 28660, Madrid, Spain
Roberto Valle
Universidad Politécnica de Madrid, Campus de Montegancedo s/n, Boadilla del Monte, 28660, Madrid, Spain
José Miguel Buenaposada
Universidad Rey Juan Carlos, Calle Tulipán s/n, Móstoles, 28933, Madrid, Spain
Luis Baumela
Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
computer vision