🤖 AI Summary
To address weak cross-dataset generalization and insufficient spatiotemporal inconsistency modeling in Deepfake video detection, this paper proposes a novel representation and learning paradigm based on the “Volume of Differences” (VoD). Methodologically, it extracts spatiotemporal inconsistencies via consecutive frame differencing and introduces a progressive dilated convolutional network, integrated with multi-scale temporal sampling and adjustable-segment voxelization for robust dynamic forgery trace modeling. The core contribution is the first structured formulation of inter-frame differences as a 3D voxel representation, coupled with a stepwise network expansion strategy to enhance transferability. Evaluated on multiple mainstream Deepfake benchmarks, the method achieves state-of-the-art performance, improving average cross-dataset detection accuracy by 6.2%. Ablation studies confirm the effectiveness and synergistic gains of each component.
📝 Abstract
The rapid development of deep learning and generative AI technologies has profoundly transformed the digital content landscape, creating realistic Deepfakes that pose substantial challenges to public trust and digital media integrity. This paper introduces a novel Deepfake detection framework, Volume of Differences (VoD), designed to enhance detection accuracy by exploiting temporal and spatial inconsistencies between consecutive video frames. VoD employs a progressive learning approach that captures differences across multiple axes through the use of consecutive frame differences (CFD) and a network with stepwise expansions. We evaluate our approach in intra-dataset and cross-dataset testing scenarios on several well-known Deepfake datasets. Our findings demonstrate that VoD excels on the data it has been trained on and shows strong adaptability to novel, unseen data. Additionally, comprehensive ablation studies examine various configurations of segment length, sampling steps, and intervals, offering valuable insights for optimizing the framework. The code for our VoD framework is available at https://github.com/xuyingzhongguo/VoD.
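To make the core idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of how consecutive frame differences can be stacked into a 3D "volume of differences"; the function names and the `segment_length`/`step` parameters are assumptions for illustration, and the paper's actual sampling settings may differ.

```python
import numpy as np

def consecutive_frame_differences(frames: np.ndarray) -> np.ndarray:
    """Absolute differences between consecutive frames.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Returns an array of shape (T-1, H, W[, C]).
    """
    # widen dtype before subtracting to avoid uint8 wrap-around
    diffs = np.abs(frames[1:].astype(np.int16) - frames[:-1].astype(np.int16))
    return diffs.astype(np.uint8)

def volume_of_differences(frames: np.ndarray,
                          segment_length: int,
                          step: int) -> np.ndarray:
    """Stack CFD maps from a temporally sampled segment into a voxel volume.

    segment_length: number of difference maps in the volume (hypothetical knob).
    step: temporal sampling stride between frames (hypothetical knob).
    """
    # segment_length + 1 sampled frames yield segment_length difference maps
    sampled = frames[::step][:segment_length + 1]
    return consecutive_frame_differences(sampled)

# toy example: 10 random grayscale "frames" of size 4x4
video = np.random.randint(0, 256, size=(10, 4, 4), dtype=np.uint8)
vod = volume_of_differences(video, segment_length=4, step=2)
print(vod.shape)  # (4, 4, 4)
```

In a real pipeline, such a volume would be fed to the 3D detection network; here it simply shows how the segment length and sampling step control the shape of the resulting voxel representation.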