Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Video datasets suffer from high dimensionality and temporal dynamics, leading to low distillation efficiency and poor dynamic fidelity. To address these challenges, this paper proposes an end-to-end video dataset distillation framework. Our method introduces two key innovations: (1) a novel temporal saliency-guided filtering mechanism that dynamically identifies critical motion cues via inter-frame differencing—departing from static image distillation paradigms; and (2) a single-layer differentiable architecture integrating gradient-driven optimization from pretrained models, inter-frame difference modeling, and a temporal saliency-weighted loss function. Evaluated on standard video benchmarks, our approach significantly narrows the training utility gap between distilled and original videos, achieving state-of-the-art performance while reducing computational overhead by over 40%, thereby enabling scalable deployment.

Technology Category

Application Category

📝 Abstract
Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video data. Existing video distillation (VD) methods often suffer from excessive computational costs and struggle to preserve temporal dynamics, as na""ive extensions of image-based approaches typically lead to degraded performance. In this paper, we propose a novel uni-level video dataset distillation framework that directly optimizes synthetic videos with respect to a pre-trained model. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism that leverages inter-frame differences to guide the distillation process, encouraging the retention of informative temporal cues while suppressing frame-level redundancy. Extensive experiments on standard video benchmarks demonstrate that our method achieves state-of-the-art performance, bridging the gap between real and distilled video data and offering a scalable solution for video dataset compression.
Problem

Research questions and friction points this paper is trying to address.

Extending dataset distillation to video domain is challenging due to high dimensionality and temporal complexity.
Existing video distillation methods have high computational costs and poor temporal dynamics preservation.
Proposing a temporal saliency-guided framework to reduce redundancy and enhance motion preservation in distilled videos.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal saliency-guided filtering mechanism
Uni-level video dataset distillation framework
Optimizes synthetic videos with pre-trained model
🔎 Similar Papers
No similar papers found.
Xulin Gu
Xulin Gu
Harbin Institute of Technology, Shenzhen
dataset distillation
Xinhao Zhong
Xinhao Zhong
Harbin Institute of Technology, Shenzhen
Data-centric AIEffiecient AI
Z
Zhixing Wei
Harbin Institute of Technology, Shenzhen
Y
Yimin Zhou
Tsinghua Shenzhen International Graduate School
Shuoyang Sun
Shuoyang Sun
Harbin Institute of Technology, Shenzhen
LLM3DAI Security
B
Bin Chen
Harbin Institute of Technology, Shenzhen, Peng Cheng Laboratory
Hongpeng Wang
Hongpeng Wang
Robotic Institute, nankai university
Intelligent Robotics、Artificial Intelligence
Y
Yuan Luo
Shanghai Jiao Tong University