🤖 AI Summary
This work addresses unsupervised video understanding and forecasting in dynamic scenes. We propose an object-centric modeling framework grounded in frequency-domain phase correlation. By recursively analyzing inter-frame phase relationships, our method explicitly disentangles object prototypes and models their geometric transformations, enabling fully unsupervised object decomposition, motion inference, and future-frame prediction. The core innovation is the integration of frequency-domain phase correlation with lightweight learnable modules to construct interpretable, trackable object representations. Evaluated on multiple synthetic benchmarks, our approach substantially outperforms existing object-centric models in unsupervised object tracking and video prediction, while demonstrating stronger generalization and computational efficiency.
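As a rough intuition for the frequency-domain phase correlation the framework builds on (not the paper's actual implementation), the NumPy sketch below estimates the translation between two frames from the peak of the normalized cross-power spectrum; the function name, the `eps` constant, and the integer peak readout are illustrative assumptions.

```python
import numpy as np

def phase_correlation(frame_a, frame_b, eps=1e-8):
    """Estimate the (dy, dx) translation that maps frame_a onto frame_b
    from the peak of the normalized cross-power spectrum."""
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    # Normalizing out the magnitude keeps only the phase difference,
    # which encodes the relative translation between the two frames.
    cross_power = Fb * np.conj(Fa)
    cross_power /= np.abs(cross_power) + eps
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Displacements past half the frame size wrap around to negative shifts.
    h, w = frame_a.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Example: a frame shifted by (3, 5) pixels is recovered from the peak.
frame_a = np.random.rand(64, 64)
frame_b = np.roll(frame_a, shift=(3, 5), axis=(0, 1))
print(phase_correlation(frame_a, frame_b))  # -> (3, 5)
```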
📝 Abstract
Understanding and predicting video content is essential for planning and reasoning in dynamic environments. Despite recent advances, unsupervised learning of object representations and dynamics remains challenging. We present VideoPCDNet, an unsupervised framework for object-centric video decomposition and prediction. Our model uses frequency-domain phase correlation techniques to recursively parse videos into object components, which are represented as transformed versions of learned object prototypes, enabling accurate and interpretable tracking. By explicitly modeling object motion through a combination of frequency-domain operations and lightweight learned modules, VideoPCDNet achieves accurate unsupervised object tracking and prediction of future video frames. In our experiments, we demonstrate that VideoPCDNet outperforms multiple object-centric baseline models for unsupervised tracking and prediction on several synthetic datasets, while learning interpretable object and motion representations.
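The abstract does not spell out how prototypes are transformed; as a hedged illustration of one frequency-domain mechanism consistent with the description, the sketch below translates an object prototype by an arbitrary (possibly sub-pixel) displacement via the Fourier shift theorem, the kind of operation that would let an extrapolated motion estimate place an object in a predicted future frame. The function name and the constant-velocity extrapolation are assumptions, not details from the paper.

```python
import numpy as np

def shift_prototype(prototype, dy, dx):
    """Translate a 2-D object prototype by (dy, dx) pixels (sub-pixel allowed)
    by applying a phase ramp in the frequency domain (Fourier shift theorem)."""
    h, w = prototype.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies (cycles per pixel)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    phase_ramp = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(prototype) * phase_ramp).real

# Hypothetical constant-velocity forecast: if an object moved by (vy, vx)
# between the last two frames, render its prototype at the extrapolated
# position to form that object's component of the predicted next frame.
prototype = np.zeros((64, 64))
prototype[28:36, 28:36] = 1.0  # toy square object
vy, vx = 2.0, -1.5
predicted_component = shift_prototype(prototype, vy, vx)
```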