VideoPCDNet: Video Parsing and Prediction with Phase Correlation Networks

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses unsupervised video understanding and forecasting in dynamic scenes. We propose an object-centric modeling framework grounded in frequency-domain phase correlation. By recursively analyzing inter-frame phase relationships, our method explicitly disentangles object prototypes and models their geometric transformations, enabling fully unsupervised object decomposition, motion inference, and future-frame prediction. The core innovation lies in integrating frequency-domain phase correlation with lightweight learnable modules to construct interpretable, trackable object representations. Evaluated on multiple synthetic benchmarks, our approach substantially outperforms existing object-centric models in unsupervised object tracking and video prediction, while also demonstrating superior generalization and computational efficiency.
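Phase correlation, the operation the summary builds on, can be illustrated with its classic textbook form: normalize the cross-power spectrum of two frames, R = F2 * conj(F1) / |F2 * conj(F1)|, so that only the phase difference remains; the inverse FFT of R then peaks at the relative translation. The following is a minimal NumPy sketch assuming a single, purely translational, cyclic (wrap-around) integer shift between grayscale frames. It is not the authors' implementation, and the name `phase_correlation` is hypothetical.

```python
import numpy as np


def phase_correlation(ref: np.ndarray, moved: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (dy, dx) cyclic shift that maps `ref` onto `moved`."""
    # Cross-power spectrum of the two frames
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    # Normalize to unit magnitude so only the phase difference remains
    cross /= np.abs(cross) + 1e-8
    # The inverse FFT of a pure phase ramp is a delta located at the shift
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative (wrapped) shifts
    h, w = ref.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(0)
frame_t = rng.random((64, 64))
# Simulate motion: content moves 3 px down and 5 px left (with wrap-around)
frame_t1 = np.roll(frame_t, shift=(3, -5), axis=(0, 1))
print(phase_correlation(frame_t, frame_t1))  # (3, -5)
```

Because the normalization discards magnitude, the peak is sharp and largely insensitive to global brightness changes, which is one reason phase correlation is attractive as a cheap, parameter-free alignment step.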

📝 Abstract
Understanding and predicting video content is essential for planning and reasoning in dynamic environments. Despite advancements, unsupervised learning of object representations and dynamics remains challenging. We present VideoPCDNet, an unsupervised framework for object-centric video decomposition and prediction. Our model uses frequency-domain phase correlation techniques to recursively parse videos into object components, which are represented as transformed versions of learned object prototypes, enabling accurate and interpretable tracking. By explicitly modeling object motion through a combination of frequency domain operations and lightweight learned modules, VideoPCDNet enables accurate unsupervised object tracking and prediction of future video frames. In our experiments, we demonstrate that VideoPCDNet outperforms multiple object-centric baseline models for unsupervised tracking and prediction on several synthetic datasets, while learning interpretable object and motion representations.
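The abstract represents each object component as a transformed version of a learned prototype. For the translation case, the Fourier shift theorem turns this into a frequency-domain operation: shifting an image by (dy, dx) multiplies its spectrum by exp(-2πi(k_y·dy + k_x·dx)), which also permits sub-pixel offsets. The sketch below is illustrative only and assumes translation-only transforms, cyclic boundaries, and a purely additive composition that ignores occlusion and background; `fourier_translate` and `compose_frame` are hypothetical helpers, not VideoPCDNet code.

```python
import numpy as np


def fourier_translate(proto: np.ndarray, dy: float, dx: float) -> np.ndarray:
    """Cyclically translate a 2-D image by (dy, dx) pixels via the Fourier shift theorem."""
    ky = np.fft.fftfreq(proto.shape[0])[:, None]  # row frequencies (cycles/pixel)
    kx = np.fft.fftfreq(proto.shape[1])[None, :]  # column frequencies (cycles/pixel)
    # A spatial shift multiplies the spectrum by a linear phase ramp
    ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(proto) * ramp).real


def compose_frame(prototypes: list[np.ndarray], shifts: list[tuple[float, float]]) -> np.ndarray:
    """Render a frame as the sum of translated prototypes (simplified additive model)."""
    frame = np.zeros_like(prototypes[0], dtype=float)
    for proto, (dy, dx) in zip(prototypes, shifts):
        frame += fourier_translate(proto, dy, dx)
    return frame


# Two hypothetical 32x32 prototypes: small squares of different size and intensity
square = np.zeros((32, 32))
square[:6, :6] = 1.0
bright = np.zeros((32, 32))
bright[:4, :4] = 2.0
# The second object is placed at a sub-pixel offset
frame = compose_frame([square, bright], [(5, 3), (20.5, 12)])
```

For integer offsets the result matches `np.roll` exactly (up to floating-point error); the frequency-domain form is used here because the same code also handles fractional shifts.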

Problem

Unsupervised object-centric video decomposition and prediction
Frequency-domain phase correlation for object tracking
Learning interpretable object and motion representations

Innovation

Unsupervised object-centric video decomposition framework
Frequency-domain phase correlation for object parsing
Combines frequency operations with learned modules
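How frequency-domain operations and a learned component can work together is easiest to see in a toy pipeline. In this sketch, phase correlation extracts per-frame displacements, a tiny linear model fitted by least squares stands in for a "lightweight learned module", and a Fourier-domain shift renders the predicted frame. All of this is an assumption for illustration: a single object, integer cyclic translations, and hypothetical function names (`fit_linear_motion`, `predict_next_frame`). The summary does not describe VideoPCDNet's actual learned modules.

```python
import numpy as np


def phase_correlation(ref: np.ndarray, moved: np.ndarray) -> np.ndarray:
    """Integer (dy, dx) cyclic shift from `ref` to `moved` via the normalized cross-power spectrum."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-8)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    return np.array([dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx], dtype=float)


def fourier_translate(img: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Cyclic (possibly sub-pixel) translation by d = (dy, dx) using the Fourier shift theorem."""
    ky = np.fft.fftfreq(img.shape[0])[:, None]
    kx = np.fft.fftfreq(img.shape[1])[None, :]
    return np.fft.ifft2(np.fft.fft2(img) * np.exp(-2j * np.pi * (ky * d[0] + kx * d[1]))).real


def fit_linear_motion(disps: np.ndarray, order: int = 2) -> np.ndarray:
    """Hypothetical 'learned' module: least-squares fit of d_t = [d_{t-1}, ..., d_{t-order}] @ W."""
    T = len(disps)
    # Feature block i holds the displacement i+1 steps before each target
    X = np.hstack([disps[order - i - 1 : T - i - 1] for i in range(order)])
    W, *_ = np.linalg.lstsq(X, disps[order:], rcond=None)
    return W


def predict_next_frame(frames: list[np.ndarray], order: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Track displacements, extrapolate one step, and shift the last frame accordingly."""
    disps = np.array([phase_correlation(a, b) for a, b in zip(frames[:-1], frames[1:])])
    W = fit_linear_motion(disps, order)
    recent = np.hstack([disps[-1 - i] for i in range(order)])  # same layout as a row of X
    next_d = recent @ W
    return fourier_translate(frames[-1], next_d), next_d


# Toy sequence: content accelerates downward (1, 2, 3, 4, 5 px) while drifting left 1 px/frame
rng = np.random.default_rng(0)
img = rng.random((64, 64))
positions = [(0, 0), (1, -1), (3, -2), (6, -3), (10, -4), (15, -5)]
frames = [np.roll(img, p, axis=(0, 1)) for p in positions]
pred_frame, next_d = predict_next_frame(frames)
print(next_d)
```

A second-order model is used because constant acceleration satisfies d_t = 2·d_{t-1} - d_{t-2}, so even this tiny linear predictor extrapolates the toy motion exactly; a first-order model could only repeat a scaled version of the last displacement.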
Authors

Noel José Rodrigues Vicente
Autonomous Intelligent Systems group, University of Bonn, Germany

Enrique Lehner
Autonomous Intelligent Systems group, University of Bonn, Germany

Angel Villar-Corrales
PhD Student, University of Bonn
Research interests: Deep Learning, Machine Learning, Computer Vision

Jan Nogga
Autonomous Intelligent Systems group, University of Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence; Center for Robotics, University of Bonn, Germany

Sven Behnke
Professor for Autonomous Intelligent Systems, Computer Science Institute, University of Bonn
Research interests: Robotics, Artificial Intelligence, Computer Vision, Humanoid Robots, Micro Air Vehicles