TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of jointly modeling the geometry, appearance, and physical dynamics of 3D scenes solely from unlabeled multi-view dynamic videos. Methodologically, it explicitly represents each 3D point as a particle with size, orientation, and rigid-body motion properties, and, for the first time, directly learns each particle's full translational-rotational dynamical parameters, rather than an implicit motion representation. It does so within a differentiable, physics-guided framework built upon 3D Gaussian splatting that integrates neural rendering with physically grounded loss terms. The key contribution is an end-to-end framework that estimates dynamical parameters without any motion annotations, enabling unsupervised modeling of complex physical behaviors and automatic part-level object segmentation. Experiments demonstrate significant improvements in future-frame extrapolation accuracy across multiple dynamic video datasets, and show that clustering the learned physical parameters yields high-quality, fully unsupervised object segmentation.
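The translation-rotation dynamics described above can be illustrated with a simple explicit integration step for one rigid particle. This is a minimal sketch, not the paper's actual formulation: the state layout (position, rotation matrix, linear and angular velocity) and the helper names `axis_angle_to_matrix` and `step_particle` are assumptions for illustration.

```python
import numpy as np

def axis_angle_to_matrix(omega, dt):
    """Rodrigues' formula: rotation matrix for angular velocity omega over dt."""
    theta = np.linalg.norm(omega) * dt
    if theta < 1e-12:
        return np.eye(3)
    k = omega / np.linalg.norm(omega)  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product (skew) matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def step_particle(pos, R, v, omega, dt):
    """Advance one rigid particle: translate by v*dt, rotate by omega*dt."""
    new_pos = pos + v * dt
    new_R = axis_angle_to_matrix(omega, dt) @ R
    return new_pos, new_R
```

In the paper's setting such per-particle parameters would be learned from rendering losses rather than hand-set; the point here is only that a complete translation-rotation state can be rolled forward in time to extrapolate future frames.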

📝 Abstract
In this paper, we aim to model 3D scene geometry, appearance, and physical information solely from dynamic multi-view videos, in the absence of any human labels. Existing works, which leverage physics-informed losses as soft constraints or integrate simple physics models into neural networks, often fail to learn complex motion physics, or require additional labels such as object types or masks to do so. We propose a new framework named TRACE to model the motion physics of complex dynamic 3D scenes. The key novelty of our method is that, by formulating each 3D point as a rigid particle with size and orientation in space, we directly learn a translation-rotation dynamics system for each particle, explicitly estimating a complete set of physical parameters to govern the particle's motion over time. Extensive experiments on three existing dynamic datasets and one newly created, challenging synthetic dataset demonstrate the superior performance of our method over baselines in the task of future frame extrapolation. A nice property of our framework is that multiple objects or parts can be segmented simply by clustering the learned physical parameters.
Problem

Research questions and friction points this paper is trying to address.

Model 3D scene geometry and physics from unlabeled videos
Learn complex motion physics without additional human labels
Estimate physical parameters for particle motion dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns 3D Gaussian physical dynamics from videos
Models particles with size and orientation
Segments objects via physical parameter clustering
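The clustering step behind that last point can be sketched with a minimal k-means over per-particle parameter vectors. This assumes the learned physical parameters are stacked into an (N, D) array; `kmeans_labels` is a hypothetical helper for illustration, not the paper's code, and the evenly spaced initialization is chosen only to keep the sketch deterministic.

```python
import numpy as np

def kmeans_labels(params, k, iters=50):
    """Minimal k-means: assign each of N particles (rows of params, shape
    (N, D)) to one of k clusters by its learned physical parameters."""
    # Deterministic initialization: k evenly spaced rows as starting centers.
    idx = np.linspace(0, len(params) - 1, k).astype(int)
    centers = params[idx].astype(float).copy()
    labels = np.zeros(len(params), dtype=int)
    for _ in range(iters):
        # Distance from every particle to every center, then nearest center.
        dists = np.linalg.norm(params[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned particles.
        for j in range(k):
            if (labels == j).any():
                centers[j] = params[labels == j].mean(axis=0)
    return labels
```

Particles sharing one rigid motion end up with near-identical dynamical parameters, so they fall into the same cluster, which is why this yields object- or part-level segmentation without any mask supervision.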
Jinxi Li
PhD candidate, The Hong Kong Polytechnic University
3D vision · dynamic reconstruction · spatial-temporal learning
Ziyang Song
vLAR Group, The Hong Kong Polytechnic University
Bo Yang
vLAR Group, The Hong Kong Polytechnic University