FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity

๐Ÿ“… 2025-06-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper addresses the unsupervised joint reconstruction of 3D scene geometry, appearance, and physical motion solely from multi-view dynamic videosโ€”without requiring object masks, category labels, or other priors. We propose a Gaussian particle velocity field modeling framework that integrates differentiable Gaussian splatting neural rendering, implicit physics encoding, and divergence-free regularization on the vector field, eliminating inefficient physics-informed neural network (PINN) losses to enable end-to-end differentiable physical motion learning. Our key contribution is the first incorporation of divergence-free constraints into unsupervised 3D physical modeling, enabling the physics encoder to inherently capture interpretable 3D motion semantics. Evaluated on three public benchmarks and a newly collected real-world dataset, our method achieves significant improvements in future-frame extrapolation (+12.7% PSNR) and motion segmentation (+18.3% mIoU).

Technology Category

Application Category

๐Ÿ“ Abstract
In this paper, we aim to model 3D scene geometry, appearance, and the underlying physics purely from multi-view videos. By applying various governing PDEs as PINN losses or incorporating physics simulation into neural networks, existing works often fail to learn complex physical motions at boundaries or require object priors such as masks or types. In this paper, we propose FreeGave to learn the physics of complex dynamic 3D scenes without needing any object priors. The key to our approach is to introduce a physics code followed by a carefully designed divergence-free module for estimating a per-Gaussian velocity field, without relying on the inefficient PINN losses. Extensive experiments on three public datasets and a newly collected challenging real-world dataset demonstrate the superior performance of our method for future frame extrapolation and motion segmentation. Most notably, our investigation into the learned physics codes reveals that they truly learn meaningful 3D physical motion patterns in the absence of any human labels in training.
Problem

Research questions and friction points this paper is trying to address.

Model 3D scene geometry and physics from videos
Learn complex physics without object priors
Estimate velocity fields without inefficient PINN losses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates per-Gaussian velocity field
Uses divergence-free module design
Learns physics without human labels
๐Ÿ”Ž Similar Papers
No similar papers found.
Jinxi Li
Jinxi Li
PhD candidate, The Hong Kong Polytechnic University
3d visiondynamic reconstructionspatial-temporal learning
Z
Ziyang Song
vLAR Group, The Hong Kong Polytechnic University
S
Siyuan Zhou
vLAR Group, The Hong Kong Polytechnic University
B
Bo Yang
vLAR Group, The Hong Kong Polytechnic University