🤖 AI Summary
This work addresses the challenge of learning physically plausible 3D dynamics directly from multi-view RGB videos, without explicit articulation constraints or ground-truth 3D supervision. The proposed method encodes the input videos into a dynamic 3D Gaussian particle representation, models point-wise latent-space dynamics with a spatiotemporally encoded transformer, jointly optimizes motion and illumination under an inverse rendering objective, and renders high-fidelity frames via 3D Gaussian splatting. Crucially, the framework implicitly learns per-particle physical attributes, such as mass, elasticity, and friction, enabling unified simulation of rigid bodies, elastic deformables, and cloth-like materials. This implicit physical parameterization significantly improves generalization to unseen multi-body interactions and novel scene edits, while preserving realistic lighting effects. Experiments demonstrate improved simulation quality, editability, and physical fidelity compared to prior learning-based approaches.
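To make the point-wise latent dynamics concrete, here is a minimal, hypothetical PyTorch sketch, not the authors' code: each Gaussian particle carries a latent vector, particle positions plus a frame index provide a spatiotemporal encoding, and a transformer predicts per-particle position updates so that interactions emerge from attention rather than from explicit connectivity. All class, argument, and tensor names below are assumptions for illustration.

```python
# Minimal sketch (assumptions, not the authors' implementation): per-particle
# latent dynamics over 3D Gaussian particles via a Transformer.
import torch
import torch.nn as nn


class ParticleDynamicsTransformer(nn.Module):
    """Propagates point-wise latent vectors one time step forward."""

    def __init__(self, latent_dim: int = 128, num_heads: int = 8, num_layers: int = 4):
        super().__init__()
        # Embed particle position (3) + frame index (1) as a spatiotemporal encoding.
        self.spatiotemporal_enc = nn.Linear(3 + 1, latent_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Decode each particle's latent into a position update; no explicit
        # connectivity is imposed, interactions come from attention alone.
        self.delta_head = nn.Linear(latent_dim, 3)

    def forward(self, latents, positions, t):
        # latents:   (B, N, D) point-wise latents carrying implicit physical attributes
        # positions: (B, N, 3) current Gaussian particle centers
        # t:         scalar frame index, broadcast to every particle
        time = torch.full_like(positions[..., :1], float(t))
        tokens = latents + self.spatiotemporal_enc(torch.cat([positions, time], dim=-1))
        tokens = self.transformer(tokens)
        return positions + self.delta_head(tokens), tokens


# Usage: roll the latent dynamics out for a few frames.
model = ParticleDynamicsTransformer()
latents = torch.randn(1, 512, 128)   # one scene, 512 particles
positions = torch.randn(1, 512, 3)
for t in range(3):
    positions, latents = model(latents, positions, t)
```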
📝 Abstract
Learning physics simulations from video data requires maintaining spatial and temporal consistency, a challenge often addressed with strong inductive biases or ground-truth 3D information, which limits scalability and generalization. We introduce 3DGSim, a 3D physics simulator that learns object dynamics end-to-end from multi-view RGB videos. It encodes images into a 3D Gaussian particle representation, propagates dynamics via a transformer, and renders frames using 3D Gaussian splatting. By jointly training inverse rendering with a dynamics transformer through a temporal encoding and merging layer, 3DGSim embeds physical properties into point-wise latent vectors without enforcing explicit connectivity constraints. This enables the model to capture diverse physical behaviors, from rigid to elastic and cloth-like interactions, along with realistic lighting effects, and to generalize to unseen multi-body interactions and novel scene edits.
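As a rough illustration of the temporal encoding and merging layer mentioned in the abstract, the sketch below tags per-frame, per-particle features with a learned time embedding and merges the input window into a single point-wise latent per particle. This is one plausible reading under stated assumptions, not the paper's actual layer; `TemporalEncodeMerge`, the averaging-plus-MLP merge, and all shapes are hypothetical.

```python
# Minimal sketch (an assumed form of a temporal encoding and merging layer,
# not the authors' implementation).
import torch
import torch.nn as nn


class TemporalEncodeMerge(nn.Module):
    def __init__(self, feat_dim: int = 128, window: int = 3):
        super().__init__()
        self.time_embed = nn.Embedding(window, feat_dim)   # learned per-frame code
        self.merge = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, frame_feats):
        # frame_feats: (B, T, N, D) per-frame, per-particle features
        B, T, N, D = frame_feats.shape
        encoded = frame_feats + self.time_embed(torch.arange(T)).view(1, T, 1, D)
        # Merge the temporal window by averaging, then mix channels; the merged
        # latent is what the dynamics transformer would consume downstream.
        return self.merge(encoded.mean(dim=1))             # (B, N, D)


merger = TemporalEncodeMerge()
feats = torch.randn(2, 3, 256, 128)   # 2 scenes, 3 input frames, 256 particles
latents = merger(feats)               # (2, 256, 128)
```

The design intuition this sketch tries to capture is that the merged per-particle latent summarizes short-horizon motion, so physical properties can be inferred jointly with appearance during inverse rendering rather than supervised explicitly.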