🤖 AI Summary
This study addresses the challenge of inferring complex physical dynamics, such as those of deformable solids and fluids, from video observations and extrapolating them forward in time. To this end, the authors construct a dataset of 2D physics simulations based on the Material Point Method (MPM) and present the first systematic comparison between code generation models and video diffusion models on this physical dynamics inference task. Experimental results show that, relative to video diffusion, code generation produces temporally stable and physically consistent extrapolations. Video diffusion models are better at identifying geometric details but often generate physically implausible extrapolations. The work thus reveals complementary strengths: code generation excels at physical and temporal consistency but struggles to infer physical parameters from visual input, while diffusion models are more effective at geometric recognition. By varying the amount of physically relevant side information available to each approach, the study establishes a new benchmark and offers insights for physics-aware video understanding.
📝 Abstract
To study the ability to infer physical dynamics from videos and extrapolate them forward in time, we assemble a dataset of 2D Material Point Method (MPM) physics simulations covering rich physical phenomena such as deformable objects, fluids, kinetic objects, and emitters. We study code generation and video diffusion approaches on this dataset, identifying their strengths and weaknesses by varying the amount of physically relevant side information. Beyond giving a working demonstration of automatic synthesis of MPM simulations, the code generation model reveals that such an approach struggles to infer physical parameters from visual input but, relative to video diffusion, produces physically and temporally stable extrapolations forward in time. The video diffusion model, by contrast, is better at identifying geometric properties from visual input but produces physically implausible extrapolations.
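To make concrete what an MPM simulation involves (and therefore what a code generation model must synthesize), here is a minimal sketch of one generic 2D MLS-MPM step for a weakly compressible fluid, modeled on the widely circulated 88-line MLS-MPM example. It is not the authors' simulator or dataset code: the grid size, time step, stiffness `E`, wall thickness, and the helper name `mpm_step` are hypothetical choices for illustration only.

```python
import numpy as np

# Illustrative parameters (hypothetical; not the paper's settings).
N_GRID = 32                 # grid nodes per axis on the unit square
DX = 1.0 / N_GRID           # grid spacing
INV_DX = float(N_GRID)
DT = 1e-4                   # explicit time step
P_VOL = (DX * 0.5) ** 2     # particle volume (2x2 particles per cell)
P_MASS = P_VOL * 1.0        # particle mass for density 1
E = 400.0                   # bulk stiffness of the weakly compressible fluid
GRAVITY = np.array([0.0, -9.8])
BOUND = 3                   # wall thickness in grid cells


def mpm_step(x, v, C, J):
    """Advance particles by one MLS-MPM step (P2G, grid update, G2P)."""
    base = (x * INV_DX - 0.5).astype(int)    # lower-left node of each 3x3 stencil
    fx = x * INV_DX - base                    # local coordinate in [0.5, 1.5)
    # Quadratic B-spline weights per axis, shape (3, n, 2)
    w = np.stack([0.5 * (1.5 - fx) ** 2, 0.75 - (fx - 1.0) ** 2, 0.5 * (fx - 0.5) ** 2])
    # Pressure from the volume ratio J, folded into the APIC affine matrix
    stress = -DT * 4.0 * E * P_VOL * (J - 1.0) * INV_DX ** 2
    affine = stress[:, None, None] * np.eye(2) + P_MASS * C

    # Particle-to-grid (P2G): scatter mass and momentum
    grid_mv = np.zeros((N_GRID, N_GRID, 2))
    grid_m = np.zeros((N_GRID, N_GRID))
    for i in range(3):
        for j in range(3):
            offset = np.array([i, j])
            dpos = (offset - fx) * DX
            weight = w[i, :, 0] * w[j, :, 1]
            node = base + offset
            momentum = P_MASS * v + np.einsum("nab,nb->na", affine, dpos)
            np.add.at(grid_mv, (node[:, 0], node[:, 1]), weight[:, None] * momentum)
            np.add.at(grid_m, (node[:, 0], node[:, 1]), weight * P_MASS)

    # Grid update: momentum -> velocity, gravity, no-penetration walls
    grid_v = np.zeros_like(grid_mv)
    has_mass = grid_m > 0
    grid_v[has_mass] = grid_mv[has_mass] / grid_m[has_mass][:, None]
    grid_v[has_mass] += DT * GRAVITY
    grid_v[:BOUND, :, 0] = np.maximum(grid_v[:BOUND, :, 0], 0.0)
    grid_v[-BOUND:, :, 0] = np.minimum(grid_v[-BOUND:, :, 0], 0.0)
    grid_v[:, :BOUND, 1] = np.maximum(grid_v[:, :BOUND, 1], 0.0)
    grid_v[:, -BOUND:, 1] = np.minimum(grid_v[:, -BOUND:, 1], 0.0)

    # Grid-to-particle (G2P): gather velocity and the affine velocity field
    new_v = np.zeros_like(v)
    new_C = np.zeros_like(C)
    for i in range(3):
        for j in range(3):
            offset = np.array([i, j])
            dpos = offset - fx                # in grid units, as in MLS-MPM
            weight = w[i, :, 0] * w[j, :, 1]
            node = base + offset
            g_v = grid_v[node[:, 0], node[:, 1]]
            new_v += weight[:, None] * g_v
            new_C += 4.0 * INV_DX * weight[:, None, None] * np.einsum("na,nb->nab", g_v, dpos)

    new_x = x + DT * new_v
    # Safeguard (not part of the textbook scheme): keep every stencil inside the grid
    new_x = np.clip(new_x, BOUND * DX, 1.0 - BOUND * DX)
    new_J = J * (1.0 + DT * np.trace(new_C, axis1=1, axis2=2))
    return new_x, new_v, new_C, new_J, grid_m


# Usage: drop a 0.2 x 0.2 block of fluid and let it fall for 200 steps
coords = np.arange(0.4, 0.6, DX * 0.5)
x = np.stack(np.meshgrid(coords, coords, indexing="ij"), axis=-1).reshape(-1, 2)
v = np.zeros_like(x)
C = np.zeros((len(x), 2, 2))
J = np.ones(len(x))
y0 = x[:, 1].mean()
for _ in range(200):
    x, v, C, J, grid_m = mpm_step(x, v, C, J)
```

Each step scatters particle mass and momentum to a background grid, applies gravity and wall conditions there, and gathers velocities back to the particles. Covering the deformable objects, kinetic objects, and emitters in the dataset would require different stress models or particle-spawning logic, which this sketch deliberately omits.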