PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

📅 2025-09-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing video generation models produce high-fidelity videos but often lack physical plausibility and 3D controllability. To address this, we propose a physics-grounded image-to-video generation framework. At its core is a generative physics network: a diffusion model, conditioned on physics parameters and applied forces, that models multi-material dynamics spanning elastic bodies, granular media (sand), plasticine, and rigid bodies. A spatiotemporal attention module captures inter-particle interactions, and a composite loss with physics-based constraints jointly optimizes trajectory plausibility and visual quality. The network synthesizes physically consistent 3D point trajectories, which in turn drive controllable video synthesis. Trained on 550K synthetic animations, our approach surpasses state-of-the-art methods in both physical plausibility and visual fidelity, and it enables fine-grained dynamic editing guided by physical parameters (e.g., elasticity, friction) and external forces (e.g., gravity, impact).
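
The spatiotemporal attention module mentioned above can be understood as attention factorized over two axes: particles within a frame, and frames per particle. Below is a minimal PyTorch sketch of such a block; the tensor layout, feature sizes, and pre-norm residual structure are illustrative assumptions, not the paper's released architecture.

```python
# Minimal sketch of a spatiotemporal attention block over particle features.
# The tensor layout, sizes, and pre-norm residual structure are illustrative
# assumptions; the paper's exact architecture may differ.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """Alternates attention over particles (space) and over frames (time)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, frames, particles, dim)
        b, t, n, d = x.shape

        # Spatial attention: particles attend to one another within a frame,
        # emulating inter-particle interactions.
        s = x.reshape(b * t, n, d)
        h = self.norm1(s)
        s = s + self.spatial_attn(h, h, h)[0]

        # Temporal attention: each particle attends across frames.
        u = s.reshape(b, t, n, d).permute(0, 2, 1, 3).reshape(b * n, t, d)
        h = self.norm2(u)
        u = u + self.temporal_attn(h, h, h)[0]

        return u.reshape(b, n, t, d).permute(0, 2, 1, 3)


block = SpatioTemporalBlock()
feats = torch.randn(2, 16, 128, 256)  # (batch, frames, particles, dim)
print(block(feats).shape)             # torch.Size([2, 16, 128, 256])
```

In the paper's framing, a stack of blocks like this would presumably sit inside the trajectory diffusion model's denoiser.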

📝 Abstract
Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Project Page: https://cwchenwang.github.io/physctrl
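
To make the pipeline in the abstract concrete, here is a hedged sketch of sampling 3D point trajectories from a diffusion model conditioned on physics parameters and an applied force. The linear beta schedule, the `denoiser` interface, and the `cond` keys (`material`, `youngs_modulus`, `force`) are placeholders for illustration, not the paper's released code.

```python
# Hedged sketch of conditional trajectory diffusion: denoise 3D point
# trajectories conditioned on material parameters and an applied force.
# The linear beta schedule, denoiser interface, and `cond` keys are
# illustrative placeholders, not the paper's released implementation.
import torch


@torch.no_grad()
def sample_trajectories(denoiser, cond, steps=50, frames=16, particles=128):
    """DDPM-style ancestral sampling over (frames, particles, xyz) tensors."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, frames, particles, 3)  # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), cond)  # conditioned noise estimate
        # Posterior mean of x_{t-1} under the epsilon-prediction parameterization.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # 3D point trajectories that then drive the video model


# Hypothetical conditioning: material class, physics parameters, applied force.
cond = {"material": "elastic", "youngs_modulus": 1e5, "force": (0.0, -9.8, 0.0)}
traj = sample_trajectories(lambda x, t, c: torch.zeros_like(x), cond)
print(traj.shape)  # torch.Size([1, 16, 128, 3])
```

Per the abstract, the sampled trajectories would then condition an image-to-video model to produce the final controllable video.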
Problem

Research questions and friction points this paper is trying to address.

Overcoming the lack of physical plausibility in video generation models
Addressing limited 3D controllability in existing video generation methods
Generating physics-grounded motion with parameter and force control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative physics network with diffusion model
Spatiotemporal attention block for particle interactions
Physics-based constraints for physical plausibility (illustrative loss sketched below)
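
As an illustration of what the physics-based constraints above might look like, the hedged sketch below penalizes jerky finite-difference accelerations and, for the rigid material, drift in pairwise particle distances. Both penalty terms are assumptions about what such constraints could be, not the paper's actual loss.

```python
# Illustrative physics-based training constraints on predicted trajectories.
# Both penalties (acceleration smoothness; pairwise-distance preservation for
# rigid bodies) are assumptions, not the paper's actual constraint terms.
import torch


def physics_constraint_loss(traj, rigid=False):
    """traj: (batch, frames, particles, 3) predicted 3D point trajectories."""
    vel = traj[:, 1:] - traj[:, :-1]  # finite-difference velocity
    acc = vel[:, 1:] - vel[:, :-1]    # finite-difference acceleration
    loss = acc.pow(2).mean()          # discourage jerky, implausible motion

    if rigid:
        # Rigid bodies should preserve pairwise particle distances over time.
        diff = traj.unsqueeze(-2) - traj.unsqueeze(-3)  # (b, t, n, n, 3)
        dist = diff.norm(dim=-1)                        # (b, t, n, n)
        loss = loss + (dist - dist[:, :1]).pow(2).mean()
    return loss


print(physics_constraint_loss(torch.randn(2, 16, 32, 3), rigid=True))
```
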
Authors

Chen Wang
University of Pennsylvania

Chuhao Chen
University of Pennsylvania

Yiming Huang
University of Pennsylvania

Zhiyang Dou
MIT

Yuan Liu
HKUST

Jiatao Gu
UPenn CIS / Apple MLR
machine learning, generative models, natural language processing, computer vision, deep learning

Lingjie Liu
Assistant Professor at UPenn
Computer Graphics, Computer Vision, Deep Learning