PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current video generation models achieve high visual fidelity but lack physical controllability and plausibility—particularly for precise, long-horizon control of complex dynamics. To address this, we propose a two-stage physically controllable video generation framework. In the first stage, part-aware reconstruction enables fine-grained estimation of static physical attributes (e.g., mass, coefficient of friction). In the second stage, a differentiable physics engine is tightly coupled with temporal instruction guidance to enable editable, physically grounded dynamic simulation. Our method integrates semantic localization, attribute estimation, and physics-driven modeling, significantly improving temporal coherence and physical realism. Evaluated on multiple benchmarks, our approach surpasses state-of-the-art methods, generating videos that simultaneously exhibit high visual fidelity, rich dynamic controllability, and strict adherence to physical laws.

📝 Abstract
While recent video generation models have achieved significant visual fidelity, they often lack explicit physical controllability and plausibility. To address this, some recent studies have attempted to guide video generation with physics-based rendering. However, these methods face inherent challenges in accurately modeling complex physical properties and effectively controlling the resulting physical behavior over extended temporal sequences. In this work, we introduce PhysChoreo, a novel framework that can generate videos with diverse controllability and physical realism from a single image. Our method consists of two stages: first, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction; then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism. Experimental results show that PhysChoreo can generate videos with rich behaviors and physical realism, outperforming state-of-the-art methods on multiple evaluation metrics.
Problem

Research questions and friction points this paper is trying to address.

Generating physically controllable videos from single images
Modeling complex physical properties in video synthesis
Ensuring long-term physical plausibility in generated sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework for physics-controllable video generation
Part-aware physical property reconstruction from single image
Temporally instructed physically editable simulation synthesis
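The two-stage design listed above can be sketched in minimal pseudocode. All names below (`reconstruct_parts`, `simulate`, `PartProperties`, the property values) are illustrative assumptions, not the paper's actual API; a real implementation would couple Stage 2 to a differentiable physics engine rather than the stub used here.

```python
# Hypothetical sketch of a two-stage physics-controllable pipeline:
# Stage 1 estimates per-part static properties from one image;
# Stage 2 runs a temporally instructed, editable simulation.
from dataclasses import dataclass


@dataclass
class PartProperties:
    """Static physical attributes estimated per object part (Stage 1)."""
    mass: float
    friction: float


def reconstruct_parts(image):
    """Stage 1: part-aware physical property reconstruction.

    Returns fixed placeholder values for illustration; the paper
    estimates these from the input image.
    """
    return {
        "ball": PartProperties(mass=0.5, friction=0.3),
        "ramp": PartProperties(mass=10.0, friction=0.6),
    }


def simulate(parts, instructions, steps=3):
    """Stage 2: temporally instructed, physically editable simulation.

    `instructions` maps a timestep to a list of (part, attribute, value)
    edits applied before that step, making the simulation editable
    mid-rollout. A real system would step a differentiable physics
    engine here; we just record the current masses as stand-in frames.
    """
    frames = []
    for t in range(steps):
        for part, attr, value in instructions.get(t, []):
            setattr(parts[part], attr, value)
        frames.append({name: p.mass for name, p in parts.items()})
    return frames


frames = simulate(reconstruct_parts(image=None),
                  instructions={1: [("ball", "mass", 2.0)]})
print(frames)
```

The key property this sketch illustrates is editability over time: the instruction at timestep 1 changes the ball's mass mid-simulation, so later frames reflect the edited dynamics while earlier frames keep the Stage-1 estimates.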
Haoze Zhang
Harbin Institute of Technology
Tianyu Huang
Harbin Institute of Technology
Zichen Wan
Harbin Institute of Technology
Xiaowei Jin
Harbin Institute of Technology
Hongzhi Zhang
Professor of Computer Science and Technology, Harbin Institute of Technology
Deep Learning · Artificial Intelligence · Computer Vision
Hui Li
Harbin Institute of Technology
Wangmeng Zuo
School of Computer Science and Technology, Harbin Institute of Technology
Computer Vision · Image Processing · Generative AI · Deep Learning · Biometrics