CP4D: Compositional Physics-aware 4D Scene Generation

📅 2026-06-08
🤖 AI Summary
Existing 4D generation methods struggle to simultaneously achieve physical consistency and visual plausibility. This work proposes a novel paradigm grounded in scene compositionality, decomposing a 4D scene into a static 3D environment and dynamic foreground objects governed by physical laws. The approach employs a three-stage pipeline to synthesize high-fidelity, interactive 4D content: first, high-quality 3D assets are generated using pretrained expert models; second, a hybrid motion synthesis strategy integrates physics-based simulation with video diffusion priors; and third, an automated mechanism seamlessly composes the static and dynamic components. The resulting 4D scenes substantially outperform existing methods in visual fidelity, physical plausibility, and fine-grained controllability.
📝 Abstract
4D generation (i.e., dynamic 3D generation) has recently emerged as a rapidly growing research frontier due to its powerful spatiotemporal modeling capabilities. However, despite notable advances, existing approaches typically fail to capture the underlying physical principles, producing results that are both physically inconsistent and visually implausible. To overcome this limitation, we present CP4D, a novel paradigm for photorealistic 4D scene synthesis with faithful adherence to complex physical dynamics. Drawing inspiration from the compositional nature of real-world scenes, where immutable static backgrounds coexist with dynamic, physically plausible foregrounds, CP4D reformulates 4D generation as the integration of a static 3D environment with physically grounded dynamic objects. On this basis, our framework follows a three-stage pipeline: 1) First, we leverage pre-trained expert models to generate high-fidelity 3D representations of the environment and of the foreground objects separately. 2) Next, to produce physically plausible trajectories and realistic interactions for these objects, we propose a hybrid motion synthesis strategy that integrates priors from physical simulators with the common sense embedded in video diffusion models. 3) Finally, we develop an automated composition mechanism that seamlessly fuses the static environment and dynamic objects into coherent, physically consistent 4D scenes. Extensive experiments demonstrate that CP4D can generate explorable and interactive 4D scenes with high visual fidelity, strong physical plausibility, and fine-grained controllability, significantly outperforming existing methods. Project page: https://anonymous.4open.science/w/CP4D/.
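The compositional idea in the abstract (a fixed background plus foreground objects whose motion comes from physics) can be illustrated with a toy sketch. This is not the CP4D implementation: the abstract does not specify representations or algorithms, so everything below is an assumption made for illustration. Scenes are plain point lists, the hypothetical `simulate_trajectory` replaces the hybrid simulator-plus-diffusion motion stage with a simple gravity-and-bounce integrator, and the hypothetical `compose_scene` replaces the automated composition stage with a per-frame translation of each object.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DynamicObject:
    """Hypothetical foreground object: geometry plus an initial physical state."""
    points: List[Vec3]   # geometry in the object's local frame
    position: Vec3       # initial world position of the local origin
    velocity: Vec3       # initial velocity in world units per second


def simulate_trajectory(obj: DynamicObject, num_frames: int, dt: float = 1 / 30,
                        gravity: float = -9.81, ground_y: float = 0.0,
                        restitution: float = 0.5) -> List[Vec3]:
    """Toy stand-in for motion synthesis: Euler steps with a ground bounce."""
    x, y, z = obj.position
    vx, vy, vz = obj.velocity
    trajectory = []
    for _ in range(num_frames):
        trajectory.append((x, y, z))
        vy += gravity * dt              # gravity acts on the vertical velocity
        x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
        if y < ground_y:                # crude collision with a flat ground plane
            y = ground_y
            vy = -vy * restitution      # lose energy on each bounce
    return trajectory


def compose_scene(static_points: List[Vec3], objects: List[DynamicObject],
                  num_frames: int) -> List[List[Vec3]]:
    """Toy stand-in for composition: static points plus moved object points."""
    trajectories = [simulate_trajectory(o, num_frames) for o in objects]
    frames = []
    for t in range(num_frames):
        frame = list(static_points)     # the background is identical in every frame
        for obj, traj in zip(objects, trajectories):
            ox, oy, oz = traj[t]
            frame.extend((px + ox, py + oy, pz + oz) for px, py, pz in obj.points)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    background = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ball = DynamicObject(points=[(0.0, 0.0, 0.0)],
                         position=(0.0, 1.0, 0.0), velocity=(1.0, 0.0, 0.0))
    scene = compose_scene(background, [ball], num_frames=60)
    print(len(scene), "frames,", len(scene[0]), "points per frame")
```

Keeping the background separate from the object trajectories means the physics only has to be solved for the foreground, which is the separation of concerns the abstract attributes to the compositional formulation.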
Problem

Research questions and friction points this paper is trying to address.

4D generation
physical plausibility
scene synthesis
spatiotemporal modeling
dynamic 3D
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D generation
physics-aware synthesis
compositional modeling
hybrid motion synthesis
physical plausibility