SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing single-image 3D scene reconstruction methods often fail under physical simulation because objects interpenetrate, float, or sink. This work proposes a compositional reconstruction framework that places a physics engine inside the joint estimation of shape and layout. By simulating the reconstructed objects under gravity during generation, the approach converts penetration and support failures into quantitative feedback signals that refine geometry and support relationships. The method introduces physics-driven gravity-axis stretching, amodal shape resampling in occluded regions, and a closed-loop optimization mechanism, so physical consistency is enforced during reconstruction rather than through post-hoc layout correction. Experiments report state-of-the-art performance on physical stability and geometric alignment benchmarks, and the reconstructed scenes are deployed in downstream humanoid control and robot-arm manipulation tasks.
📝 Abstract
Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing them yields scenes that collapse under physical simulation due to interpenetrating, hovering, or sinking objects. Existing physics-aware methods address this strictly as a post-hoc layout correction, leaving the underlying geometric errors unresolved. To address this, we introduce SimuScene, a compositional 3D reconstruction pipeline that puts physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we utilize the physics engine as a diagnostic measurement tool during the generative process itself. By diagnostically simulating reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments demonstrate state-of-the-art performance on physical stability and geometric alignment benchmarks. We further highlight SimuScene's utility by deploying reconstructed environments in humanoid control and robot-arm manipulation tasks.
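The abstract describes turning support failures into quantitative correction signals. The toy sketch below illustrates that idea only. It is not the authors' implementation. It assumes axis-aligned boxes with z as the gravity axis, and a simple signed-gap check stands in for the physics-engine diagnostic. The names `Box`, `support_gap`, and `correct` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned object extent along the gravity axis (z, up is +z)."""
    name: str
    z_min: float
    z_max: float


def support_gap(obj: Box, support_top: float) -> float:
    """Signed gap to the support surface: > 0 floating, < 0 penetrating."""
    return obj.z_min - support_top


def correct(obj: Box, support_top: float, tol: float = 1e-3) -> Box:
    """Turn the signed gap into a correction (illustrative rule, assumed).

    Floating objects are stretched downward along the gravity axis so the
    base reaches the support while the top stays fixed. Penetrating objects
    are lifted out of the support with their height preserved.
    """
    gap = support_gap(obj, support_top)
    if abs(gap) <= tol:
        return obj  # already resting on the support
    if gap > 0:
        # Floating: gravity-axis stretch, keep the observed top in place.
        return Box(obj.name, support_top, obj.z_max)
    # Penetrating: translate upward by the penetration depth (-gap).
    return Box(obj.name, support_top, obj.z_max - gap)


if __name__ == "__main__":
    table_top = 0.75
    for obj in (Box("mug", 0.80, 0.90), Box("book", 0.70, 0.74)):
        print(obj, "->", correct(obj, table_top))
```

In this sketch the floating mug is stretched until its base meets the table, while the penetrating book is shifted upward. A real pipeline would read such signals from a physics engine's contact and settling behavior rather than from box extents.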
Problem

Research questions and friction points this paper is trying to address.

3D scene reconstruction
physical simulation
single-image lifting
geometric errors
object interpenetration
Innovation

Methods, ideas, or system contributions that make the work stand out.

physics-in-the-loop
simulation-ready reconstruction
compositional 3D scene
diagnostic simulation
amodal shape resampling