Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to reconstruct simulation-ready multi-object scenes efficiently and robustly in cluttered environments, and are often hindered by high computational cost and poor generalization. This work proposes an end-to-end joint optimization framework that simultaneously refines the shapes and poses of multiple rigid bodies under physical constraints by incorporating a globally differentiable contact model, followed by differentiable texture refinement to produce simulation-ready scenes. The method exploits the structured sparsity of the augmented Lagrangian Hessian to build an efficient linear solver and pairs it with a learning-based initialization strategy, improving scalability and reconstruction robustness in complex scenes. Experiments demonstrate that the approach reliably recovers physically plausible geometries and poses that are directly usable in simulation, even in highly cluttered settings involving up to 5 objects and 22 convex hulls.

📝 Abstract
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Unfortunately, existing methods struggle in cluttered environments, often exhibiting prohibitive computational cost, poor robustness, and restricted generality when scaling to multiple interacting objects. We propose a unified optimization-based formulation for real-to-sim scene estimation that jointly recovers the shapes and poses of multiple rigid objects under physical constraints. Our method is built on two key technical innovations. First, we leverage a recently introduced shape-differentiable contact model, whose global differentiability permits joint optimization over object geometry and pose while modeling inter-object contacts. Second, we exploit the structured sparsity of the augmented Lagrangian Hessian to derive an efficient linear system solver whose computational cost scales favorably with scene complexity. Building on this formulation, we develop an end-to-end real-to-sim scene estimation pipeline that integrates learning-based object initialization, physics-constrained joint shape-pose optimization, and differentiable texture refinement. Experiments on cluttered scenes with up to 5 objects and 22 convex hulls demonstrate that our approach robustly reconstructs physically valid, simulation-ready object shapes and poses.
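To make the idea of physics-constrained joint shape-pose optimization concrete, the following is a minimal toy sketch, not the paper's contact model or solver. It assumes two 1D "spheres" with centers x1 < x2 and radii r1, r2, treats both centers (pose) and radii (shape) as unknowns, and uses an augmented Lagrangian loop to find the estimate closest to the noisy observation that satisfies a non-penetration constraint. The names `constraint`, `GRAD_C`, and `solve_augmented_lagrangian`, as well as the penalty weight and iteration counts, are hypothetical choices made for illustration.

```python
import numpy as np


def constraint(z):
    """Penetration depth c(z) = r1 + r2 - (x2 - x1); feasible when c(z) <= 0."""
    x1, x2, r1, r2 = z
    return r1 + r2 - (x2 - x1)


# Gradient of c with respect to z = [x1, x2, r1, r2]; constant because c is linear.
GRAD_C = np.array([1.0, -1.0, 1.0, 1.0])


def solve_augmented_lagrangian(z_obs, rho=10.0, outer_iters=20, inner_iters=5):
    """Minimize 0.5 * ||z - z_obs||^2 subject to c(z) <= 0 (toy example)."""
    z_obs = np.asarray(z_obs, dtype=float)
    z = z_obs.copy()
    lam = 0.0  # multiplier estimate for the inequality constraint
    for _ in range(outer_iters):
        # Inner loop: Newton steps on the augmented Lagrangian for fixed lam.
        for _ in range(inner_iters):
            s = max(0.0, lam + rho * constraint(z))  # clamped, shifted multiplier
            grad = (z - z_obs) + s * GRAD_C
            hess = np.eye(4)
            if s > 0.0:
                # The constraint is active, so it contributes a rank-one term.
                hess += rho * np.outer(GRAD_C, GRAD_C)
            z = z - np.linalg.solve(hess, grad)
        # Outer loop: first-order multiplier update.
        lam = max(0.0, lam + rho * constraint(z))
    return z, lam


if __name__ == "__main__":
    # Observed centers 0.0 and 0.6 with radii 0.5 each overlap by 0.4.
    z_est, lam_est = solve_augmented_lagrangian([0.0, 0.6, 0.5, 0.5])
    print("estimate [x1, x2, r1, r2]:", z_est)
    print("multiplier:", lam_est, "penetration:", constraint(z_est))
```

In this toy case the overlap is resolved by adjusting poses and shapes together: the centers move apart to about -0.1 and 0.7 while both radii shrink to about 0.4, leaving the two bodies exactly in contact. A real scene would involve many bodies and contacts, where the Hessian is large and sparse rather than a dense 4x4 matrix.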
Problem

Research questions and friction points this paper is trying to address.

simulation-ready scene estimation
cluttered scenes
shape and pose optimization
physics-aware reconstruction
real-to-sim
Innovation

Methods, ideas, or system contributions that make the work stand out.

shape-differentiable contact model
joint shape-pose optimization
physics-aware simulation
structured sparsity
real-to-sim scene estimation
Wei-Cheng Huang
Siebel School of Computing and Data Science, University of Illinois at Urbana-Champaign
Jiaheng Han
Siebel School of Computing and Data Science, University of Illinois at Urbana-Champaign
Xiaohan Ye
Department of Computer Science, The University of Hong Kong
Zherong Pan
Meta Reality Labs
Kris Hauser
Professor, University of Illinois at Urbana-Champaign
Robotics, Artificial Intelligence