🤖 AI Summary
Stable multi-object placement remains challenging in complex scenes because a placement must be penetration-free, establish precise surface contact, and reach static force equilibrium. To address this, we propose a physics-guided diffusion framework: (1) an offline sampling-based planner generates multi-modal stable placement labels, which are used to train a diffusion model conditioned on scene and object point clouds as a geometry-aware prior; (2) leveraging the compositional nature of score-based generative models, we combine this prior with a stability-aware loss during sampling, which requires no re-training or fine-tuning and can be applied directly to off-the-shelf models. Evaluated on four benchmark scenes where stability can be accurately computed, our physics-guided models produce placements that are 56% more robust to forceful perturbations and reduce runtime by 47% compared to a state-of-the-art geometric method.
📝 Abstract
Stably placing an object in a multi-object scene is a fundamental challenge in robotic manipulation, as placements must be penetration-free, establish precise surface contact, and result in a force equilibrium. To assess stability, existing methods rely on running a simulation engine or resort to heuristic, appearance-based assessments. In contrast, our approach integrates stability directly into the sampling process of a diffusion model. To this end, we query an offline sampling-based planner to gather multi-modal placement labels and train a diffusion model to generate stable placements. The diffusion model is conditioned on scene and object point clouds, and serves as a geometry-aware prior. We leverage the compositional nature of score-based generative models to combine this learned prior with a stability-aware loss, thereby increasing the likelihood of sampling from regions of high stability. Importantly, this strategy requires no additional re-training or fine-tuning, and can be directly applied to off-the-shelf models. We evaluate our method on four benchmark scenes where stability can be accurately computed. Our physics-guided models achieve placements that are 56% more robust to forceful perturbations while reducing runtime by 47% compared to a state-of-the-art geometric method.
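The score composition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the Gaussian `prior_score` stands in for the learned point-cloud-conditioned diffusion model, the quadratic `stability_loss` stands in for the physics-based stability loss, and `MU_PRIOR`, `X_STABLE`, `guidance_weight`, and the noise schedule are all hypothetical choices made for illustration. The sketch only shows the general idea of subtracting a loss gradient from a prior score during annealed Langevin sampling, with no re-training of the prior.

```python
import numpy as np

MU_PRIOR = np.array([0.0, 0.0])  # hypothetical center of the toy prior
X_STABLE = np.array([2.0, 0.0])  # hypothetical most stable placement


def prior_score(x, sigma):
    # Score of the toy prior N(MU_PRIOR, I) smoothed with noise level sigma.
    # In the paper this role is played by a learned diffusion model.
    return -(x - MU_PRIOR) / (1.0 + sigma**2)


def stability_loss(x):
    # Assumed stand-in loss: squared distance to X_STABLE (lower is more stable).
    return 0.5 * np.sum((x - X_STABLE) ** 2, axis=-1)


def stability_grad(x):
    # Analytic gradient of stability_loss with respect to x.
    return x - X_STABLE


def sample(n, guidance_weight, seed=0, step=0.05, steps_per_level=20):
    # Annealed Langevin sampling with a composed score:
    # prior score minus the weighted gradient of the stability loss.
    rng = np.random.default_rng(seed)
    x = 2.0 * rng.normal(size=(n, 2))
    for sigma in np.geomspace(1.0, 0.01, 10):
        for _ in range(steps_per_level):
            score = prior_score(x, sigma) - guidance_weight * stability_grad(x)
            noise = rng.normal(size=x.shape)
            x = x + step * score + np.sqrt(2.0 * step) * noise
    return x


unguided = sample(512, guidance_weight=0.0)
guided = sample(512, guidance_weight=4.0)
print("mean loss, unguided:", stability_loss(unguided).mean())
print("mean loss, guided:  ", stability_loss(guided).mean())
```

Setting `guidance_weight=0.0` recovers sampling from the prior alone, so the same prior can be used with or without guidance. In this toy setting the guided samples concentrate nearer the low-loss region, which mirrors the abstract's claim that guidance raises the likelihood of sampling from regions of high stability.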