🤖 AI Summary
In warehouse stacking scenarios, conventional vision-based grasping methods lack physical reasoning capabilities, leading to frequent collisions and structural collapses. To address this, we propose a grasp planning framework that tightly integrates visual perception with rigid-body dynamics: it reconstructs an approximate 3D scene from a single RGB-D image and—novelty first—embeds NVIDIA PhysX real-time dynamic simulation into a monocular + depth-driven planning pipeline. Our method establishes a dual-path decision mechanism combining heuristic search with physics-based validation, enabling joint detection and avoidance of both collisions and collapses. Experiments on real-world cluttered bin shelves demonstrate a 37% improvement in grasp success rate and a 2.1× increase in task efficiency over pure vision baselines. The core contribution lies in pioneering the deep integration of real-time physics simulation into end-to-end grasping planning, thereby bridging the gap between perception and physics-aware manipulation.
📝 Abstract
Efficient and safe retrieval of stacked objects in warehouse environments is a significant challenge due to complex spatial dependencies and structural inter-dependencies. Traditional vision-based methods excel at object localization but often lack the physical reasoning required to predict the consequences of extraction, leading to unintended collisions and collapses. This paper proposes a collapse and collision aware grasp planner that integrates dynamic physics simulations for robotic decision-making. Using a single image and depth map, an approximate 3D representation of the scene is reconstructed in a simulation environment, enabling the robot to evaluate different retrieval strategies before execution. Two approaches 1) heuristic-based and 2) physics-based are proposed for both single-box extraction and shelf clearance tasks. Extensive real-world experiments on structured and unstructured box stacks, along with validation using datasets from existing databases, show that our physics-aware method significantly improves efficiency and success rates compared to baseline heuristics.