🤖 AI Summary
Existing robotic manipulation benchmarks are largely confined to simplified tabletop settings and fail to evaluate the complex spatial reasoning required in real-world retail environments, such as handling densely stacked, multi-height, and closely spaced objects. To address this gap, we introduce RoboBenchMart, the first comprehensive robotic manipulation benchmark tailored to dark-store retail scenarios. Our approach features: (1) procedural generation of diverse store layouts; (2) an end-to-end trajectory synthesis pipeline with standardized evaluation metrics; and (3) integrated multimodal perception and fine-grained control interfaces. Experimental results demonstrate that current state-of-the-art general-purpose models exhibit significant performance deficits on representative retail tasks, including grocery picking and obstacle-aware grasping. RoboBenchMart is fully open-sourced, providing a unified, reproducible testbed to advance practical robotic deployment in real-world retail settings.
📝 Abstract
Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To address this limitation, we introduce RoboBenchMart, a more challenging and realistic benchmark designed for dark store environments, where robots must perform complex manipulation tasks with diverse grocery items. This setting presents significant challenges, including dense object clutter and varied spatial configurations, with items positioned at different heights and depths and in close proximity to one another. By targeting the retail domain, our benchmark addresses a setting with strong potential for near-term automation impact. We demonstrate that current state-of-the-art generalist models struggle to solve even common retail tasks. To support further research, we release the RoboBenchMart suite, which includes a procedural store layout generator, a trajectory generation pipeline, evaluation tools, and fine-tuned baseline models.