AI Summary
Real-time interactive grasping of dynamic objects faces the fundamental challenge of balancing low latency with promptability. To address this, we propose SPGrasp, a novel framework that pioneers the adaptation of SAMv2 to dynamic grasping tasks. SPGrasp introduces a spatiotemporal prompting mechanism that jointly integrates user-provided prompts and video temporal context within an end-to-end inference pipeline, ensuring both temporal consistency and interactive flexibility. Leveraging lightweight video modeling and explicit inter-frame association, the framework achieves high-accuracy, low-latency grasp estimation. Quantitatively, SPGrasp attains 90.6% and 93.8% instance-level grasp accuracy on OCID and Jacquard, respectively; achieves 92.0% accuracy in continuous grasp tracking on GraspNet-1Billion with only 73.1 ms per-frame latency; and demonstrates a 94.8% success rate on 13 moving objects in real-world scenarios.
Abstract
Real-time interactive grasp synthesis for dynamic objects remains challenging because existing methods fail to achieve low-latency inference while maintaining promptability. To bridge this gap, we propose SPGrasp (spatiotemporal prompt-driven dynamic grasp synthesis), a novel framework extending Segment Anything Model v2 (SAMv2) for video stream grasp estimation. Our core innovation integrates user prompts with spatiotemporal context, enabling real-time interaction with end-to-end latency as low as 59 ms while ensuring temporal consistency for dynamic objects. In benchmark evaluations, SPGrasp achieves instance-level grasp accuracies of 90.6% on OCID and 93.8% on Jacquard. On the challenging GraspNet-1Billion dataset under continuous tracking, SPGrasp achieves 92.0% accuracy with 73.1 ms per-frame latency, a 58.5% latency reduction compared to the prior state-of-the-art promptable method RoG-SAM, while maintaining competitive accuracy. Real-world experiments involving 13 moving objects demonstrate a 94.8% success rate in interactive grasping scenarios. These results confirm that SPGrasp effectively resolves the latency-interactivity trade-off in dynamic grasp synthesis. Code is available at https://github.com/sejmoonwei/SPGrasp.
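To put the latency figures above in context, the following minimal sketch works through the arithmetic they imply. It assumes the 58.5% reduction is measured relative to RoG-SAM's per-frame latency on the same benchmark; the baseline value it prints is derived from that assumption and is not a number reported in the abstract. The function names are illustrative and are not part of the SPGrasp codebase.

```python
def implied_baseline_ms(latency_ms: float, reduction_pct: float) -> float:
    """Baseline latency implied by a relative reduction.

    If new = baseline * (1 - r), then baseline = new / (1 - r).
    """
    return latency_ms / (1.0 - reduction_pct / 100.0)


def frames_per_second(latency_ms: float) -> float:
    """Throughput if frames are processed one at a time at this latency."""
    return 1000.0 / latency_ms


# Figures quoted in the abstract.
spgrasp_ms = 73.1      # per-frame latency on GraspNet-1Billion
reduction_pct = 58.5   # stated reduction versus RoG-SAM
best_case_ms = 59.0    # "as low as" end-to-end latency

# Derived values (assumption: the reduction is relative to the baseline latency).
baseline_ms = implied_baseline_ms(spgrasp_ms, reduction_pct)
tracking_fps = frames_per_second(spgrasp_ms)
best_case_fps = frames_per_second(best_case_ms)

print(f"Implied baseline latency: {baseline_ms:.1f} ms")
print(f"Tracking throughput:      {tracking_fps:.1f} frames/s")
print(f"Best-case throughput:     {best_case_fps:.1f} frames/s")
```

Under that assumption, the stated reduction corresponds to a baseline of roughly 176 ms per frame, and 73.1 ms per frame corresponds to a little under 14 frames per second of sequential processing.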