🤖 AI Summary
This work addresses the scarcity of high-quality, multimodal, temporally synchronized training data that hinders the development of intelligent surgical robots. The authors present a multimodal data acquisition framework, supporting both offline and online synchronization, that integrates a clinical-grade stereo endoscope, lateral-view cameras, and capacitive tactile sensors on the dVRK platform, enabling high-fidelity, synchronized recording of visual, kinematic, and contact-force data. A dual-mode synchronization mechanism, together with post-processing techniques including stereo depth estimation, optical flow, and Gaussian heatmap-based kinematic reprojection, is used to construct a dataset of 214 validated multi-task surgical manipulation instances performed on ex-vivo tissue by operators of varying skill levels. The dataset supports training of a surgical skill assessment network, and all hardware designs, software, and data are publicly released.
📝 Abstract
Most existing robotic surgery systems adopt a human-in-the-loop paradigm, often with the surgeon directly teleoperating the robotic system. Adding intelligence to these robots would enable higher-level control, such as supervised autonomy or even full autonomy. However, artificial intelligence (AI) requires large amounts of training data, which is currently lacking. This work proposes SurgSync, a multi-modal data collection framework with offline and online synchronization to support training and real-time inference, respectively. The framework is implemented on a da Vinci Research Kit (dVRK) and introduces (1) dual-mode (online/offline-matching) synchronized recorders, (2) a modern stereo endoscope that achieves image quality on par with clinical systems, and (3) additional sensors, such as a side-view camera and a novel capacitive contact sensor, to provide ground-truth contact data. The framework also incorporates a post-processing toolbox for tasks such as depth estimation, optical flow, and a practical kinematic reprojection method using Gaussian heatmaps. User studies with participants of varying skill levels are performed on ex-vivo tissue to provide clinically realistic data, and a network for surgical skill assessment is employed to demonstrate the utility of the collected data. From these user studies, we obtain a dataset of 214 validated instances across multiple canonical training tasks. All software and data are available at surgsync.github.io.
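To illustrate the idea behind Gaussian heatmap-based kinematic reprojection, the sketch below projects a 3D tool-tip position (e.g., from forward kinematics) into the image via a pinhole camera model and renders a 2D Gaussian centered at the projected pixel. This is a minimal illustration, not the paper's implementation; the intrinsics `K`, extrinsics `T`, tool-tip position, and `sigma` are all hypothetical placeholder values.

```python
import numpy as np

def project_point(K, T_cam_base, p_base):
    """Project a 3D point (base frame) to pixel coordinates
    using a pinhole model (hypothetical calibration)."""
    p_h = np.append(p_base, 1.0)        # homogeneous coordinates
    p_cam = (T_cam_base @ p_h)[:3]      # transform into camera frame
    uvw = K @ p_cam                     # apply intrinsics
    return uvw[:2] / uvw[2]             # perspective divide -> (u, v)

def gaussian_heatmap(shape, center, sigma=5.0):
    """Render a 2D Gaussian peaked at pixel `center` = (u, v)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Hypothetical intrinsics, extrinsics, and kinematic tool-tip position.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)                            # camera frame == base frame
tool_tip = np.array([0.01, -0.02, 0.30]) # tool tip in meters

uv = project_point(K, T, tool_tip)       # projected pixel location
hm = gaussian_heatmap((480, 640), uv)    # soft supervision target
```

A heatmap target like `hm` tolerates small calibration and kinematic errors better than a hard pixel label, which is the usual motivation for Gaussian-heatmap supervision.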