🤖 AI Summary
This work addresses the high cost and limited scalability of expert-dependent assessment in surgical training, which is hindered by the absence of high-quality datasets capturing real trainee errors and multi-view variability. To bridge this gap, we introduce SHands, a large-scale, multi-view surgical hand motion dataset covering incision and suturing tasks performed by 52 participants. SHands provides frame-level gesture annotations and clinically validated labels for eight error categories. Collected with five synchronized RGB cameras and evaluated under a standardized protocol, it uniquely integrates multi-view video, fine-grained gesture primitives, and a structured error taxonomy. As the first publicly available resource of its kind, SHands fills a critical data gap for AI-driven surgical skill assessment and enables the development of robust, scalable intelligent evaluation systems.
📝 Abstract
In surgical training for medical students, proficiency development relies on expert-led skill assessment, which is costly, time-limited, and difficult to scale, and whose expertise remains confined to institutions with available specialists. Automated AI-based assessment offers a viable alternative, but progress is constrained by the lack of datasets containing realistic trainee errors and the multi-view variability needed to train robust computer vision approaches. To address this gap, we present Surgical-Hands (SHands), a large-scale multi-view video dataset for surgical hand-gesture and error recognition in medical training. SHands captures linear incision and suturing tasks recorded by five RGB cameras from complementary viewpoints, performed by 52 participants (20 experts and 32 trainees), each completing three standardized trials per procedure. The videos are annotated at the frame level with 15 gesture primitives and include a validated taxonomy of 8 trainee error types, enabling both gesture recognition and error detection. We further define standardized evaluation protocols for single-view, multi-view, and cross-view generalization, and benchmark state-of-the-art deep learning models on the dataset. SHands is publicly released to support the development of robust and scalable AI systems for surgical training grounded in clinically curated domain knowledge.