Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current 3D reconstruction in ophthalmic microsurgery is hindered by the absence of large-scale real-world RGB-D datasets and high-precision annotation tools. To address this, we introduce OphNet-3D, the first large-scale dynamic 3D dataset for ophthalmic surgery, comprising 41 sequences and 7.1 million frames and covering 12 surgical phases, 10 instrument categories, and ground-truth MANO hand annotations. We propose a multi-stage automated annotation pipeline that integrates cross-view geometric consistency, biomechanical constraints, and collision-aware interaction modeling. We further design two novel architectures, H-Net and OH-Net, which incorporate weak-perspective spatial reasoning and a centralized collision representation. Evaluated on two benchmarks, bimanual hand pose estimation and joint hand-instrument reconstruction, our method achieves a reduction of more than 2 mm in MPJPE and a 23% improvement in ADD-S, significantly enhancing accuracy and robustness in microsurgical scenarios.

📝 Abstract
Accurate 3D reconstruction of hands and instruments is critical for vision-based analysis of ophthalmic microsurgery, yet progress has been hampered by the lack of realistic, large-scale datasets and reliable annotation tools. In this work, we introduce OphNet-3D, the first extensive RGB-D dynamic 3D reconstruction dataset for ophthalmic surgery, comprising 41 sequences from 40 surgeons and totaling 7.1 million frames, with fine-grained annotations of 12 surgical phases, 10 instrument categories, dense MANO hand meshes, and full 6-DoF instrument poses. To produce high-fidelity labels at scale, we design a multi-stage automatic annotation pipeline that integrates multi-view observation, a data-driven motion prior with cross-view geometric consistency and biomechanical constraints, and collision-aware constraints for hand-instrument interactions. Building upon OphNet-3D, we establish two challenging benchmarks, bimanual hand pose estimation and hand-instrument interaction reconstruction, and propose two dedicated architectures: H-Net for dual-hand mesh recovery and OH-Net for joint reconstruction of two-hand, two-instrument interactions. Both models leverage a novel spatial reasoning module with weak-perspective camera modeling and a collision-aware, center-based representation. They outperform existing methods by substantial margins, achieving improvements of over 2 mm in Mean Per Joint Position Error (MPJPE) for hand reconstruction and up to 23% in ADD-S for instrument reconstruction.
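For context, the two metrics quoted in the abstract are standard in hand and object pose estimation. A minimal sketch of how they are typically computed is below; the function and variable names are illustrative and not taken from the paper, and points are assumed to be 3D coordinates in a shared frame.

```python
import math

def mpjpe(pred_joints, gt_joints):
    """Mean Per Joint Position Error: average Euclidean distance
    between corresponding predicted and ground-truth 3D joints."""
    dists = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(dists) / len(dists)

def add_s(pred_points, gt_points):
    """ADD-S: for each ground-truth model point, take the distance to
    the *closest* predicted point, then average. The closest-point
    matching makes the metric robust to object symmetries, which
    matters for near-rotationally-symmetric surgical instruments."""
    dists = [min(math.dist(g, p) for p in pred_points) for g in gt_points]
    return sum(dists) / len(dists)
```

Lower is better for both; the paper reports MPJPE for hands and ADD-S for instruments.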
Problem

Research questions and friction points this paper is trying to address.

Lack of realistic datasets for 3D hand-instrument reconstruction in ophthalmic surgery
Need for scalable annotation tools to generate high-fidelity surgical data labels
Challenges in accurate bimanual hand pose and hand-instrument interaction reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces OphNet-3D, the first large-scale RGB-D dataset for dynamic 3D reconstruction in ophthalmic surgery
Multi-stage automatic annotation pipeline with cross-view geometric consistency and biomechanical constraints
H-Net and OH-Net models with a weak-perspective spatial reasoning module and collision-aware representation
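The weak-perspective camera model referenced above is a common simplification in hand mesh recovery: perspective projection is approximated by a single global scale plus a 2D translation, which is reasonable when scene depth variation is small relative to the camera distance, as under a surgical microscope. A minimal sketch (parameter names are illustrative, not from the paper):

```python
def weak_perspective_project(points_3d, scale, tx, ty):
    """Weak-perspective projection: drop the depth coordinate, then
    apply one global scale and a 2D translation to all points.
    Networks like H-Net-style regressors typically predict
    (scale, tx, ty) per image to align the 3D mesh with 2D evidence."""
    return [(scale * x + tx, scale * y + ty) for x, y, _z in points_3d]
```

For example, a point at (1, 2, 5) with scale 2 and translation (0.5, -0.5) projects to (2.5, 3.5) regardless of its depth.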