🤖 AI Summary
To address low fidelity and poor controllability in 3D reconstruction of surgical instruments from monocular surgical videos, this paper proposes the first instrument-level controllable 3D reconstruction framework tailored for Real2Sim applications. Methodologically, we introduce a novel geometry pretraining strategy that binds Gaussian point clouds with part-wise meshes; integrate forward kinematics modeling to enable joint-level controllable deformation; and design a semantic-embedding Gaussian rendering-contrastive pose tracking scheme that jointly optimizes real-time pose and joint states. Evaluated on six real surgical videos—including both public and in-house datasets—our method achieves photorealistic rendering and millimeter-level geometric accuracy (mean error <1.2 mm), significantly outperforming NeRF-based and generic point-cloud approaches. This work establishes a high-fidelity, physics-aware, and actuation-ready 3D foundation for surgical AI simulation and training.
📝 Abstract
Real2Sim is becoming increasingly important with the rapid development of surgical artificial intelligence (AI) and autonomy. In this work, we propose a novel Real2Sim methodology, extit{Instrument-Splatting}, that leverages 3D Gaussian Splatting to provide fully controllable 3D reconstruction of surgical instruments from monocular surgical videos. To maintain both high visual fidelity and manipulability, we introduce a geometry pre-training to bind Gaussian point clouds on part mesh with accurate geometric priors and define a forward kinematics to control the Gaussians as flexible as real instruments. Afterward, to handle unposed videos, we design a novel instrument pose tracking method leveraging semantics-embedded Gaussians to robustly refine per-frame instrument poses and joint states in a render-and-compare manner, which allows our instrument Gaussian to accurately learn textures and reach photorealistic rendering. We validated our method on 2 publicly released surgical videos and 4 videos collected on ex vivo tissues and green screens. Quantitative and qualitative evaluations demonstrate the effectiveness and superiority of the proposed method.