SurgPLAN++: Universal Surgical Phase Localization Network for Online and Offline Inference

📅 2024-09-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing surgical phase recognition methods predominantly perform online frame-level classification without global temporal modeling, which yields temporally incoherent predictions and cannot support the full-video analysis required for offline clinical review. To address this, the authors propose SurgPLAN++, a universal Surgical Phase Localization Network that models surgical phases as continuous temporal segments rather than isolated frames, enabling unified online and offline operation. Methodologically: (1) surgical phase recognition is formulated as a temporal detection task over phase proposals; (2) a pseudo-complete-video augmentation strategy enables high-quality proposals during online streaming inference; and (3) a global prediction framework iteratively refines preceding predictions to enhance offline accuracy. Evaluated on multiple benchmarks, SurgPLAN++ consistently outperforms state-of-the-art methods in both online and offline settings, yielding more temporally coherent phase predictions and more precise phase boundary localization.

📝 Abstract
Surgical phase recognition is critical for assisting surgeons in understanding surgical videos. Existing studies have focused mainly on online surgical phase recognition, leveraging preceding frames to predict the current frame. Despite great progress, they formulate the task as a series of frame-wise classifications, which results in a lack of global context over the entire procedure and incoherent predictions. Moreover, beyond online analysis, accurate offline surgical phase recognition is also in significant clinical demand for retrospective analysis, and existing online algorithms do not fully analyze the entire video, thereby limiting their accuracy in offline analysis. To overcome these challenges and enhance both online and offline inference capabilities, we propose a universal Surgical Phase Localization Network, named SurgPLAN++, built on the principle of temporal detection. To ensure a global understanding of the surgical procedure, we devise a phase localization strategy for SurgPLAN++ that predicts phase segments across the entire video through phase proposals. For online analysis, to generate high-quality phase proposals, SurgPLAN++ incorporates a data augmentation strategy that extends the streaming video into a pseudo-complete video through mirroring, center-duplication, and down-sampling. For offline analysis, SurgPLAN++ capitalizes on its global phase prediction framework to continuously refine preceding predictions during each online inference step, thereby significantly improving the accuracy of phase recognition. We perform extensive experiments to validate the effectiveness, and our SurgPLAN++ achieves remarkable performance in both online and offline modes, outperforming state-of-the-art methods. The source code is available at https://github.com/franciszchen/SurgPLAN-Plus.
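The abstract names three operations for extending a streaming video into a pseudo-complete one (mirroring, center-duplication, and down-sampling) but not their exact order or parameters. The sketch below is one plausible composition, not the authors' implementation; the function name, operation order, and the quarter-length center window are illustrative assumptions.

```python
from typing import List

def pseudo_complete_video(frames: List[int], target_len: int) -> List[int]:
    """Extend a streaming prefix of frame indices into a pseudo-complete video.

    Illustrative sketch only: the paper specifies the three operations
    (mirroring, center-duplication, down-sampling) but not this exact
    composition or these window sizes.
    """
    # Mirroring: append a time-reversed copy so the clip plays forward then back.
    extended = frames + frames[::-1]
    # Center-duplication: repeat the middle portion to stretch the timeline.
    mid = len(extended) // 2
    quarter = max(1, len(extended) // 4)
    center = extended[mid - quarter : mid + quarter]
    extended = extended[:mid] + center + extended[mid:]
    # Down-sampling: uniformly subsample back to a fixed target length.
    if len(extended) > target_len:
        step = len(extended) / target_len
        extended = [extended[int(i * step)] for i in range(target_len)]
    return extended
```

In use, the streaming prefix observed so far would be padded this way before proposal generation, so the detector always sees a fixed-length, roughly "complete" timeline.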
Problem

Research questions and friction points this paper is trying to address.

Frame-wise classification lacks global context over the entire procedure, producing temporally incoherent predictions.
Existing online algorithms never analyze the full video, limiting accuracy in offline, retrospective review.
A single framework is needed that supports accurate inference in both online and offline settings.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal detection principle: phases are localized as segments via phase proposals rather than classified frame by frame.
Phase localization strategy that predicts phase segments across the entire video.
Data augmentation that extends the streaming video into a pseudo-complete video via mirroring, center-duplication, and down-sampling.
Zhen Chen
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences
Xingjian Luo
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences
Jinlin Wu
Institute of Automation, Chinese Academy of Sciences
Long Bai
Research Assistant, Institute of Computing Technology, Chinese Academy of Sciences
Event-Centric Analysis, Knowledge Graph, Natural Language Processing
Zhen Lei
Associate Professor, OSCO Research Chair in Off-site Construction
Offsite Construction, Construction Engineering and Management
Hongliang Ren
Chinese University of Hong Kong | National University of Singapore | JHU/Harvard(RF) | CUHK(PhD)
Biorobotics & intelligent systems, medical mechatronics, continuum/soft flexible robots & sensors, multisensory perception
Sébastien Ourselin
King’s College London
Hongbin Liu
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences