Less is More? Revisiting the Importance of Frame Rate in Real-Time Zero-Shot Surgical Video Segmentation

📅 2025-02-28
🤖 AI Summary
This study investigates the trade-off between frame rate and segmentation performance in real-time zero-shot surgical video segmentation. We identify a counterintuitive phenomenon wherein SAM2 achieves higher offline mIoU at low frame rates (1 FPS) than at high frame rates (25 FPS), revealing a misalignment between conventional offline evaluation and clinical deployment requirements. To address this, we propose a redefined real-time evaluation paradigm jointly grounded in *streaming inference latency*, *temporal mask consistency*, and *surgeon clinical preferences*. Methodologically, we develop a zero-shot transfer framework built upon SAM2, incorporating a multi-granularity frame sampling strategy and introducing novel quantitative metrics for streaming latency and mask jitter. Evaluated on a cholecystectomy dataset: (1) offline mIoU improves by 3.2% at 1 FPS; (2) at 25 FPS, temporal jitter is reduced by 67% under streaming conditions, and 92% of expert surgeons prefer the 25 FPS overlay visualization. This work is the first to expose and characterize the frame-rate–performance evaluation mismatch, advancing the design of clinically viable real-time segmentation systems toward joint optimization of real-time streaming fidelity and human-centered usability.

📝 Abstract
Real-time video segmentation is a promising feature for AI-assisted surgery, providing intraoperative guidance by identifying surgical tools and anatomical structures. However, deploying state-of-the-art segmentation models, such as SAM2, in real-time settings is computationally demanding, which makes it essential to balance frame rate and segmentation performance. In this study, we investigate the impact of frame rate on zero-shot surgical video segmentation, evaluating SAM2's effectiveness across multiple frame sampling rates for cholecystectomy procedures. Surprisingly, our findings indicate that in conventional evaluation settings, frame rates as low as a single frame per second can outperform 25 FPS, as fewer frames smooth out segmentation inconsistencies. However, when assessed in a real-time streaming scenario, higher frame rates yield superior temporal coherence and stability, particularly for dynamic objects such as surgical graspers. Finally, we investigate human perception of real-time surgical video segmentation among professionals who work closely with such data and find that respondents consistently prefer high FPS segmentation mask overlays, reinforcing the importance of real-time evaluation in AI-assisted surgery.
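The offline comparison described in the abstract can be illustrated with a minimal sketch: keep every k-th frame of a 25 FPS video to emulate a lower sampling rate, then average per-frame IoU over the kept frames. The function names (`subsample_indices`, `iou`, `mean_iou`) and the flattened-list mask representation are assumptions made for this illustration, not the authors' code; the score below is averaged over frames for a single object, whereas the paper's mIoU may also average over classes.

```python
from typing import List, Sequence

# Hypothetical representation: a flattened binary mask, 1 = object pixel.
Mask = Sequence[int]


def subsample_indices(num_frames: int, source_fps: int, target_fps: int) -> List[int]:
    """Return indices of the frames kept when a fixed stride reduces
    source_fps to (approximately) target_fps."""
    if target_fps <= 0 or target_fps > source_fps:
        raise ValueError("target_fps must be in (0, source_fps]")
    stride = source_fps // target_fps
    return list(range(0, num_frames, stride))


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average per-frame IoU over the evaluated frames."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Two seconds of 25 FPS video evaluated at 1 FPS keeps two frames.
kept = subsample_indices(num_frames=50, source_fps=25, target_fps=1)
```

Because the 1 FPS setting scores only the kept frames, any errors on the skipped frames never enter the average, which is one way a lower frame rate can look better under offline evaluation.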
Problem

Research questions and friction points this paper is trying to address.

Impact of frame rate on zero-shot surgical video segmentation.
Balancing frame rate and segmentation performance in real-time settings.
Human perception preference for high FPS in surgical video segmentation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

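Evaluates SAM2 zero-shot at multiple frame sampling rates.
Shows that low frame rates smooth out segmentation inconsistencies in conventional offline evaluation.
Shows that high frame rates yield better temporal coherence and stability in real-time streaming.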
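Temporal coherence can be approximated with a simple proxy: the average change between consecutive predicted masks. The sketch below is an assumption for illustration only; this page does not specify the paper's actual coherence or jitter metric, and `mask_iou` and `temporal_jitter` are hypothetical names.

```python
from typing import Sequence

# Hypothetical representation: a flattened binary mask, 1 = object pixel.
Mask = Sequence[int]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of equal length."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty masks are treated as identical.
    return 1.0 if union == 0 else inter / union


def temporal_jitter(masks: Sequence[Mask]) -> float:
    """Mean of (1 - IoU) between each pair of consecutive masks.

    0.0 means the mask never changes; 1.0 means consecutive masks
    never overlap.
    """
    if len(masks) < 2:
        return 0.0
    changes = [1.0 - mask_iou(prev, cur) for prev, cur in zip(masks, masks[1:])]
    return sum(changes) / len(changes)
```

This proxy also counts genuine object motion as change, so it is only comparable between sequences sampled at the same frame rate or after normalizing by the time between frames.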
Utku Ozbulak
Research Professor at Ghent University
Trustworthy AI, Medical imaging, Biomedical imaging, Self-supervised learning
Seyed Amir Mousavi
PhD Student, Ghent University
Computer Vision, Natural Language Processing, Machine Learning
Francesca Tozzi
Department of GI Surgery, Ghent University Hospital, Ghent, Belgium; Department of Human Structure and Repair, Ghent University, Ghent, Belgium
Nikdokht Rashidian
Department of Human Structure and Repair, Ghent University, Ghent, Belgium
Wouter Willaert
Associate Professor of Anatomy, Ghent University
Peritoneal metastases, Surgical education
W. D. Neve
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea; IDLab, ELIS, Ghent University, Ghent, Belgium
J. Vankerschaver
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium