🤖 AI Summary
This work addresses tumor localization in the bladder, where the absence of stable anatomical landmarks motivates using vascular patterns as patient-specific navigation fingerprints. Automated vessel segmentation in endoscopic images is hindered by sparse labels, artifacts, illumination variations, tissue deformation, and dynamic mucosal folds. To overcome these issues, the authors propose a Hybrid Attention-Convolution (HAC) network that combines Transformer-based modeling of global vascular topology with a CNN-based residual refinement that recovers thin-vessel details. To prioritize structural connectivity, the Transformer is trained on refined ground-truth annotations from which short and terminal branches have been removed. A physics-aware self-supervised pretraining scheme, built on clinically grounded augmentations of unlabeled data, mitigates label scarcity. Evaluated on the BlaVeS dataset, the method achieves 0.94 accuracy, 0.61 precision, and 0.66 clDice, with precision and clDice exceeding those of state-of-the-art medical segmentation models. It also reduces false positives caused by dynamic folds, providing the structural stability needed for intraoperative navigation.
📝 Abstract
Urinary bladder cancer surveillance requires tracking tumor sites across repeated interventions, yet the deformable, hollow bladder lacks stable landmarks for orientation. While blood vessels visible during endoscopy offer a patient-specific "vascular fingerprint" for navigation, automated segmentation is challenged by imperfect endoscopic data, including sparse labels, artifacts such as bubbles or variable lighting, continuous deformation, and mucosal folds that mimic vessels. State-of-the-art vessel segmentation methods often fail to address these domain-specific complexities. We introduce a Hybrid Attention-Convolution (HAC) architecture that combines a Transformer, which captures a global vessel-topology prior, with a CNN that learns a residual refinement map to precisely recover thin-vessel details. To prioritize structural connectivity, the Transformer is trained on optimized ground-truth annotations that exclude short and terminal branches. Furthermore, to address data scarcity, we employ physics-aware pretraining, a self-supervised strategy that applies clinically grounded augmentations to unlabeled data. Evaluated on the BlaVeS dataset, which consists of endoscopic video frames, our approach achieves high accuracy (0.94) and superior precision (0.61) and clDice (0.66) compared to state-of-the-art medical segmentation models. Crucially, our method suppresses false positives from mucosal folds that dynamically appear and vanish as the bladder fills and empties during surgery. Hence, HAC provides the reliable structural stability required for clinical navigation.
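The abstract does not specify how the Transformer prior and the CNN residual refinement map are combined. As a minimal sketch, assuming the residual is simply added to the coarse prior in logit space before thresholding, the idea can be illustrated on a toy grid. The function name `fuse`, the 0.5 threshold, and all numeric values below are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(prior_logits, residual_logits, threshold=0.5):
    """Add a residual map to coarse prior logits and binarize the result.

    Assumption: additive fusion in logit space. The paper may combine
    the two branches differently.
    """
    mask = []
    for prior_row, residual_row in zip(prior_logits, residual_logits):
        mask.append(
            [int(sigmoid(p + r) >= threshold) for p, r in zip(prior_row, residual_row)]
        )
    return mask


# Toy 3x4 coarse prior (made-up values): a vertical vessel in column 1,
# a weak thin branch at (1, 2), and a fold-like false response at (2, 3).
prior = [
    [-2.0, 3.0, -2.0, -2.0],
    [-2.0, 3.0, -0.5, -2.0],
    [-2.0, 3.0, -2.0, 1.0],
]

# Toy residual (made-up values): strengthens the thin branch and
# suppresses the fold-like response.
residual = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.5, 0.0],
    [0.0, 0.0, 0.0, -2.5],
]

zero_residual = [[0.0] * 4 for _ in range(3)]
coarse = fuse(prior, zero_residual)  # prior alone
refined = fuse(prior, residual)      # prior plus residual refinement

for row in refined:
    print(row)
```

In this toy case the residual both recovers a thin branch that the prior alone misses and removes a spurious response, which mirrors the two roles the abstract assigns to the refinement stage.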