🤖 AI Summary
This work addresses real-time detection of the position and orientation of wire-containing interventional devices, including transesophageal echocardiography (TOE) probes, prosthetic heart valves, and radiofrequency ablation catheters, in X-ray fluoroscopic images acquired during cardiovascular interventional procedures. We propose a lightweight, domain-specific detection framework that combines multi-scale Gaussian derivative filtering with a dot-product attention mechanism to enhance the representation of thin metallic structures in low-contrast X-ray imagery. Built on a CNN backbone, the architecture incorporates customized filters and attention modules and is trained using an IoU-optimized loss. Evaluated on 12,438 clinical X-ray frames, the method achieves a mean IoU of 0.88 for TOE probes and 0.87 for prosthetic valves; detection success rates reach 99.8% for 10-electrode mapping catheters and 97.8% for ablation catheters, at an inference speed of 58 FPS. The approach combines high accuracy with low latency and handles several device types in a single model, providing a visual perception foundation for image-guided intervention.
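For readers unfamiliar with the metric, the IoU figures quoted above can be illustrated with a minimal sketch. It assumes axis-aligned bounding boxes given as `(x1, y1, x2, y2)`; the summary does not state the box format used in the paper, and the function name `box_iou` is hypothetical.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).

    Assumption: the box format is illustrative only; the paper's own
    evaluation code may use a different representation.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped at zero when disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (10, 10, 50, 40)
    ground_truth = (12, 8, 52, 38)
    print(f"IoU = {box_iou(predicted, ground_truth):.3f}")
```

A mean IoU is then the average of this value over all evaluated frames.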
📝 Abstract
Objective: Interventional devices, catheters, and insertable imaging devices such as transesophageal echocardiography (TOE) probes are routinely used in minimally invasive cardiovascular procedures. Detecting their positions and orientations in X-ray fluoroscopic images is important for many clinical applications. Method: We design a novel attention mechanism that guides a convolutional neural network (CNN) model toward the wire regions of X-ray images, since nearly all interventional devices and catheters used in cardiovascular procedures contain wires. The attention mechanism consists of multi-scale Gaussian derivative filters and a dot-product-based attention layer. With this mechanism, a lightweight foundation model can detect multiple objects simultaneously with high precision and at real-time speed. Results: The proposed model was trained and tested on a total of 12,438 X-ray images. Measured by intersection-over-union (IoU), it achieved an accuracy of 0.88 for detecting an echo probe and 0.87 for detecting an artificial valve, running at 58 FPS. It also achieved a 99.8% success rate in detecting a 10-electrode catheter and a 97.8% success rate in detecting an ablation catheter. Conclusion: Our detection foundation model can simultaneously detect and identify both interventional devices and flexible catheters in real-time X-ray fluoroscopic images. Significance: The proposed model employs a novel attention mechanism to achieve high-performance object detection, making it suitable for various clinical applications and robotic-assisted surgery. Code is available at https://github.com/YingLiangMa/AttWire.
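The two ingredients named in the Method section can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the filter outputs feed the attention layer, so the Hessian-based ridge measure, the choice of queries from filter responses and keys from CNN features, and all names (`gaussian_kernels`, `filter_axis`, `wire_response`, `wire_attention`, `w_q`, `w_k`) are hypothetical.

```python
import numpy as np


def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian with its first and second derivatives."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    g1 = -x / sigma**2 * g
    g2 = (x**2 - sigma**2) / sigma**4 * g
    g2 -= g2.mean()  # remove truncation bias so flat regions respond with zero
    return g, g1, g2


def filter_axis(img, kernel, axis):
    """Convolve a 2-D image with a 1-D kernel along one axis (edge padding)."""
    r = len(kernel) // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
    )


def wire_response(img, sigmas=(1.0, 2.0, 4.0)):
    """Multi-scale ridge maps, shape (S, H, W), for dark thin lines.

    Assumption: the largest Hessian eigenvalue is used as the wire measure;
    the paper may combine the Gaussian derivatives differently.
    """
    img = np.asarray(img, dtype=float)
    maps = []
    for s in sigmas:
        g, g1, g2 = gaussian_kernels(s)
        ixx = filter_axis(filter_axis(img, g2, 1), g, 0)
        iyy = filter_axis(filter_axis(img, g, 1), g2, 0)
        ixy = filter_axis(filter_axis(img, g1, 1), g1, 0)
        lam = 0.5 * (ixx + iyy) + np.sqrt((0.5 * (ixx - iyy)) ** 2 + ixy**2)
        maps.append(s**2 * np.maximum(lam, 0.0))  # scale-normalised response
    return np.stack(maps)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def wire_attention(features, wire_resp, w_q, w_k):
    """Scaled dot-product attention over pixels.

    Queries come from the filter responses and keys/values from the CNN
    features (an assumed wiring). Shapes: features (C, H, W),
    wire_resp (S, H, W), w_q (S, d), w_k (C, d).
    """
    c, h, w = features.shape
    tokens_f = features.reshape(c, h * w).T                    # (N, C)
    tokens_r = wire_resp.reshape(wire_resp.shape[0], h * w).T  # (N, S)
    q = tokens_r @ w_q                                         # (N, d)
    k = tokens_f @ w_k                                         # (N, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))              # (N, N)
    out = attn @ tokens_f                                      # (N, C)
    return out.T.reshape(c, h, w), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.ones((32, 32))
    image[:, 16] = 0.0  # synthetic dark vertical "wire"
    resp = wire_response(image, sigmas=(1.0, 2.0))
    feats = rng.normal(size=(4, 32, 32))  # stand-in for CNN feature maps
    out, attn = wire_attention(
        feats, resp, rng.normal(size=(2, 8)), rng.normal(size=(4, 8))
    )
    print(resp.shape, out.shape, attn.shape)
```

In a trained network the projection matrices would be learned, and the attention would more plausibly operate on downsampled feature maps, since the pixel-by-pixel attention matrix grows quadratically with image size.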