Towards user-centered interactive medical image segmentation in VR with an assistive AI agent

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical image segmentation is labor-intensive, error-prone, and requires specialized expertise. To address these challenges, this paper introduces SAMIRA, a user-centered interactive segmentation framework that integrates virtual reality (VR) with radiological AI foundation models (e.g., MedSAM). SAMIRA combines speech-driven 3D localization with true-to-scale volumetric visualization to establish a human-in-the-loop workflow: the conversational agent helps users understand radiological features and locate clinical targets, then generates segmentation masks that can be refined with a few point prompts. To find the best interaction paradigm under near-field/far-field attention switching, the authors compare VR controller pointing, head pointing, and eye tracking as input modes for mask refinement. A user study demonstrates high usability (SUS = 90.0 ± 9.0) and low task load, supporting SAMIRA's value for clinical guidance, human-AI collaboration, and radiology training.

📝 Abstract
Crucial in disease analysis and surgical planning, manual segmentation of volumetric medical scans (e.g., MRI, CT) is laborious, error-prone, and challenging to master, while fully automatic algorithms can benefit from user feedback. Therefore, with the complementary power of the latest radiological AI foundation models and virtual reality (VR)'s intuitive data interaction, we propose SAMIRA, a novel conversational AI agent that assists users with localizing, segmenting, and visualizing 3D medical concepts in VR. Through speech-based interaction, the agent helps users understand radiological features, locate clinical targets, and generate segmentation masks that can be refined with just a few point prompts. The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding. Furthermore, to determine the optimal interaction paradigm under near-far attention-switching for refining segmentation masks in an immersive, human-in-the-loop workflow, we compare VR controller pointing, head pointing, and eye tracking as input modes. With a user study, evaluations demonstrated a high usability score (SUS = 90.0 ± 9.0), low overall task load, as well as strong support for the proposed VR system's guidance, training potential, and integration of AI in radiological segmentation tasks.
Problem

Research questions and friction points this paper is trying to address.

Manual segmentation of volumetric medical scans is laborious, error-prone, and hard to master
Can an AI-assisted VR system improve 3D medical segmentation and visualization?
Which VR input mode best supports refining segmentation masks?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conversational AI agent assists localizing, segmenting, and visualizing 3D medical concepts in VR
Speech-based interaction generates masks refined with point prompts
Compares controller pointing, head pointing, and eye tracking for mask refinement