🤖 AI Summary
Medical image segmentation is labor-intensive, error-prone, and requires specialized expertise. To address these challenges, this paper introduces SAMIRA, a user-centered, interactive segmentation framework that integrates virtual reality (VR) with radiology-oriented AI foundation models (e.g., MedSAM). SAMIRA combines VR-based, voice-driven 3D localization with true-to-scale volumetric visualization to establish a human-in-the-loop workflow. It proposes a clinically grounded VR voice-interaction paradigm and a multimodal input comparison, evaluating eye gaze, head pointing, and controller inputs under near-field/far-field attention switching. It further implements a hybrid segmentation mechanism driven jointly by point-prompt refinement and semantic understanding. A user study demonstrates high usability (SUS = 90.0 ± 9.0) and low cognitive load, supporting SAMIRA's value for clinical guidance, human-AI collaboration, and radiological technician training.
📝 Abstract
Crucial in disease analysis and surgical planning, manual segmentation of volumetric medical scans (e.g., MRI, CT) is laborious, error-prone, and challenging to master, while fully automatic algorithms can benefit from user feedback. Therefore, combining the complementary power of the latest radiological AI foundation models and virtual reality (VR)'s intuitive data interaction, we propose SAMIRA, a novel conversational AI agent that assists users with localizing, segmenting, and visualizing 3D medical concepts in VR. Through speech-based interaction, the agent helps users understand radiological features, locate clinical targets, and generate segmentation masks that can be refined with just a few point prompts. The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding. Furthermore, to determine the optimal interaction paradigm under near-far attention switching for refining segmentation masks in an immersive, human-in-the-loop workflow, we compare VR controller pointing, head pointing, and eye tracking as input modes. A user study demonstrated a high usability score (SUS = 90.0 ± 9.0) and low overall task load, as well as strong support for the proposed VR system's guidance, training potential, and integration of AI into radiological segmentation tasks.