🤖 AI Summary
This study addresses the challenge of retrieving pediatric wrist X-rays with fractures, where subtle, focal clinical signs are often obscured by overlapping anatomy and variable imaging angles, and where fine-grained annotations are scarce. The authors propose a region-aware, two-stage retrieval framework that requires no image-level labels. First, structured descriptions are extracted from radiology reports using MedGemma and used to train contrastive encoders that capture both global and local (distal radius, distal ulna, and ulnar styloid) representations. Retrieval then proceeds coarse-to-fine: initial global matching followed by region-conditioned re-ranking. The method substantially improves performance: image-to-text Recall@5 rises from 0.82% to 9.35%, fracture classification AUROC reaches 0.949, the regional mean F1-score increases from 0.568 to 0.753, and radiologist-rated clinical relevance improves from 3.36 to 4.35.
📝 Abstract
Retrieving wrist radiographs with analogous fracture patterns is challenging because clinically important cues are subtle, highly localized, and often obscured by overlapping anatomy or variable imaging views. Progress is further limited by the scarcity of large, well-annotated datasets for case-based medical image retrieval. We introduce WristMIR, a region-aware pediatric wrist radiograph retrieval framework that leverages dense radiology reports and bone-specific localization to learn fine-grained, clinically meaningful image representations without any manual image-level annotations. Using MedGemma-based structured report mining to generate both global and region-level captions, together with pre-processed wrist images and bone-specific crops of the distal radius, distal ulna, and ulnar styloid, WristMIR jointly trains global and local contrastive encoders and performs a two-stage retrieval process: (1) coarse global matching to identify candidate exams, followed by (2) region-conditioned re-ranking aligned to a predefined anatomical bone region. WristMIR improves retrieval performance over strong vision-language baselines, raising image-to-text Recall@5 from 0.82% to 9.35%. Its embeddings also yield stronger fracture classification (AUROC 0.949, AUPRC 0.953). In region-aware evaluation, the two-stage design markedly improves retrieval-based fracture diagnosis, increasing mean $F_1$ from 0.568 to 0.753, and radiologists rate its retrieved cases as more clinically relevant, with mean scores rising from 3.36 to 4.35. These findings highlight the potential of anatomically guided retrieval to enhance diagnostic reasoning and support clinical decision-making in pediatric musculoskeletal imaging. The source code is publicly available at https://github.com/quin-med-harvard-edu/WristMIR.
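The two-stage retrieval described above (coarse global matching, then region-conditioned re-ranking among the candidates) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes precomputed, fixed-dimensional global and region embeddings and uses cosine similarity for both stages; the function and parameter names (`two_stage_retrieve`, `k_coarse`, `k_final`) are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize vectors so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def two_stage_retrieve(query_global, query_region,
                       db_global, db_region,
                       k_coarse=50, k_final=5):
    """Hypothetical sketch of coarse-to-fine retrieval.

    Stage 1: rank the whole database by global-embedding similarity
             and keep the top k_coarse candidate exams.
    Stage 2: re-rank only those candidates by similarity of the
             region-level (bone-specific) embeddings.
    """
    # Stage 1: coarse global matching by cosine similarity
    global_sim = l2_normalize(db_global) @ l2_normalize(query_global)
    candidates = np.argsort(-global_sim)[:k_coarse]

    # Stage 2: region-conditioned re-ranking within the candidate set
    region_sim = l2_normalize(db_region[candidates]) @ l2_normalize(query_region)
    reranked = candidates[np.argsort(-region_sim)]
    return reranked[:k_final]

# Toy example: item 1's region embedding best matches the query region,
# so it moves to the top after re-ranking despite a lower global score.
db_global = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.9, 0.1, 0.0, 0.0],
                      [0.8, 0.2, 0.0, 0.0]])
db_region = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0, 0.0]])
query_global = np.array([1.0, 0.0, 0.0, 0.0])
query_region = np.array([0.0, 1.0, 0.0, 0.0])

result = two_stage_retrieve(query_global, query_region,
                            db_global, db_region,
                            k_coarse=3, k_final=2)
print(result)  # item 1 ranks first after region-conditioned re-ranking
```

In the paper's setting, `db_region` would be selected per query according to the predefined anatomical bone region of interest (e.g. the ulnar styloid crops), so the same global index can serve multiple region-specific re-rankings.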