🤖 AI Summary
This work addresses the challenge of triaging and classifying high-throughput CT studies, a task hindered by the complexity of 3D anatomical structures, variability in scanning protocols, and noisy report annotations—factors that limit existing vision-language models in balancing performance and interpretability. To overcome these limitations, the authors propose ORACLE-CT, a framework featuring an encoder-agnostic, organ-aware head that integrates organ-mask attention with lightweight volumetric and Hounsfield Unit (HU) scalar features. This design enables accurate classification and the generation of localized evidence under a unified evaluation protocol. Evaluated on the CT-RATE chest dataset (AUROC 0.86) and the abdominal MERLIN dataset covering 30 pathological findings (AUROC 0.85), ORACLE-CT outperforms both supervised and zero-shot vision-language models, achieving state-of-the-art performance in cross-anatomical-region CT classification.
📝 Abstract
There is an urgent need for triage and classification of high-volume medical imaging modalities such as computed tomography (CT), which can improve patient care and mitigate radiologist burnout. Study-level CT triage requires calibrated predictions with localized evidence; however, off-the-shelf vision-language models (VLMs) struggle with 3D anatomy, protocol shifts, and noisy report supervision. This study used the two largest publicly available chest CT datasets: CT-RATE and RADCHEST-CT (held-out external test set). Our carefully tuned supervised baseline (instantiated as a simple Global Average Pooling head) establishes a new supervised state of the art, surpassing all reported linear-probe VLMs. Building on this baseline, we present ORACLE-CT, an encoder-agnostic, organ-aware head that pairs Organ-Masked Attention (mask-restricted, per-organ pooling that yields spatial evidence) with Organ-Scalar Fusion (lightweight fusion of normalized volume and mean-HU cues). In the chest setting, the masked-attention ORACLE-CT model achieves AUROC 0.86 on CT-RATE; in the abdomen setting, on MERLIN (30 findings), our supervised baseline exceeds a reproduced zero-shot VLM baseline obtained by running publicly released weights through our pipeline, and adding masked attention plus scalar fusion further improves performance to AUROC 0.85. Together, these results deliver state-of-the-art supervised classification performance across both chest and abdomen CT under a unified evaluation protocol. The source code is available at https://github.com/lavsendahal/oracle-ct.
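The two head components described above—mask-restricted per-organ pooling and fusion of normalized volume and mean-HU scalars—can be sketched roughly as follows. This is a minimal NumPy illustration of the general idea, not the authors' implementation (see the linked repository for the real code); all function names, the HU normalization range, and the use of plain masked averaging instead of learned attention are assumptions made here for clarity.

```python
import numpy as np

def organ_masked_pool(features, masks):
    """Mask-restricted, per-organ pooling (hypothetical simplification
    of Organ-Masked Attention, using an unweighted masked average).

    features: (V, C) voxel features flattened over the volume
    masks:    (K, V) binary organ masks, one row per organ
    returns:  (K, C) one pooled feature vector per organ
    """
    counts = masks.sum(axis=1, keepdims=True)            # voxels per organ, (K, 1)
    pooled = masks @ features / np.maximum(counts, 1.0)  # masked mean, (K, C)
    return pooled

def organ_scalar_fusion(pooled, volumes, mean_hu):
    """Fuse lightweight per-organ scalars with pooled features
    (hypothetical simplification of Organ-Scalar Fusion).

    pooled:  (K, C) per-organ pooled features
    volumes: (K,)   organ volumes in voxels (or mm^3)
    mean_hu: (K,)   mean Hounsfield Unit per organ
    returns: (K, C + 2) features with normalized scalar cues appended
    """
    vol_norm = volumes / volumes.sum()            # relative organ volume
    hu_norm = (mean_hu + 1000.0) / 2000.0         # assumed HU range [-1000, 1000]
    scalars = np.stack([vol_norm, hu_norm], axis=1)
    return np.concatenate([pooled, scalars], axis=1)

# Toy example: 8 voxels, 4 feature channels, 2 organs.
feats = np.ones((8, 4))
masks = np.zeros((2, 8))
masks[0, :4] = 1.0   # organ 0 = first four voxels
masks[1, 4:] = 1.0   # organ 1 = last four voxels
pooled = organ_masked_pool(feats, masks)                      # (2, 4)
fused = organ_scalar_fusion(pooled, masks.sum(1), np.array([-700.0, 40.0]))  # (2, 6)
```

The fused per-organ vectors would then feed a classification head, so each prediction can be traced back to the organ masks that produced its evidence.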