AI Summary
To address the growing disparity between radiologist workforce growth and imaging volume, the limitations of existing foundation models in modeling 3D medical imaging (e.g., reduction to 2D slices with consequent loss of grayscale contrast information), and the lack of clinically oriented evaluation, this study proposes: (1) RATE, a first-of-its-kind LLM-driven radiology structured-annotation framework enabling high-accuracy automated labeling of 366 imaging findings; and (2) Pillar-0, a 3D-aware self-supervised pretrained model built on a large-scale 3D CNN architecture that preserves critical grayscale information. Pillar-0 was pretrained on roughly 155,000 abdominal, thoracic, and cranial CT and breast MRI volumes. On internal test sets it achieved mean AUROCs of 82.9-90.1, outperforming baselines (MedGemma, MedImageInsight, Lingshu, Merlin) by 7.8-15.8 AUROC points; in external validation it also surpassed Stanford's Merlin (82.2 vs 80.6 AUROC). For intracranial hemorrhage detection it attained an AUROC above 95 using only 1/20th of the data required by the next most sample-efficient baseline; for long-horizon lung cancer risk prediction its C-index exceeded the state-of-the-art Sybil by 3.0 points.
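Since the headline numbers above are AUROCs, a minimal pure-Python sketch of how a per-finding AUROC can be computed may help (rank-based Mann-Whitney formulation; the function name and toy data are illustrative, not from the paper):

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive case is scored higher than a randomly
    chosen negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: a model that ranks positives above most negatives.
print(auroc([1, 1, 0, 0, 0], [0.9, 0.6, 0.7, 0.2, 0.1]))  # 5/6 ~= 0.833
```

A mean AUROC over many findings, as reported here, is just the average of such per-finding values.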
Abstract
Radiology plays an integral role in modern medicine, yet rising imaging volumes have far outpaced workforce growth. Foundation models offer a path toward assisting with the full spectrum of radiology tasks, but existing medical models remain limited: they process volumetric CT and MRI as low-fidelity 2D slices, discard critical grayscale contrast information, and lack evaluation frameworks that reflect real clinical practice. We introduce Pillar-0, a radiology foundation model pretrained on 42,990 abdomen-pelvis CTs, 86,411 chest CTs, 14,348 head CTs, and 11,543 breast MRIs from a large academic center, together with RATE, a scalable framework that extracts structured labels for 366 radiologic findings with near-perfect accuracy using LLMs. Across internal test sets of 14,230 abdomen-pelvis CTs, 10,646 chest CTs, 4,906 head CTs, and 1,585 breast MRIs, Pillar-0 establishes a new performance frontier, achieving mean AUROCs of 86.4, 88.0, 90.1, and 82.9, outperforming MedGemma (Google), MedImageInsight (Microsoft), Lingshu (Alibaba), and Merlin (Stanford) by 7.8-15.8 AUROC points and ranking best in 87.2% (319/366) of tasks. Pillar-0 similarly outperforms all baselines in an external validation on the Stanford Abdominal CT dataset, including Merlin (82.2 vs 80.6 AUROC). Pillar-0 extends to tasks beyond its pretraining, such as long-horizon lung cancer risk prediction, where it improves upon the state-of-the-art Sybil by 3.0 C-index points on NLST and generalizes with gains of 5.9 (MGH) and 1.9 (CGMH). In brain hemorrhage detection, Pillar-0 obtained an AUROC above 95 using only 1/20th of the data required by the next most sample-efficient baseline. Pillar-0 and RATE together provide an open, clinically rigorous foundation for building high-performance radiology systems, enabling applications that were previously infeasible due to computational, data, and evaluation constraints.
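The abstract describes RATE as extracting structured labels for 366 findings from reports using LLMs. A minimal sketch of what such an extraction loop could look like, assuming the model is prompted for strict JSON (the finding names, prompt wording, and stubbed reply below are hypothetical illustrations, not the paper's actual schema or prompts):

```python
import json

# Hypothetical finding vocabulary; the real RATE schema covers 366 findings.
FINDINGS = ["pulmonary_nodule", "pleural_effusion", "intracranial_hemorrhage"]

def build_prompt(report_text, findings=FINDINGS):
    """Assemble an extraction prompt asking the LLM to answer with a JSON
    object mapping each finding to 1 (present) or 0 (absent)."""
    return (
        "Read the radiology report below and reply with ONLY a JSON object "
        "mapping each finding to 1 (present) or 0 (absent/not mentioned).\n"
        f"Findings: {', '.join(findings)}\n\nReport:\n{report_text}"
    )

def parse_labels(llm_output, findings=FINDINGS):
    """Validate the model's JSON reply and coerce each value to 0/1,
    defaulting unmentioned findings to 0."""
    data = json.loads(llm_output)
    return {f: int(bool(data.get(f, 0))) for f in findings}

# Stubbed LLM reply, standing in for a real API call.
reply = '{"pulmonary_nodule": 1, "pleural_effusion": 0}'
print(parse_labels(reply))
```

In practice a framework like this would also need ontology design, retry/repair logic for malformed replies, and accuracy audits against radiologist annotations, which is where the paper's "near-perfect accuracy" claim would have to be validated.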