Astra: a generalizable report generation foundation model for 3D computed tomography

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the lack of a generalizable foundation model for CT report generation, a gap that has limited diagnostic accuracy and stylistic consistency across anatomical regions and institutions. The study presents the first foundation model tailored for 3D CT report generation, trained on 90,678 thoracoabdominal CT-report pairs. By combining large-scale vision-language pretraining, standardized report styling, reinforcement learning optimization, and multi-organ abnormality modeling, the model mitigates heterogeneity in terminology and reporting style. It achieves an average 44.1% improvement in fine-grained diagnostic metrics across CTRgDB and six external cohorts. Clinical validation shows a 29.6% speedup in chest report drafting and an 11.3% gain in abdominal report completeness. The model also serves as a base for downstream CT AI development, improving downstream diagnostic performance and supporting vision-language pretraining through report synthesis.
📝 Abstract
CT interpretation requires radiologists to review hundreds of volumetric slices per examination, making reporting time-consuming and highly expertise-dependent. Automated CT report generation offers a promising route to improving clinical efficiency, yet the field still lacks a generalizable CT report generation foundation model that supports multi-region reporting and remains robust across external real-world cohorts. Intrinsic inconsistencies in reporting style and diagnostic terminology across cohorts expose naive joint training to noisy textual supervision, limiting model generalizability. Here we present Astra, a generalizable CT report generation foundation model trained on 90,678 thoracoabdominal CT-report pairs (CTRgDB) with 353,671 abnormalities spanning eight organ systems. By harmonizing report style and further refining diagnostic consistency via reinforcement learning, Astra achieves style-consistent and diagnostically accurate report generation across diverse anatomical regions and institutions. Evaluated on CTRgDB and six external cohorts, Astra achieves state-of-the-art performance with a 44.1% average improvement in fine-grained diagnostic metrics (P<0.001). In real-world clinical workflows, Astra assistance accelerates chest report drafting by 29.6% and improves abdominal report completeness by 11.3% (P<0.001). Astra also demonstrates broad utility as a foundation for CT AI development, improving downstream diagnostic performance and scaling vision-language pretraining through high-quality report synthesis. Overall, Astra serves as a broadly accessible clinical assistant and pivotal infrastructure for the next generation of AI-powered healthcare.
Problem

Research questions and friction points this paper is trying to address.

CT report generation
generalizable foundation model
multi-region reporting
diagnostic consistency
real-world cohorts
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
CT report generation
reinforcement learning
multi-region generalization
vision-language pretraining
Authors
Zhuhao Wang
School of Biomedical Engineering, Tsinghua University.
Fang Chen
School of Biomedical Engineering, Shanghai Jiao Tong University.
Chaohui Yu
Alibaba DAMO Academy
Computer Vision, AIGC
Zihan Li
University of Washington
Foundation Model, AI for Healthcare, Multimodal Learning
Yuchao Zheng
School of Biomedical Engineering, Tsinghua University.
Jing Wang
DAMO Academy, Alibaba Group; Hupan Laboratory.
Xuan Yang
Department of Biomedical Engineering, National University of Singapore.
Jia Guo
School of Biomedical Engineering, Tsinghua University.
Zhenlu Yang
Department of Radiology, Guizhou Provincial People’s Hospital.
Xingju Zheng
Department of Radiology, Guizhou Provincial People’s Hospital.
Yihua Sun
School of Biomedical Engineering, Tsinghua University.
Haojie Han
School of Biomedical Engineering, Tsinghua University.
Xiaoxiao Qin
Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine.
Zhan Feng
Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine.
Wenbo Xiao
Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine.
Chao Zhu
Department of Radiology, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine.
Yuehua Li
Department of Radiology, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine.
Shipeng Zhang
Assistant Professor, The Hong Kong Polytechnic University
Hao Luo
Alibaba DAMO Academy
computer vision
Yunsong Peng
Guizhou Provincial People's Hospital
Deep learning, medical image
Fan Wang
Alibaba DAMO Academy
Computer Vision, Machine Learning
Hongen Liao
School of Biomedical Engineering, Tsinghua University; School of Biomedical Engineering, Shanghai Jiao Tong University.