When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates whether the additional computational cost of 3D models is justified over 2D or 2.5D approaches in pulmonary CT analysis. Under a unified training protocol, the authors run controlled experiments comparing convolutional neural networks (CNNs) and Vision Transformers (ViTs) across 2D, 2.5D, and 3D input representations on the NLST (n=1,977) and LIDC-IDRI datasets, assessing discriminative performance, stability, and resource consumption. The work presents a joint dimension-architecture evaluation framework for lung cancer screening. It reports that 3D CNNs suffer from threshold instability and that ViTs are prone to degenerate predictions such as all-positive outputs. In this comparison, 2.5D CNNs offer the most favorable trade-off between discriminative capability and stability (ROC-AUC 0.682, 95% CI [0.546, 0.799]). Because the confidence intervals are wide and overlapping, the authors frame this as a resource-performance frontier rather than a definitive superiority claim. The results nonetheless point to practical advantages of lower-dimensional models for real-world clinical deployment.
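The summary contrasts 2D, 2.5D and 3D input representations without saying how each one is built. Here is a minimal sketch assuming one common convention: a single axial slice for 2D, a few neighbouring slices stacked as channels for 2.5D, and the whole volume for 3D. The function `make_inputs`, its `context` parameter and the edge-clamping rule are hypothetical and are not taken from the authors' preprocessing.

```python
import numpy as np


def make_inputs(volume: np.ndarray, center: int, context: int = 1):
    """Hypothetical helper: build 2D, 2.5D and 3D inputs from a CT volume shaped (D, H, W).

    Assumed conventions (not from the paper):
      2D   -> the axial slice at `center`, shape (1, H, W)
      2.5D -> 2*context + 1 neighbouring slices stacked as channels, shape (2*context + 1, H, W)
      3D   -> the full volume with a channel axis, shape (1, D, H, W)
    """
    n_slices = volume.shape[0]
    # 2D: one slice with a leading channel axis
    slice_2d = volume[center][np.newaxis]
    # 2.5D: neighbouring slice indices; out-of-range indices are clamped to the nearest edge slice
    idx = np.clip(np.arange(center - context, center + context + 1), 0, n_slices - 1)
    stack_25d = volume[idx]
    # 3D: whole volume with a leading channel axis
    vol_3d = volume[np.newaxis]
    return slice_2d, stack_25d, vol_3d


vol = np.zeros((64, 128, 128), dtype=np.float32)
for name, x in zip(("2D", "2.5D", "3D"), make_inputs(vol, center=32)):
    print(name, x.shape, x.size)
# 2D (1, 128, 128) 16384
# 2.5D (3, 128, 128) 49152
# 3D (1, 64, 128, 128) 1048576
```

The element counts show the resource gap directly. For a 64 x 128 x 128 volume, the 3D input is 64 times larger than a single 2D slice, while the 2.5D stack is only 3 times larger.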
📝 Abstract
Three-dimensional models are widely assumed to be preferable for volumetric medical imaging, yet their practical value depends on whether performance gains justify the added computational cost and complexity. Rather than proposing a new architecture, we study how input dimensionality (2D, 2.5D, 3D) affects model behavior across convolutional neural networks (CNNs) and Vision Transformers (ViTs) under a fixed training protocol. Using a leakage-free NLST cohort (n = 1,977) with supporting LIDC-IDRI data, we find that the 2.5D CNN offers the most favorable discrimination-stability trade-off in our comparison (ROC-AUC 0.682, 95% CI [0.546, 0.799]) with a stable operating point. In contrast, 3D CNNs show threshold instability, and transformers exhibit degenerate behavior, such as predicting every case as positive. Confidence intervals are wide and overlapping, so we present these results as a controlled resource-performance frontier and a failure-mode taxonomy rather than as definitive superiority claims. For class-imbalanced lung cancer screening classification, 2D and 2.5D inputs provide a more reliable trade-off between performance, stability, and computational efficiency than full 3D representations.
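The abstract reports a 95% confidence interval for ROC-AUC and names all-positive predictions as a failure mode, but it does not describe how either was computed. The sketch below assumes a percentile bootstrap over cases and a fixed decision threshold. `roc_auc`, `bootstrap_auc_ci`, `is_degenerate` and their defaults are hypothetical helpers, not the authors' evaluation code. The two checks are kept separate because AUC is threshold-free: a model can rank cases reasonably well and still be degenerate at the threshold it is actually operated at.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney U statistic; tied positive/negative pairs count as half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every positive score with every negative score
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap CI for ROC-AUC, resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true).astype(int)
    scores = np.asarray(scores, dtype=float)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        yb = y_true[idx]
        if yb.min() == yb.max():  # resample contains one class only, so AUC is undefined; skip it
            continue
        aucs.append(roc_auc(yb, scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc(y_true, scores), (lo, hi)


def is_degenerate(scores, threshold=0.5):
    """Flag an operating point that assigns every case to one class (all-positive or all-negative)."""
    preds = np.asarray(scores) >= threshold
    return bool(preds.all() or not preds.any())


y = np.array([0, 0, 0, 0, 1, 1])
s = np.array([0.2, 0.4, 0.1, 0.6, 0.5, 0.9])
auc, (lo, hi) = bootstrap_auc_ci(y, s, n_boot=500)
# The point estimate is 0.875: 7 of 8 positive-negative pairs are ranked correctly
print(f"AUC {auc:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print("degenerate at threshold 0.05:", is_degenerate(s, threshold=0.05))  # True: every case is positive
```

With few positive cases, bootstrap intervals like this tend to be wide. That is consistent with the abstract's decision to treat overlapping intervals as a frontier rather than a ranking.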
Problem

Research questions and friction points this paper is trying to address.

3D medical imaging
lung CT
input dimensionality
class imbalance
model stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

resource-performance frontier
input dimensionality
failure-mode taxonomy
lung cancer screening
2.5D representation
Md Enamul Hoq
Department of Biomedical Informatics, University of Arkansas for Medical Sciences
Sharafat Hossain
Department of Information Science, University of Arkansas at Little Rock
Imraul Emmaka
Department of Information Science, University of Arkansas at Little Rock
Linda Larson-Prior
Department of Neuroscience, University of Arkansas for Medical Sciences
Lawrence Tarbox
Department of Biomedical Informatics, University of Arkansas for Medical Sciences
Jonathan Bona
Department of Biomedical Informatics, University of Arkansas for Medical Sciences
Donald Johann Jr.
Department of Biomedical Informatics, University of Arkansas for Medical Sciences
Fred Prior
Distinguished Professor and Chair, Department of Biomedical Informatics, University of Arkansas for Medical Sciences
quantitative imaging informatics