MedNeXt-v2: Scaling 3D ConvNeXts for Large-Scale Supervised Representation Learning in Medical Image Segmentation

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current 3D medical image segmentation backbones exhibit insufficient representational capacity under large-scale data regimes. To address this, the authors propose MedNeXt-v2, a scalable 3D ConvNeXt architecture featuring compound scaling across depth, width, and receptive field, together with a 3D Global Response Normalization (GRN) module. The paper establishes an evaluation principle: downstream performance after pretraining is reliably predicted by from-scratch performance. Its analysis reveals previously unobserved patterns: pathological segmentation is markedly more sensitive to representation scaling than anatomical segmentation, and modality-specific pretraining yields no significant gain once full fine-tuning is applied. MedNeXt-v2 is pretrained with supervision on 18k CT volumes and integrated into nnUNet with full-parameter fine-tuning. It achieves state-of-the-art performance across six CT/MR benchmarks encompassing 144 anatomical structures, consistently outperforming seven publicly available pretrained models. Code and pretrained models are publicly released in the official nnUNet repository.

📝 Abstract
Large-scale supervised pretraining is rapidly reshaping 3D medical image segmentation. However, existing efforts focus primarily on increasing dataset size and overlook whether the backbone network is an effective representation learner at scale. In this work, we address this gap by revisiting ConvNeXt-based architectures for volumetric segmentation and introducing MedNeXt-v2, a compound-scaled 3D ConvNeXt that leverages an improved micro-architecture and data scaling to deliver state-of-the-art performance. First, we show that backbones routinely used in large-scale pretraining pipelines are often suboptimal. We then benchmark backbones comprehensively prior to scaling and demonstrate that stronger from-scratch performance reliably predicts stronger downstream performance after pretraining. Guided by these findings, we incorporate a 3D Global Response Normalization module and apply depth, width, and context scaling to strengthen our architecture for effective representation learning. We pretrain MedNeXt-v2 on 18k CT volumes and demonstrate state-of-the-art performance when fine-tuning across six challenging CT and MR benchmarks (144 structures), with consistent gains over seven publicly released pretrained models. Beyond these improvements, our benchmarking reveals that stronger backbones yield better results on similar data, that representation scaling disproportionately benefits pathological segmentation, and that modality-specific pretraining offers negligible benefit once full fine-tuning is applied. In conclusion, our results establish MedNeXt-v2 as a strong backbone for large-scale supervised representation learning in 3D medical image segmentation. Our code and pretrained models are made available with the official nnUNet repository at: https://www.github.com/MIC-DKFZ/nnUNet
Problem

Research questions and friction points this paper is trying to address.

Scaling 3D ConvNeXt for large-scale supervised medical image segmentation
Addressing suboptimal backbone networks in large-scale pretraining pipelines
Improving representation learning with enhanced architecture and data scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compound-scaled 3D ConvNeXt architecture for volumetric segmentation
Incorporates 3D Global Response Normalization and depth-width-context scaling
Pretrained on 18k CT volumes for state-of-the-art segmentation performance
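The Global Response Normalization module highlighted above was introduced for 2D features in ConvNeXt-v2; MedNeXt-v2 extends it to volumes. A minimal NumPy sketch of what a 3D GRN could look like, assuming a channels-last layout; the function name and layout are illustrative, and the actual MedNeXt-v2 implementation may differ:

```python
import numpy as np

def grn_3d(x, gamma, beta, eps=1e-6):
    """Hedged sketch of 3D Global Response Normalization.

    x: feature map of shape (N, D, H, W, C), channels-last (assumption).
    gamma, beta: learnable scale and shift (scalars or shape (C,)).
    """
    # Global feature aggregation: per-channel L2 norm over the
    # three spatial dimensions (D, H, W).
    gx = np.sqrt(np.sum(x ** 2, axis=(1, 2, 3), keepdims=True))  # (N,1,1,1,C)
    # Feature normalization: divisive normalization by the mean
    # channel norm, so channels compete for response magnitude.
    nx = gx / (gx.mean(axis=-1, keepdims=True) + eps)
    # Feature calibration with learnable affine parameters,
    # plus a residual connection back to the input.
    return gamma * (x * nx) + beta + x
```

With gamma and beta initialized to zero, the module starts as an identity mapping, which is the initialization used for GRN in ConvNeXt-v2.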
Saikat Roy
Doctoral Researcher, German Cancer Research Center (DKFZ)
Deep Learning · Image Segmentation · Representation Learning · Diffusion Models · Medical Image Analysis
Yannick Kirchhoff
PhD Student, DKFZ
Computer Vision · Deep Learning · Medical Image Computing
Constantin Ulrich
German Cancer Research Center (DKFZ)
Medical Image Computing · Medical Physics · Computer Vision
Maximillian Rokuss
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Germany
Tassilo Wald
PhD Student, Deutsches Krebsforschungszentrum (DKFZ)
Representation Learning · Self-Supervised Learning · Medical Image Analysis
Fabian Isensee
HIP Applied Computer Vision Lab, Division of Medical Image Computing, German Cancer Research Center
Computer Vision · Deep Learning · Segmentation · Medical Image Computing
Klaus Maier-Hein
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Germany; Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Germany