nnActive: A Framework for Evaluation of Active Learning in 3D Biomedical Segmentation

📅 2025-11-24
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Prior active learning (AL) evaluation in 3D biomedical image segmentation suffers from insufficient benchmarking, baselines that are not adapted to the task, and distorted cost metrics, especially under extreme foreground-background class imbalance. Method: The authors introduce the first systematic, open-source AL evaluation framework for 3D biomedical segmentation. It features a foreground-aware random sampling baseline and a foreground efficiency metric to correct annotation-cost accounting, extends nnU-Net to support partially annotated volumes and 3D patch-wise querying, and integrates uncertainty measures including predictive entropy. Results: Extensive validation across multiple datasets, annotation budgets, and labeling protocols shows that all AL methods significantly outperform conventional random sampling but consistently fail to surpass the foreground-aware baseline. Predictive entropy performs best, at a high computational cost. Crucially, the results indicate that AL gains are constrained by baseline validity rather than algorithmic sophistication, establishing a more rigorous and trustworthy evaluation paradigm for AL in 3D medical imaging.
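
To make the 3D patch-wise querying with predictive entropy concrete, here is a minimal, hedged sketch: voxel-wise entropy of softmax probabilities is averaged over candidate 3D patches, and the most uncertain patches would be queried for annotation. The function names, the non-overlapping patch grid, and the toy inputs are assumptions for illustration, not the nnActive or nnU-Net implementation.

```python
# Illustrative sketch only: patch-wise querying by mean predictive entropy.
# Names and the non-overlapping patch grid are hypothetical, not nnActive's code.
import numpy as np

def voxel_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Predictive entropy per voxel from class probabilities of shape (C, X, Y, Z)."""
    return -(probs * np.log(probs + eps)).sum(axis=0)

def score_patches(probs: np.ndarray, patch_size: tuple) -> list:
    """Mean entropy per non-overlapping patch, sorted most uncertain first."""
    ent = voxel_entropy(probs)
    px, py, pz = patch_size
    scores = []
    for x in range(0, ent.shape[0] - px + 1, px):
        for y in range(0, ent.shape[1] - py + 1, py):
            for z in range(0, ent.shape[2] - pz + 1, pz):
                block = ent[x:x + px, y:y + py, z:z + pz]
                scores.append((float(block.mean()), (x, y, z)))
    return sorted(scores, key=lambda s: s[0], reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 64, 64, 64))              # toy 3-class volume
    probs = np.exp(logits) / np.exp(logits).sum(axis=0)    # softmax over classes
    print(score_patches(probs, (32, 32, 32))[:2])          # top-2 patches to annotate
```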

📝 Abstract
Semantic segmentation is crucial for various biomedical applications, yet its reliance on large annotated datasets presents a bottleneck due to the high cost and specialized expertise required for manual labeling. Active Learning (AL) aims to mitigate this challenge by querying only the most informative samples, thereby reducing annotation effort. However, in the domain of 3D biomedical imaging, there is no consensus on whether AL consistently outperforms Random sampling. Four evaluation pitfalls hinder the current methodological assessment. These are (1) restriction to too few datasets and annotation budgets, (2) using 2D models on 3D images without partial annotations, (3) the Random baseline not being adapted to the task, and (4) measuring annotation cost only in voxels. In this work, we introduce nnActive, an open-source AL framework that overcomes these pitfalls by (1) conducting a large-scale study spanning four biomedical imaging datasets and three label regimes, (2) extending nnU-Net to train on partial annotations with 3D patch-based query selection, (3) proposing Foreground Aware Random sampling strategies that tackle the foreground-background class imbalance of medical images, and (4) proposing the foreground efficiency metric, which captures the low annotation cost of background regions. We reveal the following findings: (A) while all AL methods outperform standard Random sampling, none reliably surpasses an improved Foreground Aware Random sampling; (B) the benefits of AL depend on task-specific parameters; (C) Predictive Entropy is overall the best-performing AL method, but likely requires the most annotation effort; (D) AL performance can be improved with more compute-intensive design choices. As a holistic, open-source framework, nnActive can serve as a catalyst for research and application of AL in 3D biomedical imaging. Code is at: https://github.com/MIC-DKFZ/nnActive
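
The foreground efficiency metric is only described qualitatively above, so the following is a hedged illustration rather than the paper's definition: it assumes background voxels are much cheaper to annotate than foreground voxels and contrasts a raw voxel count with a foreground-weighted cost. The mask inputs and the 0.1 background weight are hypothetical.

```python
# Illustration only: the exact foreground efficiency definition is not given on this
# page. This toy version contrasts raw voxel counts with a foreground-aware cost,
# assuming annotating background is much cheaper than annotating foreground.
import numpy as np

def foreground_fraction(annotated: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of annotated voxels that are foreground (label > 0)."""
    annotated = annotated.astype(bool)
    return float((labels[annotated] > 0).mean()) if annotated.any() else 0.0

def weighted_annotation_cost(annotated: np.ndarray, labels: np.ndarray,
                             background_weight: float = 0.1) -> float:
    """Cost proxy where a background voxel counts `background_weight` of a foreground
    voxel. The 0.1 weight is an arbitrary illustration, not a value from the paper."""
    annotated = annotated.astype(bool)
    fg = int((labels[annotated] > 0).sum())
    bg = int(annotated.sum()) - fg
    return fg + background_weight * bg

if __name__ == "__main__":
    labels = np.zeros((32, 32, 32), dtype=np.int64)
    labels[10:14, 10:14, 10:14] = 1                        # small foreground blob
    query = np.zeros_like(labels)
    query[8:24, 8:24, 8:24] = 1                            # queried region to annotate
    print(foreground_fraction(query, labels), weighted_annotation_cost(query, labels))
```
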
Problem

Research questions and friction points this paper is trying to address.

Evaluating Active Learning effectiveness in 3D biomedical segmentation
Addressing four methodological pitfalls in current AL assessments
Developing the nnActive framework to improve AL evaluation standards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends nnU-Net with partial annotations for 3D patch querying
Proposes Foreground Aware Random sampling for class imbalance (see the sketch after this list)
Introduces a foreground efficiency metric to measure annotation cost
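
As one reading of the Foreground Aware Random idea, the sketch below samples patch origins uniformly at random but keeps only patches that overlap a foreground hint (e.g. a coarse mask). How nnActive actually constructs its Foreground Aware Random strategies is not specified on this page; the `foreground_hint` input and the rejection-sampling loop are assumptions.

```python
# Hypothetical sketch of a foreground-aware random patch sampler; not nnActive's code.
import numpy as np

def foreground_aware_random_patches(foreground_hint: np.ndarray,
                                    patch_size: tuple,
                                    n_queries: int,
                                    rng: np.random.Generator,
                                    max_tries: int = 10_000) -> list:
    """Sample patch origins uniformly at random, keeping only patches with foreground."""
    px, py, pz = patch_size
    sx, sy, sz = foreground_hint.shape
    picked = []
    for _ in range(max_tries):
        if len(picked) == n_queries:
            break
        x = int(rng.integers(0, sx - px + 1))
        y = int(rng.integers(0, sy - py + 1))
        z = int(rng.integers(0, sz - pz + 1))
        if foreground_hint[x:x + px, y:y + py, z:z + pz].any():
            picked.append((x, y, z))
    return picked

if __name__ == "__main__":
    hint = np.zeros((96, 96, 96), dtype=bool)
    hint[40:56, 40:56, 40:56] = True                       # toy foreground region
    print(foreground_aware_random_patches(hint, (32, 32, 32), 3, np.random.default_rng(0)))
```
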
Authors
Carsten T. Lüth
PhD Student @ Interactive Machine Learning Research Group; Label Efficient Training of Deep Learning Models
Jeremias Traub
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany
Kim-Celine Kahl
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany
Till J. Bungert
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany
Lukas Klein
EPFL, USZ
Machine Learning · Biotech · Computer Vision
Lars Kraemer
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany
Paul F. Jaeger
Research Scientist at Google DeepMind
Fabian Isensee
HIP Applied Computer Vision Lab, Division of Medical Image Computing, German Cancer Research Center
Computer Vision · Deep Learning · Segmentation · Medical Image Computing
Klaus Maier-Hein
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany