🤖 AI Summary
In medical image diagnosis, one-hot labels obscure inter-expert annotation discrepancies and intrinsic image ambiguity, leading to overconfident, poorly calibrated models with limited robustness. To address this, we propose Uncertainty-aware Pseudo-Labeling (UPL), a method that dynamically models sample difficulty by leveraging prediction trajectories during neural network training, thereby generating calibrated pseudo-labels that explicitly encode diagnostic uncertainty and inter-rater disagreement. UPL injects uncertainty directly into the supervision signal without modifying the network architecture, supports multi-view inputs, and enables end-to-end label enhancement. Evaluated on echocardiogram classification, UPL significantly improves model calibration, selective classification performance, and robustness under multi-view fusion, outperforming state-of-the-art uncertainty-modeling and label-smoothing baselines.
📝 Abstract
Computer-aided diagnosis systems must make critical decisions from medical images that are often noisy, ambiguous, or conflicting, yet today's models are trained on overly simplistic labels that ignore diagnostic uncertainty. One-hot labels erase inter-rater variability and force models to make overconfident predictions, especially when faced with incomplete or artifact-laden inputs. We address this gap by introducing a novel framework that brings uncertainty back into the label space. Our method leverages neural network training dynamics (NNTD) to assess the inherent difficulty of each training sample. By aggregating and calibrating model predictions during training, we generate uncertainty-aware pseudo-labels that reflect the ambiguity encountered during learning. This label augmentation approach is architecture-agnostic and can be applied to any supervised learning pipeline to enhance uncertainty estimation and robustness. We validate our approach on a challenging echocardiography classification benchmark, demonstrating superior performance over specialized baselines in calibration, selective classification, and multi-view fusion.
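The core idea of aggregating and calibrating per-sample predictions across training could be sketched roughly as follows; the function name, temperature-scaling step, and entropy-based difficulty score are illustrative assumptions for exposition, not the authors' implementation:

```python
import numpy as np

def uncertainty_aware_pseudo_labels(pred_history, temperature=1.0):
    """Aggregate a prediction trajectory into soft pseudo-labels.

    pred_history: array-like of shape (epochs, samples, classes) holding
    the model's softmax outputs recorded at each training epoch.
    Returns (soft_labels, difficulty), where difficulty is the predictive
    entropy of each aggregated label (a hypothetical difficulty proxy).
    """
    traj = np.asarray(pred_history, dtype=float)
    # Average the softmax outputs over epochs: samples whose predictions
    # flip between classes end up with flatter (more uncertain) labels.
    mean_probs = traj.mean(axis=0)
    # Optional temperature calibration in log space, then renormalize.
    logits = np.log(np.clip(mean_probs, 1e-12, None)) / temperature
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    soft = exp / exp.sum(axis=-1, keepdims=True)
    # Predictive entropy as a per-sample ambiguity/difficulty estimate.
    difficulty = -(soft * np.log(np.clip(soft, 1e-12, None))).sum(axis=-1)
    return soft, difficulty
```

Under this sketch, the resulting soft labels replace the one-hot targets in a standard cross-entropy loss, so uncertainty enters through the supervision signal alone and the network architecture is untouched.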