🤖 AI Summary
This work addresses a limitation of uncertainty quantification for image segmentation: pixel-wise uncertainty scores are usually aggregated into an image-level score by global averaging, which ignores the spatial structure of the uncertainty map and can weaken downstream tasks such as out-of-distribution detection and failure detection. The authors formally analyze the properties and pitfalls of common aggregation strategies, propose new strategies that take spatial structure into account, and introduce a meta-aggregator that combines multiple aggregators. In a benchmark on ten datasets, spatially aware aggregators perform better on both downstream tasks. The performance of any single aggregator depends heavily on dataset characteristics, whereas the meta-aggregator performs robustly across datasets.
📝 Abstract
Uncertainty Quantification (UQ) is crucial for ensuring the reliability of automated image segmentations in safety-critical domains such as biomedical image analysis and autonomous driving. In segmentation, UQ generates pixel-wise uncertainty scores that must be aggregated into image-level scores for downstream tasks such as Out-of-Distribution (OoD) detection or failure detection. Despite the routine use of aggregation strategies, their properties and impact on downstream task performance have not yet been comprehensively studied. Global Average is the default choice, yet it does not account for spatial and structural features of segmentation uncertainty. Alternatives such as patch-, class-, and threshold-based strategies exist but lack systematic comparison, leading to inconsistent reporting and unclear best practices. We address this gap by (1) formally analyzing properties, limitations, and pitfalls of common strategies; (2) proposing novel strategies that incorporate spatial uncertainty structure; and (3) benchmarking their performance on OoD and failure detection across ten datasets that vary in image geometry and structure. We find that aggregators leveraging spatial structure yield stronger performance in both downstream tasks studied. However, the performance of individual aggregators depends heavily on dataset characteristics, so we (4) propose a meta-aggregator that integrates multiple aggregators and performs robustly across datasets.
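To make the aggregation step concrete, the following is a minimal, dependency-free sketch of the four families of strategies named in the abstract (global average, patch-, threshold-, and class-based). The function names, the non-overlapping patch layout, the threshold `tau`, and the toy data are all illustrative assumptions; they are not the paper's definitions, and the paper's novel spatial aggregators and meta-aggregator are not reproduced here.

```python
from statistics import fmean

# An uncertainty map is assumed to be a non-empty 2D list of floats (rows of pixels).

def global_average(u):
    """Mean uncertainty over all pixels; ignores where the uncertainty sits."""
    return fmean(v for row in u for v in row)

def patch_max(u, size):
    """Highest mean uncertainty over non-overlapping size x size patches."""
    height, width = len(u), len(u[0])
    patch_means = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            patch = [v for row in u[top:top + size] for v in row[left:left + size]]
            patch_means.append(fmean(patch))
    return max(patch_means)

def threshold_average(u, tau):
    """Mean over pixels with uncertainty >= tau; 0.0 if no pixel qualifies."""
    kept = [v for row in u for v in row if v >= tau]
    return fmean(kept) if kept else 0.0

def class_average(u, labels):
    """Average of per-class mean uncertainties, so each predicted class counts equally."""
    per_class = {}
    for u_row, label_row in zip(u, labels):
        for v, c in zip(u_row, label_row):
            per_class.setdefault(c, []).append(v)
    return fmean(fmean(values) for values in per_class.values())

# Toy 4x4 example (hypothetical data): one small, highly uncertain object (class 1)
# on a confident background (class 0).
uncertainty = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.8],
    [0.0, 0.0, 0.8, 0.8],
]
predicted_labels = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

scores = {
    "global_average": global_average(uncertainty),
    "patch_max": patch_max(uncertainty, size=2),
    "threshold_average": threshold_average(uncertainty, tau=0.5),
    "class_average": class_average(uncertainty, predicted_labels),
}
print(scores)
```

In this toy map the uncertain region covers a quarter of the image, so the global average is diluted to roughly 0.2, while the patch- and threshold-based scores stay at roughly 0.8 and the class-based score lands near 0.4. This illustrates the abstract's point that Global Average does not account for the spatial structure of the uncertainty.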