🤖 AI Summary
This work addresses the limitation of conventional softmax-based nonconformity scores, which inadequately capture input sample difficulty and consequently yield conformal prediction sets lacking adaptivity and efficiency. To overcome this, the authors propose leveraging Helmholtz free energy in the pre-softmax logit space as a principled measure of uncertainty and introduce a monotonic transformation to reweight nonconformity scores, rendering prediction sets more sensitive to input difficulty. This approach represents the first application of Helmholtz free energy to calibrate scoring functions in conformal prediction, enhancing adaptivity without requiring complex post-processing. Evaluated across multiple datasets and deep architectures in conjunction with four state-of-the-art scoring functions, the method consistently achieves significant improvements in both the efficiency and adaptivity of prediction sets.
📝 Abstract
The merit of Conformal Prediction (CP), a distribution-free framework for uncertainty quantification, depends on generating prediction sets that are efficient, reflected in small average set sizes, while also being adaptive, meaning they signal uncertainty by varying in size according to input difficulty. A central limitation of deep conformal classifiers is that their nonconformity scores are derived from softmax outputs, which can be unreliable indicators of how certain the model truly is about a given input, sometimes leading to overconfident misclassifications or undue hesitation. In this work, we argue that this unreliability is inherited by the prediction sets generated by CP, limiting their adaptiveness. We propose a new approach that leverages information from the pre-softmax logit space, using the Helmholtz Free Energy as a measure of model uncertainty and sample difficulty. By reweighting nonconformity scores with a monotonic transformation of each sample's energy score, we improve their sensitivity to input difficulty. Our experiments with four state-of-the-art score functions on multiple datasets and deep architectures show that this energy-based enhancement yields notable gains in both the efficiency and the adaptiveness of prediction sets compared to baseline nonconformity scores, without introducing any post-hoc complexity.
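To make the idea concrete, the following is a minimal sketch of energy-based reweighting in split conformal prediction. The free-energy formula over logits is standard; the specific baseline score (1 minus the true-class softmax probability) and the sigmoid reweighting are illustrative assumptions, since the abstract only states that a monotonic transformation of the energy is used:

```python
import numpy as np

def free_energy(logits, T=1.0):
    # Helmholtz free energy of the logits: E(x) = -T * log(sum_k exp(logit_k / T)).
    # Confident inputs (large logits) get lower (more negative) energy.
    return -T * np.log(np.sum(np.exp(logits / T), axis=-1))

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def energy_weighted_scores(logits, labels, T=1.0):
    # Baseline nonconformity score: 1 - softmax probability of the true label.
    probs = softmax(logits)
    base = 1.0 - probs[np.arange(len(labels)), labels]
    # Hypothetical monotonic reweighting: a sigmoid of the energy, so that
    # harder (higher-energy) samples receive larger nonconformity scores.
    weight = 1.0 / (1.0 + np.exp(-free_energy(logits, T)))
    return base * weight

def conformal_quantile(cal_scores, alpha=0.1):
    # Split-CP calibration: the ceil((n+1)(1-alpha))/n empirical quantile
    # of the calibration scores gives the set-inclusion threshold.
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")
```

At test time, a label is included in the prediction set whenever its (energy-reweighted) score falls below the calibrated threshold; because the reweighting is monotone in the energy, harder inputs tend to produce larger sets.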