🤖 AI Summary
Quantifying the confidence of neural network predictions remains challenging, particularly in safety-critical applications that require reliable uncertainty estimates. Method: This paper proposes a post-hoc confidence score that requires no retraining and is grounded in the geometry of the softmax output space. For each class, it computes a centroid as the mean softmax output over that class's correct predictions, then measures the distance from each prediction to the centroid of its class; this distance serves as the confidence metric. Because it operates only on softmax outputs, the metric is model-agnostic. A per-class safety threshold, defined as the smallest distance from an incorrect prediction to the class centroid, flags predictions that should be deferred to a human operator. Results: Evaluated on MNIST with a Convolutional Neural Network and on CIFAR-10 with a Vision Transformer, the method behaves consistently across both datasets and models. The results indicate that the distance metric offers an efficient way to decide when automated predictions are acceptable and when they should be deferred, supporting human–AI collaborative decision-making in safety-critical domains.
📝 Abstract
Ensuring the reliability and safety of automated decision-making is crucial. This paper proposes a new approach for measuring the reliability of predictions made by machine learning models. We analyze the outputs of a trained neural network by clustering them and measuring the distance between each output and its class centroid, and we propose this distance as a metric for the confidence of a prediction. Each prediction is assigned to a cluster whose centroid is the mean softmax output over all correct predictions of the given class. We then define the safety threshold for a class as the smallest distance from an incorrect prediction to that class's centroid. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that our approach is consistent across these datasets and network models, and indicate that the proposed metric can offer an efficient way of determining when automated predictions are acceptable and when they should be deferred to human operators.
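The centroid, distance, and threshold definitions above can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions that the abstract does not specify: Euclidean distance, grouping incorrect predictions by the class they were assigned to, accepting a prediction only when its distance is strictly below the threshold, and treating a class with no observed errors as having an infinite threshold. The function names and the toy two-class data are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(outputs, predicted, labels):
    """Mean softmax vector over the correct predictions of each class."""
    sums, counts = {}, defaultdict(int)
    for probs, pred, label in zip(outputs, predicted, labels):
        if pred != label:
            continue  # only correct predictions contribute to a centroid
        if label not in sums:
            sums[label] = [0.0] * len(probs)
        sums[label] = [s + p for s, p in zip(sums[label], probs)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def safety_thresholds(outputs, predicted, labels, centroids):
    """Per class: smallest distance from an incorrect prediction to the centroid.

    Assumption: an incorrect prediction is compared against the centroid of
    the class it was (wrongly) assigned to, using Euclidean distance.
    """
    thresholds = {}
    for probs, pred, label in zip(outputs, predicted, labels):
        if pred == label or pred not in centroids:
            continue
        d = math.dist(probs, centroids[pred])
        thresholds[pred] = min(d, thresholds.get(pred, math.inf))
    return thresholds


def accept_prediction(probs, centroids, thresholds):
    """True if the prediction is closer to its centroid than any known error."""
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax class
    if pred not in centroids:
        return False  # no reference centroid: defer to a human
    # Assumption: classes with no observed errors get an infinite threshold.
    return math.dist(probs, centroids[pred]) < thresholds.get(pred, math.inf)


# Toy two-class data (hypothetical): softmax outputs, predictions, true labels.
outputs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7]]
predicted = [0, 0, 0, 1, 1]
labels = [0, 0, 1, 1, 0]

centroids = class_centroids(outputs, predicted, labels)
thresholds = safety_thresholds(outputs, predicted, labels, centroids)

confident = accept_prediction([0.88, 0.12], centroids, thresholds)
uncertain = accept_prediction([0.55, 0.45], centroids, thresholds)
```

In this toy setup, a prediction close to its class centroid is accepted, while one that lies farther away than the nearest known error is deferred. In practice the centroids and thresholds would be computed on a labelled held-out set and then applied to new, unlabelled predictions.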