🤖 AI Summary
This work investigates the geometric mechanisms underlying soft-label regularization—specifically label smoothing, Mixup, and CutMix—in improving model calibration and adversarial robustness for image classification. From a representation-space geometry perspective, we conduct deep feature analysis, class-center modeling, and cosine similarity quantification. We find that all three regularizers preserve the conical topology of decision regions while consistently compressing feature magnitudes and increasing cosine similarity between features and class centers. Crucially, we identify for the first time the geometric invariance of this conical structure under soft-label regularization. Through causal analysis, we establish that magnitude compression predominantly reduces calibration error, whereas enhanced cosine similarity drives improved adversarial robustness. These findings yield an interpretable geometric framework for understanding how soft-label regularization operates, bridging representation geometry with model reliability properties.
📝 Abstract
Recent studies have shown that regularization techniques using soft labels, e.g., label smoothing, Mixup, and CutMix, not only enhance image classification accuracy but also improve model calibration and robustness against adversarial attacks. However, the underlying mechanisms of these improvements remain underexplored. In this paper, we offer a novel explanation from the perspective of the representation space (i.e., the space of the features obtained at the penultimate layer). Our investigation first reveals that the decision regions in the representation space form cone-like shapes around the origin after training, regardless of whether regularization is applied. Applying regularization, however, changes the distribution of the features (or representation vectors): their magnitudes are reduced and, consequently, the cosine similarities between the representation vectors and the class centers (the minimal-loss points for each class) increase, which acts as the central mechanism behind the improved calibration and robustness. Our findings provide new insights into the characteristics of the high-dimensional representation space in relation to training and regularization with soft labels.
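To make the quantities in the abstract concrete, here is a minimal NumPy sketch (not the authors' code) of label smoothing and of the two geometric measurements discussed: feature-vector magnitudes and cosine similarities to class centers. The class center is approximated here by the per-class mean feature, a simplification of the paper's "minimal loss point for each class".

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot target with the uniform distribution."""
    one_hot = np.eye(num_classes)[y]
    return one_hot * (1.0 - eps) + eps / num_classes

def cosine_to_class_centers(features, labels, num_classes):
    """Cosine similarity of each penultimate-layer feature to its class center.
    The center is approximated by the class-mean feature (an assumption; the
    paper defines it as the minimal-loss point of each class)."""
    sims = np.empty(len(features))
    for c in range(num_classes):
        mask = labels == c
        center = features[mask].mean(axis=0)
        sims[mask] = (features[mask] @ center) / (
            np.linalg.norm(features[mask], axis=1) * np.linalg.norm(center) + 1e-12
        )
    return sims

# Toy usage: two tight clusters in a 2-D "representation space"
rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)
features = np.vstack([
    rng.normal([5.0, 0.0], 0.3, size=(50, 2)),
    rng.normal([0.0, 5.0], 0.3, size=(50, 2)),
])
targets = smooth_labels(labels, num_classes=2, eps=0.1)
magnitudes = np.linalg.norm(features, axis=1)   # feature-norm distribution
sims = cosine_to_class_centers(features, labels, num_classes=2)
print(targets[0])  # smoothed target for a class-0 sample: [0.95 0.05]
```

Under the paper's account, training with such smoothed targets shrinks the `magnitudes` distribution (driving better calibration) while pushing `sims` toward 1 (driving adversarial robustness).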