🤖 AI Summary
In attribute-imbalanced data, models often resort to shortcut learning, leading to non-collapsed yet biased feature representations. Method: This paper proposes a zero-overhead debiasing framework grounded in neural collapse theory—introducing neural collapse structure for the first time to mitigate shortcut learning. We design a “shortcut-avoidance” paradigm featuring a shortcut-prime guidance mechanism, symmetry constraints on the feature space, and end-to-end differentiable optimization, intervening at training initialization without additional parameters or computational cost. Contribution/Results: Theoretical analysis and experiments demonstrate that our method significantly suppresses early biased feature collapse on both synthetic and real-world biased datasets, enhances training stability, and achieves state-of-the-art generalization performance at zero extra overhead.
📝 Abstract
Recent studies have noted an intriguing phenomenon termed Neural Collapse, that is, when the neural networks establish the right correlation between feature spaces and the training targets, their last-layer features, together with the classifier weights, will collapse into a stable and sym-metric structure. In this paper, we extend the investigation of Neural Collapse to the biased datasets with im-balanced attributes. We observe that models will easily fall into the pitfall of shortcut learning and form a biased, non-collapsed feature space at the early period of training, which is hard to reverse and limits the generalization capability. To tackle the root cause of biased classification, we follow the recent inspiration of prime training, and propose an avoid-shortcut learning framework without ad-ditional training complexity. With well-designed shortcut primes based on Neural Collapse structure, the models are encouraged to skip the pursuit of simple shortcuts and nat-urally capture the intrinsic correlations. Experimental re-sults demonstrate that our method induces better conver-gence properties during training, and achieves state-of-the-art generalization performance on both synthetic and real-world biased datasets.