🤖 AI Summary
Machine learning models often fail under distribution shifts when features exhibit comparable discriminative power for both classes (i.e., $P(y=0 \mid x) \approx P(y=1 \mid x)$), particularly due to fixed activation mechanisms that lack adaptability to environmental perturbations. To address this, we propose Context-Adaptive Quantile Activation (QACT), which maps each neuron's output to its relative quantile within a local sliding window over the activation distribution—enabling dynamic boundary calibration and implicit sensitivity to distributional changes, without introducing parameters or modifying network architecture. Evaluated on challenging corruption benchmarks—including CIFAR-10C/100C, MNIST-C, and TinyImageNet-C—QACT significantly outperforms standard MLP and CNN baselines. Remarkably, under severe corruptions, it even surpasses DINOv2-small, demonstrating strong robustness and generalization under distribution shift. This validates quantile-based activation as a simple yet highly effective mechanism for enhancing model resilience to environmental distributional variations.
📝 Abstract
An established failure mode for machine learning models occurs when the same features are equally likely to belong to class 0 and class 1. In such cases, existing ML models cannot correctly classify the sample. However, a solvable case emerges when the probabilities of class 0 and class 1 vary with the context distribution. To the best of our knowledge, standard neural network architectures like MLPs or CNNs are not equipped to handle this. In this article, we propose a simple activation function, quantile activation (QACT), that addresses this problem without significantly increasing computational costs. The core idea is to adapt the outputs of each neuron to its context distribution. The proposed quantile activation, QACT, produces the relative quantile of the sample in its context distribution, rather than the actual values, as in traditional networks. A practical example where the same sample can have different labels arises in cases of inherent distribution shift. We validate the proposed activation function under such shifts, using datasets designed to test robustness against distortions: CIFAR10C, CIFAR100C, MNISTC, TinyImagenetC. Our results demonstrate significantly better generalization across distortions compared to conventional classifiers, consistently across architectures. Although this paper presents a proof of concept, we find that this approach unexpectedly outperforms DINOv2 (small) under large distortions, despite DINOv2 being trained with a much larger network and dataset.
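The core mechanism described above — replacing a neuron's raw pre-activation with its relative quantile in the context distribution — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `quantile_activation` is assumed, and the mini-batch is used as a stand-in for the context distribution (the actual method may use a different context estimator, e.g. a sliding window).

```python
import numpy as np

def quantile_activation(z):
    """Hypothetical sketch of a quantile activation (QACT).

    z : (batch, features) array of pre-activations. Here the batch is
    treated as the context distribution for each feature (an assumption
    made for illustration).

    Returns the relative quantile of each pre-activation within its
    feature's batch distribution, rescaled to (-1, 1).
    """
    # Rank each value within its column (empirical CDF position).
    ranks = z.argsort(axis=0).argsort(axis=0).astype(float)
    # Mid-rank quantiles in (0, 1), then center to (-1, 1).
    q = (ranks + 0.5) / z.shape[0]
    return 2.0 * q - 1.0

# Toy usage: 4 samples, 2 neurons.
z = np.array([[ 0.1, -2.0],
              [ 1.5,  0.3],
              [-0.7,  0.9],
              [ 2.2, -0.1]])
a = quantile_activation(z)
# The smallest pre-activation in each column maps to the lowest
# quantile; outputs are bounded regardless of the input scale, which
# is what makes the activation insensitive to shifts of the context
# distribution.
```

Because the output depends only on a sample's rank within the context, rescaling or shifting the whole context distribution (as a corruption might) leaves the activations unchanged — a monotone-invariance property that motivates the robustness claims above.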