🤖 AI Summary
Conventional gradient descent conflates positive and negative features within a single weight vector, leading to gradient interference and reduced robustness against noise and class imbalance. To address this, we propose a neuron-level dual-branch weight decomposition mechanism: each weight vector is decoupled into two components—$W_1$, encoding target features, and $W_2$, encoding non-target features—yielding the effective weight as $W = W_1 - W_2$. This constitutes the first contrastive weight decomposition at the neuron level. The method is architecturally lightweight, incurs no additional inference overhead, and is accompanied by a differential composition scheme and customized backpropagation. Empirical evaluation on MNIST and CIFAR-10 classification, as well as the California Housing regression task, demonstrates significant improvements in generalization. Notably, overfitting is reduced by 12–18% under low-data and high-noise regimes.
📝 Abstract
We introduce a novel framework for learning in neural networks by decomposing each neuron's weight vector into two distinct parts, $W_1$ and $W_2$, thereby modeling contrastive information directly at the neuron level. Traditional gradient descent stores both positive (target) and negative (non-target) feature information in a single weight vector, often obscuring fine-grained distinctions. Our approach, by contrast, maintains separate updates for target and non-target features, ultimately forming a single effective weight $W = W_1 - W_2$ that is more robust to noise and class imbalance. Experimental results on both regression (California Housing, Wine Quality) and classification (MNIST, Fashion-MNIST, CIFAR-10) tasks suggest that this decomposition enhances generalization and resists overfitting, especially when training data are sparse or noisy. Crucially, the inference complexity remains the same as in the standard $WX + ext{bias}$ setup, offering a practical solution for improved learning without additional inference-time overhead.