Revisiting Gradient Descent: A Dual-Weight Method for Improved Learning

📅 2025-03-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional gradient descent conflates positive and negative features within a single weight vector, leading to gradient interference and reduced robustness against noise and class imbalance. To address this, we propose a neuron-level dual-branch weight decomposition mechanism: each weight vector is decoupled into two components—$W_1$, encoding target features, and $W_2$, encoding non-target features—yielding the effective weight as $W = W_1 - W_2$. This constitutes the first contrastive weight decomposition at the neuron level. The method is architecturally lightweight, incurs no additional inference overhead, and is accompanied by a differential composition scheme and customized backpropagation. Empirical evaluation on MNIST and CIFAR-10 classification, as well as the California Housing regression task, demonstrates significant improvements in generalization. Notably, overfitting is reduced by 12–18% under low-data and high-noise regimes.

Technology Category

Application Category

📝 Abstract
We introduce a novel framework for learning in neural networks by decomposing each neuron's weight vector into two distinct parts, $W_1$ and $W_2$, thereby modeling contrastive information directly at the neuron level. Traditional gradient descent stores both positive (target) and negative (non-target) feature information in a single weight vector, often obscuring fine-grained distinctions. Our approach, by contrast, maintains separate updates for target and non-target features, ultimately forming a single effective weight $W = W_1 - W_2$ that is more robust to noise and class imbalance. Experimental results on both regression (California Housing, Wine Quality) and classification (MNIST, Fashion-MNIST, CIFAR-10) tasks suggest that this decomposition enhances generalization and resists overfitting, especially when training data are sparse or noisy. Crucially, the inference complexity remains the same as in the standard $WX + ext{bias}$ setup, offering a practical solution for improved learning without additional inference-time overhead.
Problem

Research questions and friction points this paper is trying to address.

Decomposes neuron weights to model contrastive information directly
Enhances generalization and resists overfitting in sparse/noisy data
Maintains inference complexity while improving learning robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes neuron weights into dual components
Separates updates for target and non-target features
Maintains inference complexity without overhead
🔎 Similar Papers
No similar papers found.