🤖 AI Summary
Existing momentum-based optimizers suffer from an inherent limitation: they cannot dynamically balance bias and variance in gradient estimation. To address this, we propose Stochastic Gradient Deconvolutional Filtering (SGDF), the first first-order optimizer grounded in Wiener filtering theory, which introduces a gradient estimation framework with a theoretical optimality guarantee. SGDF models the stochastic gradient dynamics and applies a time-varying optimal gain to adaptively suppress noise while preserving the signal component during parameter updates. The method extends naturally to adaptive optimization paradigms. Extensive experiments demonstrate that SGDF achieves faster convergence and better generalization across multiple benchmark tasks, significantly outperforming classical momentum methods and matching state-of-the-art adaptive optimizers.
📝 Abstract
In deep learning, stochastic gradient descent (SGD) and its momentum-based variants are widely used for optimization, yet the internal dynamics of these methods remain underexplored. In this paper, we analyze gradient behavior through a signal-processing lens, isolating the key factors that influence gradient updates and revealing a critical limitation: momentum techniques lack the flexibility to adequately balance the bias and variance components of the gradient, resulting in inaccurate gradient estimates. To address this issue, we introduce a novel method, SGDF (SGD with Filter), based on Wiener filter principles, which derives an optimal time-varying gain that refines gradient updates by minimizing the mean square error of the gradient estimate. This yields an optimal first-order gradient estimate that effectively balances noise reduction against signal preservation. Furthermore, our approach can be extended to adaptive optimizers, enhancing their generalization potential. Empirical results show that SGDF achieves superior convergence and generalization compared to traditional momentum methods, and performs competitively with state-of-the-art optimizers.
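The abstract does not spell out the update rule, but the core idea — fusing a momentum estimate with the raw stochastic gradient via a Wiener-style gain that minimizes mean-square estimation error — can be illustrated with a minimal sketch. Everything below (the function name `sgdf_step`, the signal-power proxy, the hyperparameter values) is a hypothetical reconstruction for illustration, not the authors' exact algorithm:

```python
import numpy as np

def sgdf_step(params, grad, state, lr=0.01, beta=0.9, eps=1e-8):
    """One hypothetical SGDF-style update (illustrative sketch only).

    Fuses the momentum estimate m with the raw gradient g using a
    Wiener-style gain k that weights the innovation (g - m) by an
    estimated signal-to-(signal+noise) ratio."""
    m, v = state["m"], state["v"]
    # EMA of the gradient (signal estimate) and of the squared deviation
    # from it (a proxy for gradient-noise variance).
    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * (grad - m) ** 2
    # Wiener-style gain in [0, 1]: trust the raw gradient more when the
    # estimated noise variance is small relative to the signal power.
    sig = m ** 2                    # crude signal-power proxy (assumption)
    k = sig / (sig + v + eps)
    g_hat = m + k * (grad - m)      # filtered gradient estimate
    state["m"], state["v"] = m, v
    return params - lr * g_hat, state

def demo():
    """Minimize f(x) = x^2 from noisy gradient observations."""
    rng = np.random.default_rng(0)
    x = np.array([5.0])
    state = {"m": np.zeros(1), "v": np.zeros(1)}
    for _ in range(500):
        noisy_grad = 2.0 * x + rng.normal(0.0, 1.0, size=1)
        x, state = sgdf_step(x, noisy_grad, state, lr=0.05)
    return x
```

Near the optimum the signal power `m**2` shrinks while the noise-variance estimate stays positive, so the gain `k` falls toward zero and the update relies mostly on the smoothed momentum term — the bias/variance trade-off the abstract describes.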