🤖 AI Summary
To address Adam's poor adaptability in online recommendation systems under dynamic distribution shifts and noise, this paper proposes CAdam, a confidence-guided adaptive optimizer based on gradient-momentum consistency. Its core contribution is a parameter-dimension-wise consistency confidence mechanism that dynamically distinguishes distribution drift from noise and gates updates selectively: when momentum and gradient agree, CAdam updates as in standard Adam; when they disagree, it temporarily withholds the update and monitors subsequent iterations for a genuine shift. Designed for real-time streaming training, CAdam achieves significant improvements over baselines, including Adam and AdaGrad, on synthetic benchmarks and multiple real-world recommendation datasets. Large-scale online A/B tests confirm consistent gains in click-through rate (CTR) and gross merchandise volume (GMV), demonstrating strong practicality across diverse production environments.
📝 Abstract
Modern recommendation systems frequently employ online learning to dynamically update their models with freshly collected data. The most commonly used optimizer for updating neural networks in these contexts is the Adam optimizer, which integrates momentum ($m_t$) and an adaptive learning rate ($v_t$). However, the volatile nature of online learning data, characterized by frequent distribution shifts and the presence of noise, poses significant challenges to Adam's standard optimization process: (1) Adam may use outdated momentum and averages of squared gradients, resulting in slower adaptation to distribution changes, and (2) Adam's performance is adversely affected by data noise. To mitigate these issues, we introduce CAdam, a confidence-based optimization strategy that assesses the consistency between the momentum and the gradient for each parameter dimension before deciding on updates. If momentum and gradient are in sync, CAdam proceeds with parameter updates according to Adam's original formulation; if not, it temporarily withholds updates and monitors potential shifts in data distribution in subsequent iterations. This method allows CAdam to distinguish between true distribution shifts and mere noise, and to adapt more quickly to new data distributions. Our experiments on both synthetic and real-world datasets demonstrate that CAdam surpasses other well-known optimizers, including the original Adam, in efficiency and noise robustness. Furthermore, in large-scale A/B testing within a live recommendation system, CAdam significantly enhances model performance compared to Adam, leading to substantial increases in the system's gross merchandise volume (GMV).
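The per-dimension update gating described in the abstract can be sketched as follows. This is a minimal illustration of the stated mechanism, not the authors' exact implementation: the momentum and second-moment buffers are updated as in standard Adam, but the parameter itself is only moved in dimensions where the momentum and the current gradient agree in sign. The function name `cadam_step` and the agreement test `m * grad > 0` are assumptions for illustration.

```python
import numpy as np

def cadam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One CAdam-style step (sketch): Adam update gated by momentum-gradient agreement."""
    # Standard Adam moment updates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-dimension consistency mask: True where momentum and gradient point the same way.
    consistent = (m * grad) > 0
    # Apply the Adam update only in consistent dimensions; withhold elsewhere.
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    param = param - np.where(consistent, update, 0.0)
    return param, m, v
```

Note that the withheld dimensions still accumulate momentum, so a persistent change in gradient direction (a real distribution shift) flips the sign of `m` within a few steps and updates resume, whereas a one-off noisy gradient leaves the parameter untouched.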