🤖 AI Summary
The traditional exponential moving average (EMA) fails to suppress noise along trajectories of stochastic dynamical systems because its decay rate is fixed: the newest observation always receives a constant weight, so the noise in the averaged estimate never vanishes and the estimate lacks strong stochastic convergence. To address this, the authors propose *$p$-EMA*, which replaces the constant weight with a subharmonically decaying weight sequence, so the weight assigned to the newest observation shrinks to zero, yet slowly enough that the average keeps tracking the trajectory. Under mild assumptions on the autocorrelations of the underlying random dynamical system, $p$-EMA is shown to converge both almost surely and in $L^2$, with asymptotic variance vanishing to zero, overcoming a fundamental limitation of standard EMA. The authors further discuss the implications of these results for a recently introduced adaptive step size control for Stochastic Gradient Descent (SGD) that uses $p$-EMA to average noisy observations.
📝 Abstract
Averaging, or smoothing, is a fundamental approach to obtaining stable, de-noised estimates from noisy observations. In certain scenarios, observations made along trajectories of random dynamical systems are of particular interest. One popular smoothing technique for such a scenario is exponential moving averaging (EMA), which assigns each observation a weight that decreases exponentially in its age, thus giving younger observations larger weights. However, EMA does not enjoy strong stochastic convergence properties, because the weight assigned to the youngest observation is constant over time, which prevents the noise in the averaged quantity from decreasing to zero. In this work, we consider an adaptation of EMA, which we call $p$-EMA, in which the weights assigned to the most recent observations decrease to zero at a subharmonic rate. We provide stochastic convergence guarantees for this kind of averaging under mild assumptions on the autocorrelations of the underlying random dynamical system. We further discuss the implications of our results for a recently introduced adaptive step size control for Stochastic Gradient Descent (SGD), which uses $p$-EMA for averaging noisy observations.
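To make the contrast concrete, here is a minimal numerical sketch. It compares a standard EMA (constant weight on the newest observation) with a $p$-EMA-style recursion in which that weight decays subharmonically. The specific schedule $\alpha_n = n^{-p}$ and the choice $p = 0.8$ are illustrative assumptions for this sketch, not the paper's exact definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
signal = 1.0
obs = signal + rng.normal(0.0, 1.0, size=T)  # noisy observations of a constant

# Standard EMA: the newest observation always gets the same constant weight,
# so the noise in the average never decays to zero.
alpha = 0.1
ema = np.empty(T)
ema[0] = obs[0]
for n in range(1, T):
    ema[n] = (1 - alpha) * ema[n - 1] + alpha * obs[n]

# p-EMA-style recursion (sketch): the newest observation's weight decreases
# to zero at a subharmonic rate, alpha_n = n**(-p) with 0 < p < 1
# (an assumed schedule, chosen only to illustrate the idea).
p = 0.8
pema = np.empty(T)
pema[0] = obs[0]
for n in range(1, T):
    a_n = (n + 1) ** (-p)
    pema[n] = (1 - a_n) * pema[n - 1] + a_n * obs[n]
```

Because $\alpha_n \to 0$ while $\sum_n \alpha_n$ diverges, the decaying-weight average both forgets its initialization and drives its noise variance toward zero, whereas the constant-$\alpha$ EMA retains a residual variance on the order of $\alpha/2$.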