🤖 AI Summary
This paper investigates the monotonicity of learning curves for the maximum likelihood estimator (MLE): specifically, whether the expected forward Kullback–Leibler (KL) divergence (i.e., sequential prediction error under log loss) is nonincreasing as the sample size grows. For canonical well-specified parametric models, namely Gaussian vectors with unknown covariance and either known or unknown mean, and Gamma variables with unknown scale parameter, we establish **monotonicity (in fact complete monotonicity)** of the forward KL learning curve. This settles the Gaussian setting, which Loog, Mey and Viering had explicitly highlighted as open even in dimension 1. We also observe that, for reverse KL divergence, a folklore trick yields monotonicity for very general exponential families. Our approach integrates information geometry, statistical asymptotics, and structural analysis of exponential families. All results were derived by variants of GPT-5.2 Pro; humans supplied no proof strategies or intermediate arguments, and only prompted the model to continue developing results, then verified and transcribed its proofs. These are the first nontrivial monotonicity guarantees for the MLE in well-specified parametric settings.
📝 Abstract
The property of learning-curve monotonicity, highlighted in a recent series of works by Loog, Mey and Viering, describes algorithms which only improve in average performance given more data, for any underlying data distribution within a given family. We establish the first nontrivial monotonicity guarantees for the maximum likelihood estimator in a variety of well-specified parametric settings. For sequential prediction with log loss, we show monotonicity (in fact complete monotonicity) of the forward KL divergence for Gaussian vectors with unknown covariance and either known or unknown mean, as well as for Gamma variables with unknown scale parameter. The Gaussian setting was explicitly highlighted as open in the aforementioned works, even in dimension 1. Finally, we observe that for reverse KL divergence, a folklore trick yields monotonicity for very general exponential families.
All results in this paper were derived by variants of GPT-5.2 Pro. Humans did not provide any proof strategies or intermediate arguments, but only prompted the model to continue developing additional results, and verified and transcribed its proofs.
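To make the forward KL learning curve concrete, here is a minimal numerical sketch for the simplest case covered by the abstract: a univariate Gaussian with known mean 0 and unknown variance. It is an illustration under stated assumptions, not the paper's argument. It assumes "forward KL" means KL(true ‖ fitted), and it uses a textbook closed form (not taken from the paper) based on the fact that the ratio of the MLE variance to the true variance is distributed as χ²ₙ/n. The function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_half_integer(n: int) -> float:
    """Return psi(n/2) for a positive integer n, using psi(x + 1) = psi(x) + 1/x."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Base values: psi(1) = -gamma and psi(1/2) = -gamma - 2 ln 2.
    if n % 2 == 0:
        x, value = 1.0, -EULER_GAMMA
    else:
        x, value = 0.5, -EULER_GAMMA - 2.0 * math.log(2.0)
    steps = (n - (2 if n % 2 == 0 else 1)) // 2
    for k in range(steps):
        value += 1.0 / (x + k)
    return value


def expected_forward_kl(n: int) -> float:
    """Expected KL(N(0, s2) || N(0, s2_hat)) for the MLE s2_hat = mean(x_i**2).

    Assumption: with r = s2_hat / s2 ~ chi2_n / n, the divergence equals
    0.5 * (1/r - 1 + ln r). Taking expectations with E[1/chi2_n] = 1/(n - 2)
    and E[ln chi2_n] = psi(n/2) + ln 2 gives the expression below.
    The expectation is finite only for n >= 3.
    """
    if n < 3:
        raise ValueError("expected forward KL is infinite for n < 3")
    return 0.5 * (2.0 / (n - 2) + digamma_half_integer(n) - math.log(n / 2))


def monte_carlo_forward_kl(n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # chi2_n is Gamma(shape=n/2, scale=2); dividing by n gives s2_hat / s2.
        ratio = rng.gammavariate(n / 2, 2.0) / n
        total += 0.5 * (1.0 / ratio - 1.0 + math.log(ratio))
    return total / trials


if __name__ == "__main__":
    for n in range(3, 13):
        print(n, expected_forward_kl(n))
```

In this sketch the printed values decrease with n and approach 1/(2n) for large n, which is consistent with the monotonicity stated in the abstract for this special case. It only checks the claim numerically over a finite range and does not prove it.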