🤖 AI Summary
This work investigates the limiting behavior of the prior distribution in deep Gaussian processes (Deep GPs) as the number of layers tends to infinity, with a focus on whether it degenerates to a constant function. By analyzing compositions of vector-valued Gaussian process layers with an RBF kernel under varying bandwidth $r$, the study establishes a sharp threshold $r_c(d) = \Theta(\sqrt{d})$: above this threshold the limit is degenerate, while below it the prior converges to a limit distribution $\pi_{\bar{Z}}$ that is non-degenerate and non-Gaussian, with non-vanishing dependence between output coordinates. Numerical experiments verify the threshold across a range of dimensions $d$ and reveal complex multimodal structure in the limiting distribution.
📝 Abstract
Compositional priors describe the generic properties of layered functions in deep Bayesian models, where deep neural networks with random weights are a canonical example. In the wide-network limit, the prior is a Gaussian process with a depth-dependent kernel, and its behaviour as depth grows has been extensively studied through this kernel. Here, we study another case, where each layer itself is a vector-valued Gaussian process, and our aim is similarly to understand the limiting behaviour of the prior as depth grows.
Previous GP work has established that for the RBF kernel and a certain range of bandwidths $r$, the prior degenerates in the limit, converging to the set of constant functions -- which is not useful as a probabilistic model. In this paper we establish several new results. First, we identify a sharp bandwidth threshold $r_c(d) = \Theta(\sqrt{d})$ above which the limit is degenerate, strengthening the earlier bounds. Second, and more importantly, we show that for $r$ below the threshold $r_c(d)$ the prior converges to a limit distribution $\pi_{\bar{Z}}$. We also prove that these distributions are non-degenerate and non-Gaussian, with non-vanishing dependence between coordinates. In contrast to the previously known degenerate regime, deep Gaussian process priors can therefore admit non-trivial limits.
Empirically, we verify the threshold across a range of dimensions $d$, and demonstrate a complex multimodal behaviour of the limit distributions $\pi_{\bar{Z}}$ -- a regime that becomes increasingly narrow with $d$ and would be hard to identify without knowing the threshold.
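
The collapse versus non-collapse dichotomy can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experiment, and it rests on assumptions not stated in the abstract: each layer has $d$ independent output coordinates, each a zero-mean, unit-variance GP with kernel $k(x, x') = \exp(-\|x - x'\|^2 / (2r^2))$. Under these assumptions, the squared distance between the images of two inputs is a Markov chain across layers: given the current squared distance $s$, the next one is distributed as $2(1 - e^{-s/(2r^2)})\,\chi^2_d$. For small $s$ its conditional mean is approximately $s\,d/r^2$, which is consistent with a threshold of order $\sqrt{d}$. The function name `final_distances` and all parameter values are hypothetical.

```python
import numpy as np


def final_distances(d, r, depth=50, trials=200, seed=0):
    """Distance between the images of two inputs after `depth` GP layers.

    Assumed model (illustrative only): d independent unit-variance GP
    coordinates per layer with kernel exp(-||x - x'||^2 / (2 r^2)).
    Returns an array of shape (trials,), one distance per independent prior draw.
    """
    rng = np.random.default_rng(seed)
    sq = np.ones(trials)  # squared distance between the two inputs at layer 0
    for _ in range(depth):
        # 1 - k(x, x'), computed with expm1 to stay accurate when sq is tiny.
        one_minus_k = -np.expm1(-sq / (2.0 * r**2))
        # Per coordinate, f_i(x) - f_i(x') ~ N(0, 2 (1 - k)), independently.
        z = rng.standard_normal((trials, d))
        sq = 2.0 * one_minus_k * np.sum(z**2, axis=1)
    return np.sqrt(sq)


if __name__ == "__main__":
    d = 4
    for r in (0.5, 1.0, 2.0, 4.0):
        dist = final_distances(d, r)
        print(f"d={d} r={r}: mean final distance = {dist.mean():.3e}")
```

Sweeping `r` for several values of `d` and recording where the mean final distance drops towards zero gives a rough empirical picture of the transition. The sketch tracks only a two-point distance, so it says nothing about the multimodal shape of $\pi_{\bar{Z}}$.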