Predictive Coding with Bayesian Priors via Proximal Gradients

📅 2026-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work explains predictive coding circuits and their hierarchical organization from an optimization perspective and ties both to Bayesian inference. The authors reformulate predictive coding as continuous-time proximal gradient descent on a regularized maximum a posteriori (MAP) objective. In the single-level case, the resulting dynamics are a leaky firing-rate network: membrane leak, recurrent connectivity, synaptic drive, and the nonlinear activation all follow from one optimization principle, and the circuit coincides with the one proposed by Rao and Ballard. The prior determines the activation function through its proximal operator, and the likelihood precision sets the gain on the sensory input. In the multi-level case, a variable-splitting relaxation of the deep MAP problem yields hierarchical predictive coding as an interconnection of level-wise solvers. In graphical-model terms, the relaxation replaces the directed generative chain with an undirected Markov random field whose node potentials are the level-wise priors.
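The single-level correspondence can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that the summary does not state: a linear Gaussian likelihood with weights `W` and variance `sigma2`, a Laplace (sparsity) prior whose proximal operator is soft thresholding, and a forward-Euler discretization of the continuous-time dynamics. All names (`W`, `y`, `sigma2`, `lam`, `eta`, `tau`, `dt`) are hypothetical. Under these assumptions the objective is `(1/(2*sigma2)) * ||y - W x||^2 + lam * ||x||_1`, and the dynamics `tau * du/dt = -u + R x + b`, `x = prox(u)` contain a leak (`-u`), a recurrent matrix `R`, a drive `b` whose gain scales with the precision `1/sigma2`, and a static nonlinearity.

```python
import numpy as np

def soft_threshold(u, thresh):
    """Proximal operator of thresh * ||x||_1 (assumed Laplace prior)."""
    return np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)

def leaky_network(W, y, sigma2=1.0, lam=0.1, eta=0.2, tau=1.0, dt=0.5, steps=4000):
    """Euler-integrate  tau * du/dt = -u + R @ x + b  with  x = prox(u).

    Assumes eta <= 1/L, where L = ||W||_2^2 / sigma2 is the Lipschitz
    constant of the data-term gradient.
    """
    precision = 1.0 / sigma2                              # likelihood precision
    R = np.eye(W.shape[1]) - eta * precision * (W.T @ W)  # effective recurrent matrix
    b = eta * precision * (W.T @ y)                       # drive; gain set by precision
    u = np.zeros(W.shape[1])                              # membrane-like state
    for _ in range(steps):
        x = soft_threshold(u, eta * lam)                  # rate = prox of the prior
        u = u + (dt / tau) * (-u + R @ x + b)             # leak + recurrence + drive
    return soft_threshold(u, eta * lam)

def ista(W, y, sigma2=1.0, lam=0.1, eta=0.2, iters=4000):
    """Discrete proximal gradient descent on the same objective, for comparison."""
    precision = 1.0 / sigma2
    x = np.zeros(W.shape[1])
    for _ in range(iters):
        grad = -precision * (W.T @ (y - W @ x))           # gradient of the data term
        x = soft_threshold(x - eta * grad, eta * lam)
    return x

# Small hypothetical problem: 5 observations, 3 latent causes.
W = np.vstack([np.eye(3), 0.5 * np.ones((2, 3))])
y = W @ np.array([1.0, 0.0, -2.0])
x_net = leaky_network(W, y)
x_ista = ista(W, y)
print("max |x_net - x_ista| =", np.max(np.abs(x_net - x_ista)))
```

In this sketch, setting `dt = tau` turns each Euler step into one ordinary proximal gradient iteration, so the network and the discrete solver share the same fixed point. Swapping `soft_threshold` for the proximal operator of a different prior changes only the activation function.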

📝 Abstract

We recast predictive coding as continuous-time proximal gradient descent applied to a regularized maximum-a-posteriori (MAP) objective. We study first a single-level problem and then a multi-level hierarchy. For the single-level problem, we show that proximal gradient descent is precisely a leaky firing-rate network: the membrane leak, the effective recurrent matrix, the local synaptic drive, and the static nonlinearity all follow from one optimization principle, and the resulting circuit is the one proposed by Rao and Ballard. The prior selects the nonlinearity through its proximal operator, and the likelihood precision sets the gain on the observation. For the hierarchy, we show that a classical variable-splitting relaxation of the deep MAP problem yields hierarchical predictive coding as the interconnection of local and distributed solvers. In probabilistic modeling terms, this relaxation replaces the directed generative chain by an undirected Markov random field whose node potentials are the level-wise priors. Each level then applies its own activation function, namely the proximal operator of its prior.
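One way to picture the hierarchical construction is the standard quadratic-penalty form of variable splitting, written out below as a hedged sketch. The abstract does not give these formulas. The linear maps $W_l$, the variances $\sigma_l^2$, the step size $\eta$, the time constant $\tau$, and the convention $x_0 := y$ are assumptions made here for illustration, and the paper's exact formulation may differ.

```latex
% Assumed linear-Gaussian hierarchy with L levels; x_0 := y is the observation.
% \phi_l is the negative log prior at level l (the node potential).
\begin{align}
E(x_1,\dots,x_L)
  &= \sum_{l=1}^{L} \frac{1}{2\sigma_{l-1}^{2}}
     \bigl\lVert x_{l-1} - W_l x_l \bigr\rVert^{2}
   + \sum_{l=1}^{L} \phi_l(x_l), \\
\tau\,\dot{u}_l
  &= -u_l + x_l
   + \frac{\eta}{\sigma_{l-1}^{2}}\, W_l^{\top}\bigl(x_{l-1} - W_l x_l\bigr)
   - \frac{\eta}{\sigma_{l}^{2}}\, \bigl(x_l - W_{l+1} x_{l+1}\bigr), \\
x_l &= \operatorname{prox}_{\eta\phi_l}(u_l).
\end{align}
% For the top level l = L, the last term of the second line is absent.
```

Under these assumptions, each quadratic term couples only adjacent levels, so $E$ is the energy of a chain-structured undirected Markov random field with node potentials $\phi_l$. Level $l$ updates using its own state, the prediction error from the level below, and the prediction error it receives from the level above, which is the sense in which the solvers are local.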

Problem

Research questions and friction points this paper is trying to address.

predictive coding

Bayesian priors

maximum-a-posteriori

hierarchical inference

proximal gradients

Innovation

Methods, ideas, or system contributions that make the work stand out.

Predictive Coding

Proximal Gradient Descent

Bayesian Priors