🤖 AI Summary
This work addresses the challenge of adapting the Forward-Forward (FF) algorithm to reinforcement learning (RL), where gradient-based credit assignment is typically considered indispensable. We propose ARQ, the first backpropagation-free, local RL method grounded in FF principles. ARQ introduces an action-conditioned root-mean-square Q-function as a layer-wise "goodness" metric, constructs local learning signals from per-layer activation statistics, and integrates temporal-difference updates for end-to-end policy optimization. Its contributions are threefold: (i) the first adaptation of the forward-forward paradigm to RL; (ii) a novel action-aware local value estimation mechanism; and (iii) complete elimination of gradient backpropagation. Empirical evaluation on MinAtar and the DeepMind Control Suite demonstrates that ARQ significantly outperforms existing backprop-free RL methods and surpasses standard backpropagation baselines on most tasks.
📄 Abstract
The Forward-Forward (FF) algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap in domains where learning signals arise more naturally, such as RL. In this work, inspired by FF's goodness function computed from layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal-difference learning. Despite its simplicity and biological grounding, our approach achieves superior performance compared to state-of-the-art local backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, while also outperforming algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
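To make the core idea concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract: the root-mean-square of a layer's action-conditioned activations serves as that layer's local "goodness" Q-estimate, and a temporal-difference error provides the local learning signal without any backpropagation across layers. The dimensions, the single-layer setup, the one-hot action conditioning, and the Q-learning-style target are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual architecture is not given here.
obs_dim, hidden_dim, n_actions = 8, 32, 4

# Weights for a single layer; in ARQ each layer would be trained
# with its own local signal rather than a backpropagated gradient.
W = rng.normal(scale=0.1, size=(obs_dim + n_actions, hidden_dim))

def layer_goodness(obs, action):
    """Action-conditioned forward pass. The RMS of the layer's
    activations acts as that layer's local Q(s, a) "goodness"."""
    a_onehot = np.eye(n_actions)[action]
    x = np.concatenate([obs, a_onehot])
    h = np.maximum(W.T @ x, 0.0)             # ReLU activations
    return np.sqrt(np.mean(h ** 2)), h       # RMS goodness >= 0

# TD error built from the next state's best goodness (Q-learning-style
# target, assumed here for illustration of the local learning signal).
gamma = 0.99
obs = rng.normal(size=obs_dim)
next_obs = rng.normal(size=obs_dim)
action, reward = 1, 0.5

q_sa, acts = layer_goodness(obs, action)
q_next = max(layer_goodness(next_obs, a)[0] for a in range(n_actions))
td_error = reward + gamma * q_next - q_sa    # local signal; no cross-layer backprop
```

Because the goodness is a simple differentiable function of a single layer's activations, each layer can reduce its own TD error with a purely local weight update, which is what makes the scheme backpropagation-free.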