🤖 AI Summary
This work addresses the challenge of adapting the Forward-Forward (FF) algorithm to reinforcement learning (RL), where gradient-based credit assignment is typically considered indispensable. We propose ARQ, the first backpropagation-free, local RL method grounded in FF principles. ARQ introduces an action-conditioned root-mean-square Q-function as a layer-wise "goodness" metric, constructs local learning signals from per-layer activation statistics, and integrates temporal-difference updates for end-to-end policy optimization. Its contributions are threefold: (i) the first adaptation of the forward-forward paradigm to RL; (ii) a novel action-aware local value estimation mechanism; and (iii) complete elimination of gradient backpropagation. Empirical evaluation on MinAtar and the DeepMind Control Suite demonstrates that ARQ significantly outperforms existing backprop-free RL methods and surpasses standard backpropagation baselines on most tasks.
📄 Abstract
The Forward-Forward (FF) algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap in domains where learning signals arise more naturally, such as RL. In this work, inspired by FF's goodness function computed from layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal-difference learning. Despite its simplicity and biological grounding, our approach achieves superior performance compared to state-of-the-art local backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, while also outperforming algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
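To make the core idea concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract: the root-mean-square of a layer's action-conditioned activations serves as that layer's local "goodness" Q-estimate, and a temporal-difference error provides the local learning signal without any backpropagation across layers. The dimensions, the single-layer setup, the one-hot action conditioning, and the Q-learning-style target are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual architecture is not given here.
obs_dim, hidden_dim, n_actions = 8, 32, 4

# Weights for a single layer; in ARQ each layer would be trained
# with its own local signal rather than a backpropagated gradient.
W = rng.normal(scale=0.1, size=(obs_dim + n_actions, hidden_dim))

def layer_goodness(obs, action):
    """Action-conditioned forward pass. The RMS of the layer's
    activations acts as that layer's local Q(s, a) "goodness"."""
    a_onehot = np.eye(n_actions)[action]
    x = np.concatenate([obs, a_onehot])
    h = np.maximum(W.T @ x, 0.0)             # ReLU activations
    return np.sqrt(np.mean(h ** 2)), h       # RMS goodness >= 0

# TD error built from the next state's best goodness (Q-learning-style
# target, assumed here for illustration of the local learning signal).
gamma = 0.99
obs = rng.normal(size=obs_dim)
next_obs = rng.normal(size=obs_dim)
action, reward = 1, 0.5

q_sa, acts = layer_goodness(obs, action)
q_next = max(layer_goodness(next_obs, a)[0] for a in range(n_actions))
td_error = reward + gamma * q_next - q_sa    # local signal; no cross-layer backprop
```

Because the goodness is a simple differentiable function of a single layer's activations, each layer can reduce its own TD error with a purely local weight update, which is what makes the scheme backpropagation-free.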