🤖 AI Summary
Traditional reinforcement learning (RL) trains neural networks with backpropagation (BP), which suffers from vanishing/exploding gradients, high memory overhead, and training instability. To address these issues, this paper proposes a BP-free, layer-wise forward training framework: each layer independently optimizes a local loss derived solely from forward-pass signals, specifically pairwise distances, and incorporates a reward-guided steering strategy alongside a multidimensional scaling-based distance-matching mechanism to decouple inter-layer updates. The method integrates local policy gradients with pairwise representation constraints, eliminating the need to store intermediate activations or to propagate gradients across layers. Evaluated on standard RL benchmarks, it matches BP-based baselines in performance while significantly improving training stability and cross-task generalization. This work establishes a new paradigm for scalable, memory-efficient, and robust neural RL.
📝 Abstract
Training neural networks with reinforcement learning (RL) typically relies on backpropagation (BP), necessitating storage of activations from the forward pass for subsequent backward updates. Furthermore, backpropagating error signals through multiple layers often leads to vanishing or exploding gradients, which can degrade learning performance and stability. We propose a novel approach that trains each layer of the neural network using local signals during the forward pass in RL settings. Our approach introduces local, layer-wise losses leveraging the principle of matching pairwise distances from multidimensional scaling, enhanced with optional reward-driven guidance. This method allows each hidden layer to be trained using local signals computed during forward propagation, thus eliminating the need for backward passes and for storing intermediate activations. Our experiments, conducted with policy gradient methods across common RL benchmarks, demonstrate that this backpropagation-free method achieves performance competitive with its classical BP-based counterparts. Additionally, the proposed method enhances stability and consistency within and across runs, and improves performance especially in challenging environments.
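To make the layer-local objective concrete, the following is a minimal NumPy sketch of a multidimensional-scaling-style, pairwise distance-matching loss computed purely from a layer's forward pass. It assumes one plausible instantiation (Euclidean distances, squared matching error, reward-based pair reweighting); the paper's exact loss, weighting scheme, and update rule may differ, and the function and variable names here are illustrative, not the authors' API.

```python
import numpy as np

def pairwise_dists(X):
    # Euclidean distance matrix between rows of X (a batch of activations).
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def local_mds_loss(h_in, h_out, rewards=None):
    # Layer-local loss: match the pairwise distances of a layer's output
    # to those of its input (an MDS-style distance-matching objective).
    # `rewards` (a hypothetical stand-in for the paper's reward-driven
    # guidance) optionally reweights sample pairs by reward magnitude.
    D_in, D_out = pairwise_dists(h_in), pairwise_dists(h_out)
    err = (D_out - D_in) ** 2
    if rewards is not None:
        w = np.abs(rewards)[:, None] + np.abs(rewards)[None, :]
        err = err * w
    return err.mean()

# Forward-only sketch: each layer's loss depends only on its own input and
# output, so no gradient or activation needs to cross layer boundaries.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))           # batch of 8 observations
W1 = rng.normal(size=(4, 16)) * 0.1   # first hidden layer (illustrative sizes)
h1 = np.tanh(x @ W1)
loss1 = local_mds_loss(x, h1)         # local signal available mid-forward-pass
```

Because each loss is computed during the forward pass from a layer's own input/output pair, a layer can update its weights immediately and its activations can then be discarded, which is the source of the memory savings the abstract describes.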