Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks

📅 2026-01-29
🤖 AI Summary
This work addresses the instability in gradient-free optimization of recurrent spiking neural networks (RSNNs) for high-dimensional, long-horizon reinforcement learning, which often stems from high estimator variance. To mitigate this, the authors propose the Signal-Adaptive Trust Region (SATR) method, which introduces, for the first time, a KL-divergence constraint normalized by signal energy to guide RSNN policy updates. The trust region dynamically expands under strong signal conditions and contracts when noise dominates, thereby enhancing optimization stability—particularly with small population sizes. By integrating Bernoulli connection distribution modeling and bitset acceleration techniques, SATR achieves significantly better performance than existing gradient-free approaches across multiple high-dimensional continuous control tasks, matching the training stability of PPO-LSTM while substantially reducing wall-clock training time.
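The summary above describes the core SATR mechanism: bound the KL divergence of the policy update, normalized by an estimated signal energy, so strong signals permit larger steps and noise-dominated estimates force smaller ones. A minimal sketch of such a rule for Bernoulli connection probabilities is shown below; the function names, backtracking scheme, and constants are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-8):
    """Sum of elementwise KL(Bern(p) || Bern(q)) over all connection parameters."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))

def satr_update(p, grad_est, signal_energy, delta=0.1, max_halvings=20):
    """Shrink the step until KL / signal_energy <= delta (hypothetical rule).

    High signal energy loosens the effective KL bound (larger trust region);
    low signal energy tightens it, rejecting noise-dominated updates.
    """
    step = 1.0
    for _ in range(max_halvings):
        q = np.clip(p + step * grad_est, 1e-4, 1 - 1e-4)
        if bernoulli_kl(p, q) / max(signal_energy, 1e-8) <= delta:
            return q
        step *= 0.5
    return p  # update rejected: noise dominates
```

Under this sketch, the same estimated update direction produces a nearly full step when the signal-energy estimate is large, and only a tiny (or zero) step when it is small.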

📝 Abstract
Recurrent spiking neural networks (RSNNs) are a promising substrate for energy-efficient control policies, but training them for high-dimensional, long-horizon reinforcement learning remains challenging. Population-based, gradient-free optimization circumvents backpropagation through non-differentiable spike dynamics by estimating gradients. However, with finite populations, high variance of these estimates can induce harmful and overly aggressive update steps. Inspired by trust-region methods in reinforcement learning that constrain policy updates in distribution space, we propose Signal-Adaptive Trust Regions (SATR), a distributional update rule that constrains relative change by bounding KL divergence normalized by an estimated signal energy. SATR automatically expands the trust region under strong signals and contracts it when updates are noise-dominated. We instantiate SATR for Bernoulli connectivity distributions, which have shown strong empirical performance for RSNN optimization. Across a suite of high-dimensional continuous-control benchmarks, SATR improves stability under limited populations and reaches competitive returns against strong baselines including PPO-LSTM. In addition, to make SATR practical at scale, we introduce a bitset implementation for binary spiking and binary weights, substantially reducing wall-clock training time and enabling fast RSNN policy search.
Problem

Research questions and friction points this paper is trying to address.

Recurrent Spiking Neural Networks
Gradient-Free Optimization
High-Dimensional Reinforcement Learning
Policy Update Stability
Population-Based Training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Signal-Adaptive Trust Regions
Gradient-Free Optimization
Recurrent Spiking Neural Networks
KL Divergence Constraint
Binary Weight Implementation
Jinhao Li
Sapient Intelligence
Yuhao Sun
Sapient Intelligence
Zhiyuan Ma
Tsinghua University
Hao He
Tsinghua University
Xinche Zhang
Tsinghua University
Xing Chen
Sapient Intelligence
Jin Li
Sapient Intelligence
Sen Song
Laboratory of Brain and Intelligence, Dept of Biomedical Engineering, Tsinghua University
Brain-inspired Computation, Computational Neuroscience, Artificial General Intelligence, Science of Happiness, Neural Circuits