Decision SpikeFormer: Spike-Driven Transformer for Decision Making

📅 2025-04-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high energy consumption bottleneck of offline reinforcement learning (RL) in energy-constrained embodied AI scenarios. We propose DSFormer, the first spike-driven Transformer model for offline RL. Methodologically, we design Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) to capture the temporal and positional dependencies in decision sequences, and introduce Progressive Threshold-dependent Batch Normalization (PTBN) to accommodate spike sparsity and temporal dynamics. Our key contributions are: (i) the first integration of spiking neural networks (SNNs) with Transformers for offline RL policy learning, which trains entirely on pre-collected data without environment interaction; and (ii) competitive performance on the D4RL benchmark relative to state-of-the-art artificial neural network (ANN) and SNN baselines, while achieving a 78.4% reduction in energy consumption. DSFormer establishes a novel paradigm for ultra-low-power edge intelligence.
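The paper's exact TSSA/PSSA formulations are not reproduced on this page. As an illustration only, the general spike-driven attention pattern these modules build on (binary query/key/value spike trains and softmax-free attention) can be sketched in NumPy; the `heaviside_spike` and `spike_self_attention` helpers below are hypothetical names, and Heaviside thresholding stands in for whatever surrogate spiking neuron the authors actually use:

```python
import numpy as np

def heaviside_spike(u, v_th=1.0):
    """Binary spike emission: fire (1) where membrane potential u crosses threshold v_th."""
    return (u >= v_th).astype(np.float32)

def spike_self_attention(x, Wq, Wk, Wv, v_th=1.0):
    """Softmax-free attention over binary spike trains, the pattern used by
    spike-driven Transformers. x: (T, D) token features. Because q, k, v are
    {0, 1}-valued, the matmuls reduce to sparse accumulations on neuromorphic hardware."""
    q = heaviside_spike(x @ Wq, v_th)
    k = heaviside_spike(x @ Wk, v_th)
    v = heaviside_spike(x @ Wv, v_th)
    scores = (q @ k.T) / k.shape[1]  # spike-overlap counts, scaled by feature width
    return scores @ v

rng = np.random.default_rng(0)
T, D = 6, 8
x = rng.normal(size=(T, D))
out = spike_self_attention(x, rng.normal(size=(D, D)),
                           rng.normal(size=(D, D)), rng.normal(size=(D, D)))
```

The softmax-free form is what makes the energy savings plausible: attention becomes integer spike-count accumulation rather than dense floating-point exponentials.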

๐Ÿ“ Abstract
Offline reinforcement learning (RL) enables policy training solely on pre-collected data, avoiding direct environment interaction, a crucial benefit for energy-constrained embodied AI applications. Although Artificial Neural Network (ANN)-based methods perform well in offline RL, their high computational and energy demands motivate exploration of more efficient alternatives. Spiking Neural Networks (SNNs) show promise for such tasks, given their low power consumption. In this work, we introduce DSFormer, the first spike-driven transformer model designed to tackle offline RL via sequence modeling. Unlike existing SNN transformers focused on spatial dimensions for vision tasks, we develop Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) in DSFormer to capture the temporal and positional dependencies essential for sequence modeling in RL. Additionally, we propose Progressive Threshold-dependent Batch Normalization (PTBN), which combines the benefits of LayerNorm and BatchNorm to preserve temporal dependencies while maintaining the spiking nature of SNNs. Comprehensive results on the D4RL benchmark show DSFormer's superiority over both SNN and ANN counterparts, achieving 78.4% energy savings and highlighting DSFormer's advantages not only in energy efficiency but also in competitive performance. Code and models are public at https://wei-nijuan.github.io/DecisionSpikeFormer.
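The abstract describes PTBN only at a high level (combining LayerNorm's and BatchNorm's benefits under spiking dynamics); the precise formulation is in the paper. A minimal sketch, assuming PTBN interpolates batch statistics with per-sample layer statistics via a progress factor and rescales by the firing threshold; the `ptbn` function, the `alpha` schedule, and the `v_th` scaling are all assumptions, not the authors' published equations:

```python
import numpy as np

def ptbn(x, gamma, beta, v_th, alpha, eps=1e-5):
    """Hypothetical sketch of a progressive, threshold-aware normalization.
    x: (batch, time, features). alpha in [0, 1] blends BatchNorm statistics
    (computed over batch and time, per feature) with LayerNorm statistics
    (computed per sample over features); v_th rescales the output so the
    normalized activations are matched to the spiking threshold."""
    bn_mean = x.mean(axis=(0, 1), keepdims=True)
    bn_var = x.var(axis=(0, 1), keepdims=True)
    ln_mean = x.mean(axis=-1, keepdims=True)
    ln_var = x.var(axis=-1, keepdims=True)
    mean = (1 - alpha) * bn_mean + alpha * ln_mean
    var = (1 - alpha) * bn_var + alpha * ln_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return v_th * (gamma * x_hat + beta)

x = np.random.default_rng(1).normal(size=(2, 4, 8))
y = ptbn(x, gamma=1.0, beta=0.0, v_th=1.0, alpha=1.0)  # alpha=1: pure layer statistics
```

At `alpha=1` each token is normalized over its own features (LayerNorm-like, preserving per-step temporal structure); at `alpha=0` statistics are shared across the batch and time axes (BatchNorm-like, cheap at inference since they fold into the weights).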
Problem

Research questions and friction points this paper is trying to address.

Develops spike-driven transformer for offline RL decision-making
Addresses high energy demands in ANN-based reinforcement learning
Captures temporal dependencies in RL with spiking neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spike-driven transformer for offline RL
Temporal and positional spiking self-attention
Progressive threshold-dependent batch normalization
Wei Huang
Shanghai AI Laboratory, Wuhan University
Qinying Gu
Shanghai AI Laboratory
Nanyang Ye
Shanghai Jiao Tong University
Out-of-Distribution Generalization · Embodied AI · Unmanned Aerial Vehicle · HDR Imaging