Decision SpikeFormer: Spike-Driven Transformer for Decision Making

📅 2025-04-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high energy consumption bottleneck of offline reinforcement learning (RL) in energy-constrained embodied AI scenarios. We propose DSFormer, the first spike-driven Transformer model for offline RL. Methodologically, we design Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) to capture the temporal and positional dependencies in decision sequences, and introduce Progressive Threshold-dependent Batch Normalization (PTBN) to accommodate spike sparsity and temporal dynamics. Our key contributions are: (i) the first integration of spiking neural networks (SNNs) with Transformers for offline RL policy learning, which trains entirely on pre-collected data without environment interaction; and (ii) competitive performance on the D4RL benchmark relative to state-of-the-art artificial neural network (ANN) and SNN baselines, while achieving a 78.4% reduction in energy consumption. DSFormer establishes a novel paradigm for ultra-low-power edge intelligence.
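The paper's exact TSSA/PSSA formulations are not reproduced on this page. As an illustration only, the general spike-driven attention pattern these modules build on (binary query/key/value spike trains and softmax-free attention) can be sketched in NumPy; the `heaviside_spike` and `spike_self_attention` helpers below are hypothetical names, and Heaviside thresholding stands in for whatever surrogate spiking neuron the authors actually use:

```python
import numpy as np

def heaviside_spike(u, v_th=1.0):
    """Binary spike emission: fire (1) where membrane potential u crosses threshold v_th."""
    return (u >= v_th).astype(np.float32)

def spike_self_attention(x, Wq, Wk, Wv, v_th=1.0):
    """Softmax-free attention over binary spike trains, the pattern used by
    spike-driven Transformers. x: (T, D) token features. Because q, k, v are
    {0, 1}-valued, the matmuls reduce to sparse accumulations on neuromorphic hardware."""
    q = heaviside_spike(x @ Wq, v_th)
    k = heaviside_spike(x @ Wk, v_th)
    v = heaviside_spike(x @ Wv, v_th)
    scores = (q @ k.T) / k.shape[1]  # spike-overlap counts, scaled by feature width
    return scores @ v

rng = np.random.default_rng(0)
T, D = 6, 8
x = rng.normal(size=(T, D))
out = spike_self_attention(x, rng.normal(size=(D, D)),
                           rng.normal(size=(D, D)), rng.normal(size=(D, D)))
```

The softmax-free form is what makes the energy savings plausible: attention becomes integer spike-count accumulation rather than dense floating-point exponentials.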

๐Ÿ“ Abstract
Offline reinforcement learning (RL) enables policy training solely on pre-collected data, avoiding direct environment interaction, a crucial benefit for energy-constrained embodied AI applications. Although Artificial Neural Network (ANN)-based methods perform well in offline RL, their high computational and energy demands motivate exploration of more efficient alternatives. Spiking Neural Networks (SNNs) show promise for such tasks, given their low power consumption. In this work, we introduce DSFormer, the first spike-driven transformer model designed to tackle offline RL via sequence modeling. Unlike existing SNN transformers focused on spatial dimensions for vision tasks, we develop Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) in DSFormer to capture the temporal and positional dependencies essential for sequence modeling in RL. Additionally, we propose Progressive Threshold-dependent Batch Normalization (PTBN), which combines the benefits of LayerNorm and BatchNorm to preserve temporal dependencies while maintaining the spiking nature of SNNs. Comprehensive results on the D4RL benchmark show DSFormer's superiority over both SNN and ANN counterparts, achieving 78.4% energy savings and highlighting DSFormer's advantages not only in energy efficiency but also in competitive performance. Code and models are public at https://wei-nijuan.github.io/DecisionSpikeFormer.
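The abstract describes PTBN only at a high level (combining LayerNorm's and BatchNorm's benefits under spiking dynamics); the precise formulation is in the paper. A minimal sketch, assuming PTBN interpolates batch statistics with per-sample layer statistics via a progress factor and rescales by the firing threshold; the `ptbn` function, the `alpha` schedule, and the `v_th` scaling are all assumptions, not the authors' published equations:

```python
import numpy as np

def ptbn(x, gamma, beta, v_th, alpha, eps=1e-5):
    """Hypothetical sketch of a progressive, threshold-aware normalization.
    x: (batch, time, features). alpha in [0, 1] blends BatchNorm statistics
    (computed over batch and time, per feature) with LayerNorm statistics
    (computed per sample over features); v_th rescales the output so the
    normalized activations are matched to the spiking threshold."""
    bn_mean = x.mean(axis=(0, 1), keepdims=True)
    bn_var = x.var(axis=(0, 1), keepdims=True)
    ln_mean = x.mean(axis=-1, keepdims=True)
    ln_var = x.var(axis=-1, keepdims=True)
    mean = (1 - alpha) * bn_mean + alpha * ln_mean
    var = (1 - alpha) * bn_var + alpha * ln_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return v_th * (gamma * x_hat + beta)

x = np.random.default_rng(1).normal(size=(2, 4, 8))
y = ptbn(x, gamma=1.0, beta=0.0, v_th=1.0, alpha=1.0)  # alpha=1: pure layer statistics
```

At `alpha=1` each token is normalized over its own features (LayerNorm-like, preserving per-step temporal structure); at `alpha=0` statistics are shared across the batch and time axes (BatchNorm-like, cheap at inference since they fold into the weights).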
Problem

Research questions and friction points this paper is trying to address.

Develops spike-driven transformer for offline RL decision-making
Addresses high energy demands in ANN-based reinforcement learning
Captures temporal dependencies in RL with spiking neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spike-driven transformer for offline RL
Temporal and positional spiking self-attention
Progressive threshold-dependent batch normalization
Wei Huang
Shanghai AI Laboratory, Wuhan University
Qinying Gu
Shanghai AI Laboratory
Nanyang Ye
Shanghai Jiao Tong University
Out-of-Distribution Generalization · Embodied AI · Unmanned Aerial Vehicle · HDR Imaging