🤖 AI Summary
A substantial performance gap persists between spiking neural networks (SNNs) and artificial neural networks (ANNs), and existing knowledge distillation (KD) methods close it only partially because they neglect the intrinsic spatiotemporal dynamics of SNNs. To address this, we propose a temporally decoupled logit-level KD framework. Our method explicitly decomposes the distillation process across timesteps at the logit level and introduces class-probability entropy regularization to stabilize optimization and enhance the robustness of temporal representations. Unlike conventional approaches that aggregate outputs over time, our framework performs fine-grained timestep-wise logit alignment, unlocking the full temporal expressive capacity of SNNs. Evaluated on multiple benchmark datasets, the proposed method consistently outperforms state-of-the-art logit-, feature-, and hybrid-based KD approaches, significantly improving classification accuracy while preserving the SNN’s inherent energy efficiency.
📝 Abstract
Spiking Neural Networks (SNNs), inspired by the human brain, offer significant computational efficiency through discrete spike-based information transfer. Despite their potential to reduce inference energy consumption, a performance gap persists between SNNs and Artificial Neural Networks (ANNs), primarily due to current training methods and inherent model limitations. While recent research has aimed to enhance SNN learning by employing knowledge distillation (KD) from ANN teacher networks, traditional distillation techniques often overlook the distinctive spatiotemporal properties of SNNs and thus fail to fully leverage their advantages. To overcome these challenges, we propose a novel logit distillation method characterized by temporal separation and entropy regularization. This approach improves on existing SNN distillation techniques by performing distillation on the logits at each time step, rather than only on the temporally aggregated outputs. Furthermore, the integration of entropy regularization stabilizes model optimization and further boosts performance. Extensive experimental results indicate that our method surpasses prior SNN distillation strategies, whether based on logit distillation, feature distillation, or a combination of both. The code will be available on GitHub.
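To make the idea concrete, the sketch below illustrates one plausible form of timestep-wise logit distillation with an entropy regularizer. This is not the authors' code: the tensor layout, the temperature `tau`, and the loss weights `alpha`/`beta` are assumptions chosen for illustration.

```python
# Minimal sketch (assumed implementation, not the paper's released code):
# distill an ANN teacher's logits into every SNN timestep separately, and
# regularize the entropy of the student's per-timestep class probabilities.
import torch
import torch.nn.functional as F

def temporal_kd_loss(student_logits, teacher_logits, labels,
                     tau=4.0, alpha=1.0, beta=0.1):
    """student_logits: [T, B, C] per-timestep SNN outputs;
    teacher_logits: [B, C] ANN outputs; labels: [B] class indices."""
    T = student_logits.shape[0]
    teacher_prob = F.softmax(teacher_logits / tau, dim=-1)  # fixed soft target

    kd_loss, ent_reg = 0.0, 0.0
    for t in range(T):
        # Align the teacher with the logits at this timestep, not the time-average.
        log_p_t = F.log_softmax(student_logits[t] / tau, dim=-1)
        kd_loss = kd_loss + F.kl_div(log_p_t, teacher_prob,
                                     reduction="batchmean") * tau ** 2
        # Entropy of the student's class probabilities at this timestep.
        p_t = F.softmax(student_logits[t], dim=-1)
        ent_reg = ent_reg + (-(p_t * p_t.clamp_min(1e-8).log()).sum(dim=-1).mean())
    kd_loss, ent_reg = kd_loss / T, ent_reg / T

    # Task loss on the time-averaged logits, as in standard SNN training.
    ce_loss = F.cross_entropy(student_logits.mean(dim=0), labels)
    return ce_loss + alpha * kd_loss + beta * ent_reg
```

In this reading, the per-timestep KL terms replace a single KL on the averaged logits, while the entropy term discourages degenerate, overconfident predictions at individual timesteps; the exact weighting and sign convention of the regularizer in the paper may differ.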