🤖 AI Summary
Spiking Neural Networks (SNNs) typically underperform Artificial Neural Networks (ANNs), and existing remedies rely on cumbersome knowledge distillation from large teacher models or incur substantial additional training overhead. Method: This paper proposes a teacher-free temporal self-distillation framework that exploits the intrinsic temporal dynamics of SNNs. It models the spike responses at different timesteps as strong and weak submodels, and introduces a bidirectional distillation paradigm, Strong2Weak and Weak2Strong, to uncover implicit temporal knowledge. It further incorporates a confidence-aware ensemble knowledge transfer mechanism combining parallel and cascaded pathways. Contribution/Results: Experiments demonstrate significant improvements in classification accuracy and discriminative capability. The method maintains high training efficiency while also enhancing adversarial robustness, establishing a new paradigm for training low-power, high-performance SNNs without external supervision.
📝 Abstract
Brain-inspired spiking neural networks (SNNs) promise to be a low-power alternative to computationally intensive artificial neural networks (ANNs), although performance gaps persist. Recent studies have improved the performance of SNNs through knowledge distillation, but they rely on large teacher models or introduce additional training overhead. In this paper, we show that SNNs can be naturally deconstructed into multiple submodels for efficient self-distillation. We treat each timestep instance of the SNN as a submodel and evaluate its output confidence, thus efficiently identifying the strong and the weak. Based on this strong-weak relationship, we propose two efficient self-distillation schemes: (1) **Strong2Weak**: During training, the stronger "teacher" guides the weaker "student", effectively improving overall performance. (2) **Weak2Strong**: The weak serves as the "teacher", distilling knowledge back into the strong via its underlying dark knowledge, again yielding significant performance gains. For both distillation schemes, we offer flexible implementations such as ensemble, simultaneous, and cascade distillation. Experiments show that our method effectively improves the discriminability and overall performance of the SNN, while its adversarial robustness is also enhanced, benefiting from the stability brought by self-distillation. This ingeniously exploits the temporal properties of SNNs and provides insight into how to efficiently train high-performance SNNs.
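The core idea in the abstract (treat each timestep's output as a submodel, rank submodels by output confidence, and distill from the strong one into the weak ones) can be sketched as follows. This is a minimal NumPy illustration under assumed shapes (a `T × C` matrix of per-timestep logits) with a max-softmax confidence score and a KL-divergence distillation loss; the function name `strong2weak_loss` is hypothetical and this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def strong2weak_loss(logits_per_t):
    """logits_per_t: (T, C) array, one logit vector per timestep (submodel).
    Confidence = max softmax probability; the most confident timestep acts
    as the 'strong' teacher, the remaining timesteps as 'weak' students.
    Returns the strong timestep index and the mean KL(teacher || student)
    over the weak timesteps (the Strong2Weak distillation term)."""
    probs = softmax(logits_per_t)          # (T, C) per-timestep distributions
    conf = probs.max(axis=1)               # per-timestep confidence scores
    strong = int(conf.argmax())            # index of the strong submodel
    teacher = probs[strong]
    kls = [np.sum(teacher * np.log(teacher / probs[t]))
           for t in range(len(probs)) if t != strong]
    return strong, float(np.mean(kls))

# toy example: 4 timesteps, 3 classes
logits = np.array([[2.0, 0.1, 0.1],
                   [0.5, 0.4, 0.3],
                   [3.0, 0.2, 0.1],
                   [1.0, 0.9, 0.8]])
strong_idx, loss = strong2weak_loss(logits)
```

Weak2Strong would simply reverse the teacher/student roles, and the ensemble/cascade variants mentioned in the abstract would replace the single teacher distribution with an aggregate over several timesteps.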