🤖 AI Summary
Existing visual prompting methods rely on dense pixel-level prompts, which introduce redundant perturbations, generalize poorly, and are energy-inefficient. This work proposes LoRSP, a framework that brings brain-inspired spiking neural networks (SNNs) into visual prompt learning for the first time. Using the integrate-and-fire dynamics of spiking neurons, LoRSP generates instance-specific prompts that are both low-rank and sparse. By combining low-rank factorization with the inherent dynamic sparsity of SNNs, the method achieves performance competitive with existing VP approaches across five heterogeneous vision backbones and multiple benchmarks while tuning fewer parameters, yielding more compact, robust, and energy-efficient model adaptation.
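To make the "fewer tunable parameters" point concrete, the sketch below compares a dense pixel-level prompt with a low-rank factorized one. The shapes are assumptions for illustration only, since the summary does not give LoRSP's factor layout: a 3×224×224 input, a dense prompt with one learnable value per pixel per channel, and a hypothetical rank-`rank` factorization that learns `rank` column vectors of length H and `rank` row vectors of length W per channel.

```python
def dense_prompt_params(channels: int, height: int, width: int) -> int:
    """Dense pixel-level prompt: one learnable value per pixel per channel."""
    return channels * height * width


def low_rank_prompt_params(channels: int, height: int, width: int, rank: int) -> int:
    """Hypothetical layout: per channel, `rank` column factors (length H)
    plus `rank` row factors (length W)."""
    return channels * rank * (height + width)


dense = dense_prompt_params(3, 224, 224)            # 150528
low_rank = low_rank_prompt_params(3, 224, 224, 4)   # 5376
print(dense, low_rank, dense / low_rank)            # 150528 5376 28.0
```

Under these assumptions a rank-4 factorization needs 28 times fewer parameters than the dense prompt; the actual ratio for LoRSP depends on its real factor shapes and rank.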
📝 Abstract
Visual Prompting (VP) has emerged as an efficient paradigm for adapting large-scale pre-trained vision models to downstream tasks by incorporating learnable prompts at the input level. However, existing VP methods typically employ dense pixel-level prompts, which often suffer from redundant perturbations, limited generalization, and energy inefficiency. To overcome these limitations, we propose to integrate brain-inspired spiking learning into visual prompt learning. Spiking neurons process information inexpensively: they convert input data into discrete spike trains and return sparse outputs. Inspired by this, we propose Low-Rank visual Spike Prompting (LoRSP), a novel framework that naturally learns dynamic, low-rank, sparse visual prompts via a spiking neuron learning mechanism. The core idea of LoRSP is to exploit the brain-inspired sparse firing mechanism of spiking neurons to generate a pixel-level sparse prompt for each instance. Specifically, we first construct a series of prompt factors via low-rank factorization to capture distinct prompt subspaces. These prompt factors are then fed into an SNN architecture, which performs the integrate-and-fire process to emit spikes. As a result, LoRSP generates a sparse visual prompt while maintaining the low-rank constraint. This design enables instance-specific selective prompting, leading to more compact and robust adaptation across diverse downstream tasks. Extensive experiments on five heterogeneous vision backbones and multiple benchmarks demonstrate that LoRSP achieves competitive performance while requiring fewer tunable parameters than existing VP methods.
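The factor-to-spike pipeline described above can be sketched in dependency-free Python. This is not the LoRSP implementation; the abstract does not specify factor shapes, the neuron model, or how instance information enters, so the following are assumptions: each rank-1 prompt factor is a pair of vectors (u_k, v_k); every entry drives its own integrate-and-fire neuron with soft reset for a fixed number of time steps; the firing rate replaces the raw entry; and a hypothetical per-instance scalar in `drives` shifts the input current so that different instances fire differently. Because each rank-1 term is the outer product of two spike-rate vectors, silent neurons zero out whole rows or columns, so the summed prompt is sparse and its rank is at most K.

```python
"""Sketch (assumed design, not the paper's code): low-rank prompt factors
passed through integrate-and-fire (IF) neurons to obtain a sparse prompt."""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def if_firing_rate(current: float, threshold: float = 1.0, steps: int = 4) -> float:
    """IF neuron with soft reset under a constant input current; returns spikes / steps."""
    membrane = 0.0
    spikes = 0
    for _ in range(steps):
        membrane += current          # integrate
        if membrane >= threshold:    # fire
            spikes += 1
            membrane -= threshold    # soft reset: subtract the threshold
    return spikes / steps


def spike_vector(values: Vector, drive: float, threshold: float, steps: int) -> Vector:
    """Feed each factor entry plus a hypothetical per-instance drive to its own IF neuron."""
    return [if_firing_rate(v + drive, threshold, steps) for v in values]


def sparse_low_rank_prompt(us: List[Vector], vs: List[Vector], drives: Vector,
                           threshold: float = 1.0, steps: int = 4) -> Matrix:
    """Sum of K rank-1 terms outer(rate(u_k), rate(v_k)): rank <= K, many exact zeros."""
    h, w = len(us[0]), len(vs[0])
    prompt = [[0.0] * w for _ in range(h)]
    for u, v, d in zip(us, vs, drives):
        su = spike_vector(u, d, threshold, steps)
        sv = spike_vector(v, d, threshold, steps)
        for i in range(h):
            if su[i] == 0.0:
                continue  # silent neuron: this row of the rank-1 term stays zero
            for j in range(w):
                prompt[i][j] += su[i] * sv[j]
    return prompt


def sparsity(matrix: Matrix) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0.0)
    return zeros / total


if __name__ == "__main__":
    us = [[0.5, -0.2, 1.0, 0.0]]  # one rank-1 factor, prompt height 4
    vs = [[0.25, 0.5, -1.0]]      # prompt width 3
    prompt = sparse_low_rank_prompt(us, vs, drives=[0.0])
    for row in prompt:
        print(row)
    print("sparsity:", sparsity(prompt))  # 8 of 12 entries are zero
```

Changing the hypothetical drive (for example `drives=[0.5]`) changes which neurons fire and therefore which pixels the prompt touches, which is one way instance-specific selection could arise. A trainable version would also need a way to backpropagate through the non-differentiable spike, commonly a surrogate gradient; that choice is likewise not stated in the abstract.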