Beyond Low-Rank: Low-Rank Sparse Prompting via Spiking Neural Network and Prompt Factorization

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing visual prompting methods rely on dense pixel-level prompts, which suffer from redundant perturbations, limited generalization, and poor energy efficiency. This work proposes LoRSP, a framework that brings brain-inspired spiking neural networks (SNNs) into visual prompt learning. Using the integrate-and-fire dynamics of spiking neurons, LoRSP generates instance-specific prompts that are both low-rank and sparse. By combining low-rank factorization with the inherent dynamic sparsity of SNNs, the method achieves competitive performance across five heterogeneous vision backbones and multiple benchmarks while tuning fewer parameters than existing visual prompting methods. The authors present this as a route to more compact and robust model adaptation.
📝 Abstract
Visual Prompting (VP) has emerged as an efficient paradigm for adapting large-scale pre-trained vision models to downstream tasks by incorporating learnable prompts at the input level. However, existing VP methods typically employ dense pixel-level prompts, which often suffer from redundant perturbations, limited generalization, and energy inefficiency. To overcome these limitations, we propose to integrate brain-inspired spiking learning into visual prompt learning. Spiking neurons are known to process information inexpensively by converting input data into discrete spike trains and returning sparse outputs. Inspired by this, we propose Low-Rank visual Spike Prompting (LoRSP), a novel framework that learns dynamic low-rank sparse visual prompts via a spiking neuron learning mechanism. The core idea of LoRSP is to exploit the brain-inspired sparse firing mechanism of spiking neurons to generate a pixel-level sparse prompt for each instance. Specifically, we first construct a series of prompt factors via low-rank factorization to capture distinct prompt subspaces. These prompt factors are then fed into a spiking neural network (SNN) architecture, which performs the integrate-and-fire process to emit spikes. As a result, LoRSP generates a sparse visual prompt while maintaining the low-rank constraint. This design enables instance-specific selective prompting, leading to more compact and robust adaptation across diverse downstream tasks. Extensive experiments on five heterogeneous vision backbones and multiple benchmarks demonstrate that LoRSP achieves competitive performance while requiring fewer tunable parameters than existing VP methods.
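The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a single-channel prompt built from two hypothetical factors `U` (H x r) and `V` (W x r), a non-leaky integrate-and-fire neuron with hard reset, and gating of each factor by its own firing rate. The function names, the threshold, the number of time steps, and the gating rule are all illustrative assumptions. The instance-specific conditioning described in the abstract is omitted.

```python
import numpy as np

def integrate_and_fire(current, steps=4, threshold=1.0):
    """Toy integrate-and-fire neuron applied elementwise (assumed dynamics).

    Each element accumulates its input current over `steps` time steps,
    emits a spike when the membrane potential reaches `threshold`, and is
    then reset to zero. Returns the firing rate (spikes / steps) per element.
    """
    membrane = np.zeros_like(current, dtype=float)
    rate = np.zeros_like(current, dtype=float)
    for _ in range(steps):
        membrane += current            # integrate
        fired = membrane >= threshold  # fire
        rate += fired
        membrane[fired] = 0.0          # hard reset after a spike
    return rate / steps

def spike_prompt(U, V, steps=4, threshold=1.0):
    """Build a prompt from spike-gated low-rank factors (illustrative only).

    U: (H, r) and V: (W, r) are hypothetical learnable prompt factors.
    Gating the factors, rather than the dense product, keeps rank <= r.
    """
    gated_u = U * integrate_and_fire(U, steps, threshold)
    gated_v = V * integrate_and_fire(V, steps, threshold)
    return gated_u @ gated_v.T         # (H, W) prompt, rank at most r

# Usage: random factors for a 32x32 prompt of rank 2.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.5, size=(32, 2))
V = rng.normal(scale=0.5, size=(32, 2))
prompt = spike_prompt(U, V)
print("rank:", np.linalg.matrix_rank(prompt))
print("fraction of zero entries:", np.mean(prompt == 0))
```

In this sketch, any row of `U` or `V` whose neurons never fire contributes an all-zero row or column to the prompt, which is one simple way sparsity and the rank bound can coexist. A dense 32x32 prompt has 1024 entries, while the two rank-2 factors hold 128 values, illustrating why factorized prompts need fewer tunable parameters.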
Problem

Research questions and friction points this paper is trying to address.

Visual Prompting
dense prompts
redundant perturbations
limited generalization
energy inefficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking Neural Network
Visual Prompting
Low-Rank Factorization
Sparse Prompting
Instance-Specific Adaptation
Yumiao Zhao
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University
Bo Jiang
Anhui University
Computer Vision and Pattern Recognition
Beibei Wang
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University
Xixi Wan
Anhui University
deep learning, graph learning, multi-modal representation learning
Xiao Wang
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University
Jin Tang
Anhui University
Computer vision, intelligent video analysis