Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation

πŸ“… 2024-11-21
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods for Mamba architectures over-rely on adapting the State Space Model (SSM) module, neglecting the critical role of the projector in transfer learning. Method: This work first identifies the projector as the dominant component for cross-task adaptation and proposes ProDiaL, a PEFT method that adapts the pretrained projectors through learnable diagonal-centric linear transformation matrices rather than updating the projector weights directly, while keeping all SSM parameters frozen. Contribution/Results: ProDiaL decouples projector optimization from SSM learning, using less than 1% of the total parameters as trainable. On both vision and language Mamba models, it achieves performance on par with full fine-tuning at minimal computational cost, demonstrating strong generalization. As the first projector-centric PEFT paradigm for Mamba, ProDiaL challenges the prevailing SSM-centric design philosophy and establishes a new direction for efficient Mamba adaptation.

πŸ“ Abstract
Despite the growing interest in the Mamba architecture as a potential replacement for the Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we introduce two key insight-driven strategies for PEFT in the Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of the Mamba architecture and thus expected to play a primary role in transfer learning, our findings reveal that Projectors -- not SSMs -- are the predominant contributors to transfer learning. (2) Based on this observation, we propose a novel PEFT method specialized to the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL optimizes only the pretrained Projectors for new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights. This targeted approach allows efficient task adaptation, utilizing less than 1% of the total parameters, and exhibits strong performance across both vision and language Mamba models, highlighting its versatility and effectiveness.
Problem

Research questions and friction points this paper is trying to address.

PEFT approaches for the Mamba architecture remain largely unexplored.
Which component of Mamba (the SSM or the Projector) dominates transfer learning?
How can the Projectors be adapted to new tasks with minimal trainable parameters?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes only the pretrained Projectors, keeping their weights and all SSM parameters frozen
Applies learnable diagonal-centric linear transformation matrices to the Projectors
Adapts to new tasks using less than 1% of the total parameters
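The adaptation idea above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the paper's implementation): a frozen projector weight matrix is adapted by a learnable diagonal transformation, so only one scale per output row is trained instead of the full weight matrix. The function name and toy shapes are assumptions for illustration.

```python
# Hypothetical sketch of a diagonal-centric adaptation of a frozen projector
# weight. Only the diagonal `scale` vector would be trainable; W stays frozen.

def apply_diagonal_transform(W, scale):
    """Compute W'[i][j] = scale[i] * W[i][j], i.e. left-multiplication of the
    frozen weight matrix W by a diagonal matrix diag(scale)."""
    return [[scale[i] * W[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen pretrained projector weights (toy 2x3 example)
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

scale = [0.5, 2.0]  # trainable diagonal parameters (one per output row)
W_adapted = apply_diagonal_transform(W, scale)

# Parameter-count comparison: fine-tuning W directly trains rows*cols values,
# while the diagonal transform trains only `rows` values.
full_ft_params = len(W) * len(W[0])   # 6
diag_params = len(scale)              # 2
```

At realistic projector sizes (e.g. a square d x d weight) this drops trainable parameters from d^2 to d, which is how a projector-targeted scheme can stay under 1% of total parameters while leaving the pretrained weights untouched.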