Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification

📅 2025-01-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Low-rank adaptation (LoRA) applied to vision-language models (VLMs) for few-shot classification often induces catastrophic forgetting of pre-trained multimodal alignment capabilities. Method: We propose Complementary Subspace LoRA (Comp-LoRA), the first LoRA-based approach tailored for few-shot VLM fine-tuning. Comp-LoRA constrains the low-rank update to lie exclusively in the orthogonal complement of the principal component subspace spanned by the pre-trained weights, thereby preserving original image-text alignment while enabling task-specific adaptation. Contribution/Results: Comp-LoRA introduces no additional parameters and seamlessly integrates into standard PEFT pipelines. Experiments demonstrate a +1.0% improvement in Top-1 accuracy on few-shot classification and a +1.3% gain in zero-shot transfer performance, significantly mitigating forgetting without compromising generalization or task specificity.

๐Ÿ“ Abstract
Vision-language models (VLMs) are pretrained foundation models designed for large-scale image-text alignment. For downstream few-shot classification tasks, parameter-efficient fine-tuning (PEFT) of VLMs has gained much popularity in the computer vision community. PEFT methods such as prompt tuning and linear adapters have been studied for fine-tuning VLMs, while the low-rank adaptation (LoRA) algorithm has rarely been considered for few-shot fine-tuning. The main obstacle to using LoRA for few-shot fine-tuning is the catastrophic forgetting problem: vision-language alignment knowledge is important for generality in few-shot learning, whereas low-rank adaptation interferes with the most informative directions of the pretrained weight matrix. We propose the complementary subspace low-rank adaptation (Comp-LoRA) method to regularize the catastrophic forgetting problem in few-shot VLM fine-tuning. In detail, we optimize the low-rank matrix in the complementary subspace, thus preserving the general vision-language alignment ability of the VLM while learning the novel few-shot information. We conduct comparison experiments between the proposed Comp-LoRA method and other PEFT methods on fine-tuning VLMs for few-shot classification, and we also show how our method suppresses the catastrophic forgetting that arises when directly applying LoRA to VLMs. The results show that the proposed method surpasses the baseline by about +1.0% Top-1 accuracy and preserves the VLM's zero-shot performance over the baseline by about +1.3% Top-1 accuracy.
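The core idea, constraining the low-rank update to the orthogonal complement of the pretrained weight's principal subspace, can be sketched as below. This is a hedged illustration, not the paper's implementation: the function name `comp_lora_update`, the use of a plain SVD of the weight matrix, and the rank cutoff `k` are all assumptions for the sake of the example.

```python
import numpy as np

def comp_lora_update(W, B, A, k):
    """Apply a LoRA-style update B @ A to W, projected onto the orthogonal
    complement of W's top-k principal (left singular) subspace.
    Illustrative sketch; the paper's exact formulation may differ."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k]                                # principal subspace basis
    delta = B @ A                                 # standard LoRA update
    delta_comp = delta - U_k @ (U_k.T @ delta)    # (I - U_k U_k^T) @ delta
    return W + delta_comp

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))    # stand-in for a pretrained weight
B = rng.standard_normal((8, 2))    # LoRA factors, rank 2
A = rng.standard_normal((2, 6))
W_new = comp_lora_update(W, B, A, k=3)

# The applied update has no component along the top-3 singular directions,
# so those (most informative) directions of W are left untouched.
U, _, _ = np.linalg.svd(W, full_matrices=False)
print(np.allclose(U[:, :3].T @ (W_new - W), 0))  # True
```

The projection guarantees that the most informative directions of the pretrained matrix, which carry the image-text alignment knowledge, are never perturbed, while the update remains free to adapt in the remaining directions.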
Problem

Research questions and friction points this paper is trying to address.

Visual Language Models
Low-Rank Adaptation
Catastrophic Forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Complementary Subspaces
Low-Rank Adaptation
Visual Language Models
Zhongqi Wang
Institute of Computing Technology, Chinese Academy of Sciences
Model Robustness
Jia Dai
Dolby Lab. Inc., Beijing, China
Kai Li
Dolby Lab. Inc., Beijing, China
Xu Li
Dolby Lab. Inc., Beijing, China
Yanmeng Guo
Dolby Lab. Inc., Beijing, China
Maosheng Xiang
School of EECE, University of Chinese Academy of Sciences, Beijing, China