🤖 AI Summary
To address domain adaptation of vision-language models (VLMs) under extreme few-shot settings, this paper proposes a Sparse Optimization (SO) framework that mitigates overfitting and reduces computational overhead. Methodologically, it introduces two paradigms, "locally sparse, globally dense" and "locally random, globally important", in place of rigid low-rank constraints. The framework combines sparse gradient updates, importance-aware first-moment pruning, and randomized sparse parameter sampling to achieve parameter-efficient fine-tuning. Evaluated on 11 heterogeneous few-shot benchmarks, the method achieves state-of-the-art performance while reducing GPU memory consumption by 37%–52% and training FLOPs by 41%–63%. These gains improve both generalization and deployment efficiency for VLMs in data-scarce scenarios.
📝 Abstract
Adapting Vision-Language Models (VLMs) to new domains with few labeled samples remains a significant challenge due to severe overfitting and computational constraints. State-of-the-art solutions, such as low-rank reparameterization, mitigate these issues but often struggle with generalization and require extensive hyperparameter tuning. In this paper, a novel Sparse Optimization (SO) framework is proposed. Unlike low-rank approaches that constrain updates to a fixed subspace, our SO method leverages high sparsity to dynamically adjust very few parameters. We introduce two key paradigms. First, we advocate for *local sparsity and global density*, which updates a minimal subset of parameters per iteration while maintaining overall model expressiveness. Second, we advocate for *local randomness and global importance*, which sparsifies the gradient using random selection while pruning the first moment based on importance. This combination significantly mitigates overfitting and ensures stable adaptation in low-data regimes. Extensive experiments on 11 diverse datasets show that SO achieves state-of-the-art few-shot adaptation performance while reducing memory overhead.
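To make the two paradigms concrete, the sketch below shows one possible shape of a single SO update step: the gradient is sparsified with a *random* mask (local randomness), the first moment accumulates over steps so that different parameters are touched over time (global density), and the moment itself is pruned to its largest-magnitude entries (global importance). The function name, keep-ratios, and momentum formulation are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sparse_opt_step(param, grad, moment, lr=1e-3, beta=0.9,
                    grad_keep=0.01, moment_keep=0.01, rng=None):
    """Hedged sketch of one sparse-optimization step (not the paper's exact rule).

    - Local randomness: keep a small random subset of gradient entries.
    - Global importance: prune the first moment to its largest-magnitude entries.
    All hyperparameter values here are placeholders.
    """
    rng = rng or np.random.default_rng()
    g = grad.ravel()

    # Local randomness: uniform random mask over gradient entries.
    k = max(1, int(grad_keep * g.size))
    idx = rng.choice(g.size, size=k, replace=False)
    sparse_g = np.zeros_like(g)
    sparse_g[idx] = g[idx]

    # Momentum accumulates across steps, so coverage is globally dense over time.
    m = beta * moment.ravel() + (1 - beta) * sparse_g

    # Global importance: zero out all but the top-|m| entries of the moment.
    km = max(1, int(moment_keep * m.size))
    thresh = np.partition(np.abs(m), -km)[-km]
    m[np.abs(m) < thresh] = 0.0

    new_param = param.ravel() - lr * m
    return new_param.reshape(param.shape), m.reshape(param.shape)
```

Because only the pruned moment touches the weights, each iteration updates a very small parameter subset, which is the mechanism the abstract credits for reduced overfitting and memory overhead.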