🤖 AI Summary
Existing parameter-efficient transfer learning (PETL) methods reduce trainable parameters but fail to significantly alleviate activation memory overhead, hindering deployment on memory-constrained devices. To address this, we propose Structure-to-Activation (S2A), the first framework unifying model structure and activation optimization. S2A introduces structured activation modules—bias-, prompt-, and side-based—and synergistically integrates them with gradient-aware 4-bit non-parametric activation quantization, jointly compressing both tunable parameters and activation memory. Compatible with diverse backbone architectures, S2A achieves an average 4× reduction in GPU memory footprint and substantial decreases in trainable parameters across multiple benchmarks, while maintaining accuracy comparable to state-of-the-art PETL approaches. The framework is hardware-friendly and generalizable, offering a practical solution for efficient fine-tuning under strict memory budgets.
📝 Abstract
Parameter-efficient transfer learning (PETL) aims to adapt pre-trained models to multiple downstream tasks by tuning only a small number of parameters. However, as models keep scaling up, the memory footprint of existing PETL methods does not shrink in proportion to the reduction in learnable parameters. This limitation hinders the practical deployment of PETL methods on memory-constrained devices. To this end, we propose a new PETL framework, called Structure-to-Activation (S2A), to reduce the memory footprint of activations during fine-tuning. Specifically, our framework consists of: 1) activation module design (i.e., bias, prompt, and side modules) in the parametric model structure, which yields a significant reduction in both adjustable parameters and activation memory; and 2) 4-bit quantization of activations based on their derivatives for non-parametric structures (e.g., nonlinear functions), which maintains accuracy while substantially reducing memory usage. Our S2A method consequently offers a lightweight solution in terms of both parameter and memory footprint. We evaluate S2A with different backbones through extensive experiments on various datasets. The results show that our method not only outperforms existing PETL techniques, achieving a fourfold reduction in GPU memory footprint on average, but also delivers competitive accuracy with fewer tunable parameters. These results also demonstrate that our method is well suited to practical transfer learning on hardware-constrained devices.
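The idea behind point 2) — storing the activations of a non-parametric (nonlinear) layer at 4-bit precision for the backward pass — can be illustrated with a minimal numpy sketch. This is a plain uniform quantizer around a ReLU, not the paper's derivative-aware scheme; the function names (`relu_forward`, `relu_backward`) and cache layout are illustrative assumptions:

```python
import numpy as np

def quantize_4bit(x):
    """Uniform 4-bit quantization: map [min, max] onto 16 integer levels."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)  # values in 0..15, fits 4 bits
    return q, lo, scale

def dequantize_4bit(q, lo, scale):
    """Recover an approximation of the original activations."""
    return q.astype(np.float32) * scale + lo

def relu_forward(x):
    # Instead of caching the fp32 input for backprop, cache a 4-bit copy:
    # this is where the activation-memory saving comes from.
    cache = quantize_4bit(x)
    return np.maximum(x, 0.0), cache

def relu_backward(grad_out, cache):
    # Dequantize the saved input and apply the ReLU derivative (1 where x > 0).
    x_hat = dequantize_4bit(*cache)
    return grad_out * (x_hat > 0)
```

The backward pass sees only the dequantized approximation, trading a small amount of gradient precision for roughly an 8x reduction in cached activation memory versus fp32; S2A's actual quantizer additionally accounts for the activation's derivative when choosing levels.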