PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context

📅 2024-10-23
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Existing hybrid CNN-Transformer vision models lack efficient multi-task adaptation methods in resource-constrained settings. Method: This paper proposes PETAH, a parameter-efficient task-adaptation framework that, for the first time, integrates parameter-efficient fine-tuning (e.g., LoRA variants) into hybrid backbones and couples it with channel- and layer-aware structured pruning to jointly optimize storage and computation. Lightweight adapter modules are co-designed with hybrid-backbone fine-tuning. Results: On multiple vision tasks, including classification, PETAH significantly outperforms mainstream ViT adaptation approaches: it reduces trainable parameters by 37%, accelerates mobile inference by 1.8×, and matches or improves accuracy by up to 0.5%. This work establishes a new paradigm for deploying lightweight, multi-task vision models in edge and resource-limited environments.
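The LoRA-style adaptation the summary refers to can be sketched minimally: a frozen backbone weight shared across tasks, plus a small per-task low-rank update. The shapes, rank, and function names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4  # hidden size and low-rank bottleneck (illustrative choices)

# Frozen pretrained weight, shared by all downstream tasks.
W = rng.standard_normal((d, d))

# Per-task trainable low-rank factors: W_eff = W + B @ A.
# B starts at zero so adaptation begins as a no-op on the backbone.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def adapted_forward(x, W, A, B):
    """Apply the shared weight plus the low-rank task update."""
    return x @ (W + B @ A).T

x = rng.standard_normal((2, d))
y = adapted_forward(x, W, A, B)

# With B = 0 the adapted output equals the frozen backbone's output.
assert np.allclose(y, x @ W.T)

# Storage cost per task vs. full fine-tuning of the matrix.
full = W.size          # 4096 parameters
lora = A.size + B.size  # 512 parameters, 12.5% of the full matrix
```

Only `A` and `B` are stored per task, which is the source of the storage savings the summary describes; the shared `W` is kept once.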

📝 Abstract
Following their success in natural language processing (NLP), transformer models have increasingly been adopted in computer vision. While transformers perform well and offer promising multi-tasking performance, their high compute requirements mean that many resource-constrained applications still rely on convolutional or hybrid models, which combine the benefits of convolution and attention layers and achieve the best results in the sub-100M parameter range. At the same time, task-adaptation techniques, which allow one shared transformer backbone to serve multiple downstream tasks with great storage savings at negligible cost in performance, have not yet been adopted for hybrid transformers. In this work, we investigate how to achieve the best task-adaptation performance and introduce PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. We further combine PETAH adaptation with pruning to obtain highly performant and storage-friendly models for multi-tasking. In an extensive evaluation on classification and other vision tasks, we demonstrate that our PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and running more efficiently on mobile hardware.
Problem

Research questions and friction points this paper is trying to address.

Efficient task adaptation for hybrid transformers
Parameter reduction for resource-constrained applications
Storage-friendly multi-tasking models for mobile hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter efficient adaptation for hybrid transformers
Combining task adaptation with pruning techniques
Optimizing multi-task performance on mobile hardware
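The pruning side of the contribution can be illustrated with a simple magnitude-based structured criterion: whole output channels are dropped by L2 norm, so the remaining weights stay dense and mobile-friendly. The keep ratio, shapes, and criterion here are hypothetical stand-ins, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# A conv-like weight of shape (out_channels, in_channels). Pruning whole
# output channels shrinks storage and keeps a dense matrix for inference.
W = rng.standard_normal((16, 8))

keep_ratio = 0.75  # illustrative; jointly tuned with adaptation in practice
norms = np.linalg.norm(W, axis=1)          # one L2 score per output channel
k = int(W.shape[0] * keep_ratio)           # number of channels to keep
keep = np.sort(np.argsort(norms)[-k:])     # indices of the strongest channels

W_pruned = W[keep]                         # (16, 8) -> (12, 8)
```

Because entire channels are removed rather than individual weights, downstream layers only need their input dimension adjusted, which is what makes structured pruning combine cleanly with the shared-backbone multi-task setup described above.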