Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses privacy-preserving image classification by proposing LoRA-ViT, a low-rank adaptation method tailored for Vision Transformers (ViTs). To balance parameter efficiency and model accuracy, LoRA-ViT injects trainable low-rank decomposition matrices into the attention and feed-forward layers of the ViT while freezing the pre-trained weights; unlike conventional LoRA, the patch embedding layer is also kept trainable. This design reduces the number of trainable parameters to on the order of 1% of the full model, substantially lowering training overhead and memory consumption, and fits naturally with privacy-preserving training frameworks such as differential privacy. Evaluated on multiple image classification benchmarks—including CIFAR-10, CIFAR-100, and ImageNet-1K subsets—LoRA-ViT achieves accuracy comparable to full fine-tuning, demonstrating both high parameter efficiency and strong generalization under privacy constraints.
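To make the parameter-efficiency claim concrete, here is a back-of-the-envelope count of the trainable parameters under the setup the summary describes (LoRA on the attention and FFN weights, patch embedding kept trainable, everything else frozen). The ViT-Base dimensions and rank `r = 4` are illustrative assumptions, not the paper's reported configuration; with a smaller rank the fraction drops further.

```python
# Assumed ViT-Base dimensions; the paper's exact backbone/rank may differ.
d, layers, r = 768, 12, 4          # hidden size, transformer blocks, LoRA rank
patch, channels = 16, 3            # patch size and input channels

def lora_params(d_in, d_out, rank):
    # LoRA adds A (rank x d_in) and B (d_out x rank) per adapted weight matrix
    return rank * (d_in + d_out)

# adapted weights per block: Q, K, V, O (d x d), FFN up (d -> 4d), FFN down (4d -> d)
per_block = (4 * lora_params(d, d, r)
             + lora_params(d, 4 * d, r)
             + lora_params(4 * d, d, r))

# patch embedding kept trainable (unlike conventional LoRA): conv kernel + bias
patch_embed = patch * patch * channels * d + d

trainable = layers * per_block + patch_embed
full = 86_000_000                  # rough full ViT-Base parameter count
ratio = trainable / full

print(f"trainable: {trainable:,}  ({ratio:.2%} of full fine-tuning)")
```

With these assumptions the trainable set is about 1.25M parameters, roughly 1.5% of full fine-tuning; excluding the patch embedding, or lowering the rank, brings it under 1%.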

📝 Abstract
We propose a low-rank adaptation method for training privacy-preserving vision transformer (ViT) models that freezes the pre-trained ViT weights. In the proposed method, trainable rank decomposition matrices are injected into each layer of the ViT architecture and, unlike in conventional low-rank adaptation methods, the patch embedding layer is not frozen. The proposed method allows us not only to reduce the number of trainable parameters but also to maintain almost the same accuracy as full fine-tuning.
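The low-rank update the abstract describes replaces a frozen weight `W` with `W + (alpha/r) * B A`, where only the small matrices `A` and `B` are trained. A minimal numpy sketch of that forward pass follows; the dimensions, scaling, and initialization are illustrative rather than the paper's exact choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
alpha = 2.0                                 # LoRA scaling factor

def lora_forward(x):
    # frozen path plus low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# with B zero-initialized, the adapted layer initially equals the frozen layer
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing `B` is the standard LoRA convention: training starts exactly at the pre-trained model and the low-rank branch learns only the task-specific correction.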
Problem

Research questions and friction points this paper is trying to address.

Enhance privacy in image classification using ViTs
Reduce trainable parameters via low-rank adaptation
Maintain accuracy while freezing pre-trained weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank adaptation for efficient ViT training
Trainable rank matrices injected into ViT layers
Unfrozen patch embedding layer enhances accuracy
Haiwei Lin
Graduate School of Informatics, Chiba University, Chiba, Japan
Shoko Imaizumi
Graduate School of Informatics, Chiba University, Chiba, Japan
Hitoshi Kiya
Professor Emeritus, Tokyo Metropolitan University
Signal Processing · Computer Vision · Machine Learning · Information Security