Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses privacy-preserving image classification by proposing LoRA-ViT, a low-rank adaptation method tailored for Vision Transformers (ViTs). To balance parameter efficiency and model accuracy, LoRA-ViT injects trainable low-rank decomposition matrices into the attention and feed-forward layers of the ViT while freezing the pre-trained weights; unlike conventional LoRA, the patch embedding layer is also kept trainable. This design reduces the number of trainable parameters to on the order of 1% of the full model, substantially lowering training overhead and memory consumption, and fits naturally with privacy-preserving training frameworks such as differential privacy. Evaluated on multiple image classification benchmarks—including CIFAR-10, CIFAR-100, and ImageNet-1K subsets—LoRA-ViT achieves accuracy comparable to full fine-tuning, demonstrating both high parameter efficiency and strong generalization under privacy constraints.
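To make the parameter-efficiency claim concrete, here is a back-of-the-envelope count of the trainable parameters under the setup the summary describes (LoRA on the attention and FFN weights, patch embedding kept trainable, everything else frozen). The ViT-Base dimensions and rank `r = 4` are illustrative assumptions, not the paper's reported configuration; with a smaller rank the fraction drops further.

```python
# Assumed ViT-Base dimensions; the paper's exact backbone/rank may differ.
d, layers, r = 768, 12, 4          # hidden size, transformer blocks, LoRA rank
patch, channels = 16, 3            # patch size and input channels

def lora_params(d_in, d_out, rank):
    # LoRA adds A (rank x d_in) and B (d_out x rank) per adapted weight matrix
    return rank * (d_in + d_out)

# adapted weights per block: Q, K, V, O (d x d), FFN up (d -> 4d), FFN down (4d -> d)
per_block = (4 * lora_params(d, d, r)
             + lora_params(d, 4 * d, r)
             + lora_params(4 * d, d, r))

# patch embedding kept trainable (unlike conventional LoRA): conv kernel + bias
patch_embed = patch * patch * channels * d + d

trainable = layers * per_block + patch_embed
full = 86_000_000                  # rough full ViT-Base parameter count
ratio = trainable / full

print(f"trainable: {trainable:,}  ({ratio:.2%} of full fine-tuning)")
```

With these assumptions the trainable set is about 1.25M parameters, roughly 1.5% of full fine-tuning; excluding the patch embedding, or lowering the rank, brings it under 1%.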

📝 Abstract
We propose a low-rank adaptation method for training privacy-preserving vision transformer (ViT) models that freezes the pre-trained ViT weights. In the proposed method, trainable rank decomposition matrices are injected into each layer of the ViT architecture and, unlike in conventional low-rank adaptation methods, the patch embedding layer is not frozen. The proposed method allows us not only to reduce the number of trainable parameters but also to maintain almost the same accuracy as full fine-tuning.
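The low-rank update the abstract describes replaces a frozen weight `W` with `W + (alpha/r) * B A`, where only the small matrices `A` and `B` are trained. A minimal numpy sketch of that forward pass follows; the dimensions, scaling, and initialization are illustrative rather than the paper's exact choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
alpha = 2.0                                 # LoRA scaling factor

def lora_forward(x):
    # frozen path plus low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# with B zero-initialized, the adapted layer initially equals the frozen layer
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing `B` is the standard LoRA convention: training starts exactly at the pre-trained model and the low-rank branch learns only the task-specific correction.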
Problem

Research questions and friction points this paper is trying to address.

Enhance privacy in image classification using ViTs
Reduce trainable parameters via low-rank adaptation
Maintain accuracy while freezing pre-trained weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank adaptation for efficient ViT training
Trainable rank matrices injected into ViT layers
Unfrozen patch embedding layer enhances accuracy
Haiwei Lin
Graduate School of Informatics, Chiba University, Chiba, Japan
Shoko Imaizumi
Graduate School of Informatics, Chiba University, Chiba, Japan
Hitoshi Kiya
Professor Emeritus, Tokyo Metropolitan University
Signal Processing · Computer Vision · Machine Learning · Information Security