Split Adaptation for Pre-trained Vision Transformers

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To simultaneously ensure data privacy, model intellectual-property protection, and high accuracy in few-shot adaptation of Vision Transformers (ViTs) for privacy-sensitive scenarios, this paper proposes the Split Adaptation (SA) framework. SA partitions a pre-trained ViT into a lightweight client-side frontend, quantized to low-bit precision, and a server-side backend. It combines bi-level noise injection, data-level and model-level out-of-distribution enhancements, and patch retrieval augmentation. By pairing low-bit quantization of the frontend with bi-level noise, SA prevents direct model exposure on the client at minimal computational cost while improving robustness and accuracy in few-shot adaptation. Evaluated across multiple benchmark datasets, SA achieves average accuracy gains of 3.2–7.8% over state-of-the-art methods and withstands advanced data reconstruction attacks.

📝 Abstract
Vision Transformers (ViTs), extensively pre-trained on large-scale datasets, have become essential to foundation models, allowing excellent performance on diverse downstream tasks with minimal adaptation. Consequently, there is growing interest in adapting pre-trained ViTs across various fields, including privacy-sensitive domains where clients are often reluctant to share their data. Existing adaptation methods typically require direct data access, rendering them infeasible under these constraints. A straightforward solution may be sending the pre-trained ViT to clients for local adaptation, which poses issues of model intellectual property protection and incurs heavy client computation overhead. To address these issues, we propose a novel split adaptation (SA) method that enables effective downstream adaptation while protecting data and models. SA, inspired by split learning (SL), segments the pre-trained ViT into a frontend and a backend, with only the frontend shared with the client for data representation extraction. But unlike regular SL, SA replaces frontend parameters with low-bit quantized values, preventing direct exposure of the model. SA allows the client to add bi-level noise to the frontend and the extracted data representations, ensuring data protection. Accordingly, SA incorporates data-level and model-level out-of-distribution enhancements to mitigate noise injection's impact on adaptation performance. Our SA focuses on the challenging few-shot adaptation and adopts patch retrieval augmentation for overfitting alleviation. Extensive experiments on multiple datasets validate SA's superiority over state-of-the-art methods and demonstrate its defense against advanced data reconstruction attacks while preventing model leakage with minimal computation cost on the client side. The source codes can be found at https://github.com/conditionWang/Split_Adaptation.
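The abstract's key model-protection step is replacing frontend parameters with low-bit quantized values. As an illustrative sketch of what that could look like (an assumption on our part, not the paper's code; the bit-width and the uniform symmetric scheme are hypothetical choices):

```python
# Illustrative sketch, NOT the authors' implementation: uniform symmetric
# per-tensor low-bit quantization, so the client receives quantized frontend
# weights rather than the original full-precision parameters.
import torch

def quantize_low_bit(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for 4-bit
    scale = w.abs().max() / qmax + 1e-12              # per-tensor scale (eps avoids /0)
    q = torch.round(w / scale).clamp(-qmax - 1, qmax) # integer grid of 2**bits levels
    return q * scale                                  # dequantized low-bit weights

# a 4-bit quantized tensor can take at most 2**4 = 16 distinct values
w = torch.randn(1000)
assert torch.unique(quantize_low_bit(w)).numel() <= 16
```

The design point is that the client only ever sees the coarse grid of quantized values, never the original weights, which is what makes local representation extraction possible without handing over the model.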
Problem

Research questions and friction points this paper addresses.

Adapt pre-trained ViTs without sharing sensitive client data.
Protect model intellectual property during local adaptation.
Ensure data and model security with minimal client computation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Split adaptation segments ViT into frontend and backend.
Low-bit quantization protects model intellectual property.
Bi-level noise ensures data and model protection.
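The bi-level noise in the last bullet operates at two places: on the client-held frontend weights (model level) and on the extracted data representations before they are sent to the server (data level). A minimal sketch, assuming Gaussian noise at both levels (the function and parameter names are hypothetical, not from the paper):

```python
# Hypothetical sketch of bi-level noise injection (illustrative names only):
# Gaussian perturbation of the frontend's weights (model level) and of the
# extracted representations (data level) before they leave the client.
import torch
import torch.nn as nn

def add_bi_level_noise(frontend: nn.Module,
                       representations: torch.Tensor,
                       weight_std: float = 0.01,
                       repr_std: float = 0.05) -> torch.Tensor:
    with torch.no_grad():
        for p in frontend.parameters():
            p.add_(torch.randn_like(p) * weight_std)   # model-level noise
    # data-level noise on what gets sent to the server backend
    return representations + torch.randn_like(representations) * repr_std

# usage sketch: frontend = client-side ViT blocks; reps = frontend(patches)
# noisy_reps = add_bi_level_noise(frontend, reps)  # only noisy_reps leave the client
```

The standard deviations trade privacy against accuracy, which is why the paper pairs this with data-level and model-level out-of-distribution enhancements to offset the noise's impact on adaptation performance.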