Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services

📅 2024-08-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a novel privacy threat, the Pre-trained Encoder Inference (PEI) attack, facing pre-trained encoders deployed in downstream machine learning services. Unlike conventional attacks that target upstream models directly, PEI is a black-box, API-based attack that infers which encoder a service implicitly uses solely from the service's outputs. The authors systematically model encoder output distributions and propose a unified framework integrating task-aware feature distillation, confidence calibration, and distributional similarity measurement. Evaluated on image classification, text classification, and text-to-image generation, the method achieves over 92% average encoder-identification accuracy; even when the true encoder is excluded from the candidate set, the false-positive rate stays below 5%. The authors further demonstrate practical impact by using PEI to enable adversarial attacks against the LLaVA multimodal model. To their knowledge, this is the first work to formally establish PEI as a critical privacy risk for downstream services, providing both theoretical foundations and empirical evidence for secure encoder deployment.

📝 Abstract
Though pre-trained encoders can be easily accessed online to quickly build downstream machine learning (ML) services, various attacks have been designed to compromise the security and privacy of these encoders. While most attacks target encoders on the upstream side, it remains unknown how an encoder could be threatened when deployed in a downstream ML service. This paper unveils a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which poses privacy threats to encoders hidden behind downstream ML services. Given only API access to a targeted downstream service and a set of candidate encoders, the PEI attack can infer which candidate encoder is secretly used by the targeted service. We evaluate the attack performance of PEI against real-world encoders on three downstream tasks: image classification, text classification, and text-to-image generation. Experiments show that the PEI attack succeeds in revealing the hidden encoder in most cases and seldom makes mistakes even when the hidden encoder is not in the candidate set. We also conducted a case study on one of the most recent vision-language models, LLaVA, to illustrate that the PEI attack is useful in assisting other ML attacks such as adversarial attacks. The code is available at https://github.com/fshp971/encoder-inference.
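Conceptually, the abstract describes comparing a service's API responses against the outputs of each candidate encoder and selecting the best match, while rejecting all candidates when none matches well. The sketch below is only an illustrative reconstruction of that idea, not the paper's actual algorithm: it assumes the service returns embedding-like vectors, and the function name, the cosine-similarity metric, and the rejection threshold are all our own choices for illustration.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def identify_encoder(service_fn, candidates, probes, threshold=0.9):
    """Score each candidate encoder by how well its outputs align with the
    service's outputs on probe inputs; return (best_name, score), with
    best_name = None when no candidate clears the threshold (i.e. the
    hidden encoder is likely outside the candidate set)."""
    best_name, best_score = None, -1.0
    for name, enc in candidates.items():
        score = sum(cosine(service_fn(x), enc(x)) for x in probes) / len(probes)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name if best_score >= threshold else None), best_score

# Toy demonstration with random linear "encoders" (purely synthetic data).
def make_encoder(seed, dim=8):
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [sum(row[j] * x[j] for j in range(dim)) for row in w]

random.seed(0)
candidates = {"A": make_encoder(1), "B": make_encoder(2), "C": make_encoder(3)}
probes = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(16)]
name, score = identify_encoder(candidates["B"], candidates, probes)
```

In this toy setup the service is literally candidate "B", so the match is exact; the threshold models the paper's observation that the attack should abstain when the hidden encoder is not among the candidates.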
Problem

Research questions and friction points this paper is trying to address.

Detects hidden pre-trained encoders in downstream ML services
Exposes encoder vulnerabilities via API access to services
Facilitates other ML attacks like model stealing and adversarial attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

PEI attack extracts encoder information via API access alone
Infers hidden encoders used by downstream services
Facilitates other ML attacks such as model stealing and adversarial attacks
Shaopeng Fu, King Abdullah University of Science and Technology (Trustworthy Machine Learning · AI Security)
Xuexue Sun, HitoX
Ke Qing, University of Science and Technology of China
Tianhang Zheng, Zhejiang University
Di Wang, King Abdullah University of Science and Technology