🤖 AI Summary
This work identifies a novel privacy threat facing pre-trained encoders deployed in downstream machine learning services: the Pre-trained Encoder Inference (PEI) attack. Unlike conventional attacks that target upstream models directly, PEI is a black-box, API-based attack that accurately infers the implicitly used encoder solely from service outputs. The authors systematically model encoder output distributions and propose a unified framework integrating task-aware feature distillation, confidence calibration, and distributional similarity measurement. Evaluated across image classification, text classification, and text-to-image generation, the method achieves >92% average encoder identification accuracy; even when the true encoder is excluded from the candidate set, the false-positive rate remains below 5%. The authors further demonstrate practical impact by using PEI to enable adversarial attacks against the LLaVA multimodal model. To their knowledge, this is the first work to formally establish PEI as a critical privacy risk for downstream services, providing both theoretical foundations and empirical evidence for secure encoder deployment.
📝 Abstract
Although pre-trained encoders are easily accessible online and enable the rapid construction of downstream machine learning (ML) services, various attacks have been designed to compromise the security and privacy of these encoders. Most existing attacks target encoders on the upstream side, and it remains unknown how an encoder could be threatened once deployed in a downstream ML service. This paper unveils a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which poses privacy threats to encoders hidden behind downstream ML services. Given only API access to a targeted downstream service and a set of candidate encoders, the PEI attack can infer which candidate encoder is secretly used by the targeted service. We evaluate the attack performance of PEI against real-world encoders on three downstream tasks: image classification, text classification, and text-to-image generation. Experiments show that the PEI attack succeeds in revealing the hidden encoder in most cases and seldom makes mistakes even when the hidden encoder is not in the candidate set. We also conduct a case study on one of the most recent vision-language models, LLaVA, to illustrate that the PEI attack can assist other ML attacks such as adversarial attacks. The code is available at https://github.com/fshp971/encoder-inference.
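To make the attack setting concrete, the following is a minimal sketch (not the paper's actual method) of a PEI-style inference loop: the attacker queries a black-box service with probe inputs, compares the output distribution against features produced by each candidate encoder, and abstains when no candidate is similar enough (covering the case where the hidden encoder is outside the candidate set). All names (`make_encoder`, `service_api`, the cosine-similarity score, and the abstention threshold) are illustrative assumptions; both the service and the candidate encoders are stubbed with random linear maps.

```python
import numpy as np

# Hypothetical candidate "encoders": each maps inputs to feature vectors.
# In a real PEI attack these would be public pre-trained encoders and the
# service would be a remote black-box API; here both are stubbed.
def make_encoder(seed):
    w = np.random.default_rng(seed).normal(size=(8, 4))
    return lambda x: x @ w

candidates = {f"encoder_{i}": make_encoder(i) for i in range(3)}
hidden = candidates["encoder_1"]  # the encoder secretly used downstream

def service_api(x):
    """Black-box downstream service built on the hidden encoder."""
    feats = hidden(x)
    # e.g. the service exposes normalized scores derived from encoder features
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def similarity(a, b):
    """Mean cosine similarity between two batches of feature vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

rng = np.random.default_rng(0)
probes = rng.normal(size=(32, 8))        # attacker-chosen probe inputs
service_out = service_api(probes)        # black-box API responses

scores = {name: similarity(service_out, enc(probes))
          for name, enc in candidates.items()}
best = max(scores, key=scores.get)

# Abstain if even the best candidate is not similar enough, so the attack
# seldom errs when the hidden encoder is absent from the candidate set.
THRESHOLD = 0.9
inferred = best if scores[best] >= THRESHOLD else None
print(inferred)
```

In this toy setup the correct candidate scores a perfect similarity (the service output is computed from the same features), while unrelated random encoders fall well below the threshold; the real attack must instead handle post-processing layers and task-specific heads between the encoder and the observable outputs.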