Examining the Threat Landscape: Foundation Models and Model Stealing

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a model stealing threat against foundation models (FMs) in vision tasks, arising from their strong representational capacity. The authors mount a black-box, query-based extraction attack against fine-tuned FMs, specifically Vision Transformers (ViTs), and run cross-architecture evaluations on CIFAR-10. A ViT-L/16 victim yields a 94.28% prediction-agreement rate with its stolen surrogate, far above the 73.20% observed for a ResNet-18 victim, underscoring FMs' heightened susceptibility. Crucially, the study argues, for the first time, that the universality of FM pre-trained representations, rather than architectural traits alone, is the root cause of this stealability: the same features learned during pre-training are accessible to both attacker and victim. The results suggest that fine-tuned FMs may not be a safe choice for deployment behind commercial APIs without mitigation, and the authors urge model owners to recognize this security gap and to develop defenses explicitly designed to harden representation-level robustness against model extraction.
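To make the threat model concrete, the sketch below shows the core loop of a query-based extraction attack as the summary describes it: the attacker sends unlabeled inputs to a black-box victim API, collects the returned labels, trains a surrogate on those labels, and then measures prediction agreement on held-out inputs. This is a minimal toy illustration, not the paper's implementation: the linear "victim", the random query distribution, and the logistic-regression surrogate are all stand-in assumptions (the paper attacks fine-tuned ViTs on CIFAR-10).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a black-box victim API that returns hard labels only.
# In the paper's setting this would be a fine-tuned ViT behind a commercial API.
W_victim = rng.normal(size=(8, 3))
def victim_predict(x):
    return np.argmax(x @ W_victim, axis=1)

# Step 1: the attacker queries the victim and records its predictions.
queries = rng.normal(size=(2000, 8))
stolen_labels = victim_predict(queries)

# Step 2: train a surrogate (multinomial logistic regression, gradient descent)
# to imitate the victim's input -> label mapping.
W_surrogate = np.zeros((8, 3))
onehot = np.eye(3)[stolen_labels]
for _ in range(300):
    logits = queries @ W_surrogate
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W_surrogate -= 0.1 * queries.T @ (p - onehot) / len(queries)

# Step 3: agreement = fraction of held-out inputs where surrogate and victim
# predict the same label (the metric reported as 94.28% vs. 73.20% above).
test = rng.normal(size=(1000, 8))
agreement = np.mean(
    np.argmax(test @ W_surrogate, axis=1) == victim_predict(test)
)
print(f"agreement = {agreement:.2%}")
```

Because the surrogate's hypothesis class matches the toy victim here, agreement is high; the paper's observation is that a shared pre-trained representation plays an analogous role for FMs, giving the attacker the same feature space the victim uses.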

📝 Abstract
Foundation models (FMs) for computer vision learn rich and robust representations, enabling their adaptation to task/domain-specific deployments with little to no fine-tuning. However, we posit that the very same strength can make applications based on FMs vulnerable to model stealing attacks. Through empirical analysis, we reveal that models fine-tuned from FMs harbor heightened susceptibility to model stealing, compared to conventional vision architectures like ResNets. We hypothesize that this behavior is due to the comprehensive encoding of visual patterns and features learned by FMs during pre-training, which are accessible to both the attacker and the victim. We report that an attacker is able to obtain 94.28% agreement (matched predictions with victim) for a Vision Transformer based victim model (ViT-L/16) trained on CIFAR-10 dataset, compared to only 73.20% agreement for a ResNet-18 victim, when using ViT-L/16 as the thief model. We arguably show, for the first time, that utilizing FMs for downstream tasks may not be the best choice for deployment in commercial APIs due to their susceptibility to model theft. We thereby alert model owners towards the associated security risks, and highlight the need for robust security measures to safeguard such models against theft. Code is available at https://github.com/rajankita/foundation_model_stealing.
Problem

Research questions and friction points this paper is trying to address.

Foundation models vulnerable to stealing
Heightened theft risk in fine-tuned models
Need for robust security measures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation models enhance task adaptability
Models fine-tuned from FMs are theft-prone
Vision Transformers show higher susceptibility to theft