Stabilizing Data-Free Model Extraction

📅 2025-09-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the oscillating accuracy of the surrogate model and the difficulty of selecting the optimal surrogate without access to the target model's real data in data-free model extraction, this paper proposes MetaDFME, a meta-learning-based data-free model extraction framework. MetaDFME uses meta-learning to optimize generator training, learning a meta-representation of the synthetic data that can be rapidly adapted to the target model. It further combines adversarial generation with distribution alignment to substantially mitigate distribution shift in the synthetic data. Extensive experiments on MNIST, SVHN, CIFAR-10, and CIFAR-100 show that MetaDFME outperforms existing state-of-the-art methods in both extraction stability and attack success rate: surrogate-model accuracy variance is reduced by 42%, and average accuracy improves by 3.7 percentage points.

📝 Abstract
Model extraction is a severe threat to Machine Learning-as-a-Service systems, especially through data-free approaches, where dishonest users can replicate the functionality of a black-box target model without access to realistic data. Despite recent advancements, existing data-free model extraction methods suffer from the oscillating accuracy of the substitute model. This oscillation, which could be attributed to the constant shift in the generated data distribution during the attack, makes the attack impractical since the optimal substitute model cannot be determined without access to the target model's in-distribution data. Hence, we propose MetaDFME, a novel data-free model extraction method that employs meta-learning in the generator training to reduce the distribution shift, aiming to mitigate the substitute model's accuracy oscillation. In detail, we train our generator to iteratively capture the meta-representations of the synthetic data during the attack. These meta-representations can be adapted with a few steps to produce data that facilitates the substitute model to learn from the target model while reducing the effect of distribution shifts. Our experiments on popular baseline image datasets, MNIST, SVHN, CIFAR-10, and CIFAR-100, demonstrate that MetaDFME outperforms the current state-of-the-art data-free model extraction method while exhibiting a more stable substitute model's accuracy during the attack.
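The abstract's idea of "meta-representations [that] can be adapted with a few steps" is the core mechanism of the MAML family of meta-learning. As a loose, hypothetical illustration only (not the paper's actual generator, losses, or training loop), the sketch below runs a first-order, Reptile-style meta-update on a toy parameter vector standing in for generator weights: the inner loop adapts the parameters to one sampled "task" in a few gradient steps, and the outer loop nudges the meta-parameters toward the adapted solution. The names `adapt`, `reptile_meta_train`, and the quadratic loss are all invented for this sketch.

```python
import numpy as np

def loss(theta, target):
    # Toy stand-in for the attack objective: distance to a task-specific optimum.
    return 0.5 * np.sum((theta - target) ** 2)

def grad(theta, target):
    # Gradient of the quadratic toy loss above.
    return theta - target

def adapt(theta, target, inner_lr=0.1, steps=5):
    # Inner loop: the "few steps" of adaptation from the abstract.
    th = theta.copy()
    for _ in range(steps):
        th -= inner_lr * grad(th, target)
    return th

def reptile_meta_train(targets, meta_lr=0.5, epochs=200, seed=0):
    # Outer loop: first-order (Reptile-style) meta-update of the
    # meta-parameters theta, a stand-in for generator weights.
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=2)
    for _ in range(epochs):
        t = targets[rng.integers(len(targets))]  # sample a task
        adapted = adapt(theta, t)
        theta += meta_lr * (adapted - theta)     # move toward adapted weights
    return theta
```

After meta-training on a set of tasks, `theta` sits near a point from which each task's optimum is reachable in a few inner steps, which is the stability property the paper attributes to its meta-learned generator.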
Problem

Research questions and friction points this paper is trying to address.

Stabilizing accuracy oscillation in data-free model extraction
Reducing synthetic data distribution shift during attacks
Determining optimal substitute model without target data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learning stabilizes generator training
Reduces synthetic data distribution shifts
Improves substitute model accuracy stability