AI Summary
This work addresses the practical limitations of existing membership inference attacks in strict black-box settings where only predicted labels are accessible: such methods have low query efficiency and rely on unrealistic assumptions. The authors propose an approach that constructs a functionally similar surrogate model through active sampling, perturbation-based selection, and data synthesis, and then performs membership inference on this surrogate. This strategy concentrates the high query cost into a one-time model extraction phase, eliminating the need for repeated queries to the target model. In a pure label-only black-box scenario, the method matches the performance of state-of-the-art label-only attacks while significantly reducing query costs, without requiring confidence scores or shadow models. On benchmark datasets such as Purchase, Location, and Texas Hospital, the surrogate model is extracted with a query budget equivalent to testing about 1% of the training samples, and membership inference accuracy stays within ±1% of that obtained on the target model.
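The extract-then-infer flow summarized above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the target is a hand-written decision rule, uniform random sampling stands in for the paper's active sampling, perturbation-based selection, and synthetic data, and the perturbation-robustness score is one common label-only signal rather than the inference rule the authors necessarily apply to the surrogate. The only point it demonstrates is that queries to the target stop once extraction is finished.

```python
import random


class LabelOnlyTarget:
    """Hypothetical black-box target M: returns only a label and counts queries."""

    def __init__(self):
        self.queries = 0

    def predict(self, x):
        self.queries += 1
        return int(x[0] + x[1] > 1.0)  # toy decision rule standing in for M


class NearestNeighborSurrogate:
    """Toy surrogate S: 1-nearest-neighbour over (input, label) pairs obtained from M."""

    def __init__(self, points, labels):
        self.points, self.labels = points, labels

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in self.points]
        return self.labels[dists.index(min(dists))]


def extract_surrogate(target, budget, rng):
    """Spend the whole query budget once (uniform sampling is an assumed stand-in)."""
    points = [(rng.random(), rng.random()) for _ in range(budget)]
    labels = [target.predict(p) for p in points]
    return NearestNeighborSurrogate(points, labels)


def robustness_score(model, x, rng, trials=50, noise=0.05):
    """Assumed label-only signal: fraction of noisy copies of x that keep x's label."""
    base = model.predict(x)
    kept = 0
    for _ in range(trials):
        noisy = tuple(v + rng.gauss(0.0, noise) for v in x)
        kept += model.predict(noisy) == base
    return kept / trials


rng = random.Random(0)
target = LabelOnlyTarget()
surrogate = extract_surrogate(target, budget=200, rng=rng)
queries_after_extraction = target.queries

# Membership scoring runs entirely on S; M is not queried again.
candidates = [(0.2, 0.3), (0.9, 0.8), (0.5, 0.5)]
scores = [robustness_score(surrogate, x, rng) for x in candidates]
```

After the scoring loop, `target.queries` still equals the extraction budget, however many candidates are scored on the surrogate.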
Abstract
Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during training. Existing MIAs often rely on impractical assumptions such as access to public datasets, shadow models, confidence scores, or knowledge of the training data distribution, making them vulnerable to defenses like confidence masking and adversarial regularization. Label-only MIAs, even under strict constraints, suffer from high query requirements per sample. We propose a cost-effective label-only MIA framework based on transferability and model extraction. By querying the target model M using active sampling, perturbation-based selection, and synthetic data, we extract a functionally similar surrogate S on which membership inference is performed. This shifts query overhead to a one-time extraction phase, eliminating repeated queries to M. Operating under strict black-box constraints, our method matches the performance of state-of-the-art label-only MIAs while significantly reducing query costs. On benchmarks including Purchase, Location, and Texas Hospital, we show that a query budget equivalent to testing $\approx1\%$ of training samples suffices to extract S and achieve membership inference accuracy within $\pm1\%$ of that obtained on M. We also evaluate the effectiveness of standard defenses proposed for label-only MIAs against our attack.
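The budget claim above can be made concrete with a small back-of-the-envelope comparison. This is a sketch under stated assumptions: the function names are hypothetical, and the training set size, number of candidates, and per-sample query count are illustrative values that do not come from the paper. It reads "a query budget equivalent to testing about 1% of training samples" as the number of queries a per-sample label-only attack would spend on 1% of the training set.

```python
def per_sample_attack_queries(num_candidates, queries_per_sample):
    """Queries to M when every candidate is tested directly against M."""
    return num_candidates * queries_per_sample


def extraction_budget_queries(train_size, queries_per_sample, budget_fraction=0.01):
    """One-time budget: the cost of testing `budget_fraction` of the training set."""
    return round(train_size * budget_fraction) * queries_per_sample


# Illustrative values only (assumptions, not figures from the paper).
train_size = 100_000
queries_per_sample = 1_000
num_candidates = 50_000

direct = per_sample_attack_queries(num_candidates, queries_per_sample)
one_time = extraction_budget_queries(train_size, queries_per_sample)

print(direct)    # 50000000
print(one_time)  # 1000000
```

Under these assumed numbers the one-time extraction cost is paid back once more than 1,000 candidates are tested, since every later membership decision is made on S at no further cost in queries to M.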