🤖 AI Summary
This work addresses model extraction attacks against PReLU neural networks, a class of nonlinear models that is more expressive than ReLU networks but has so far lacked systematic security analysis. We propose a parameter inversion method based on raw model outputs, combining output-space analysis with end-to-end empirical validation. Our approach achieves high-fidelity recovery of both architecture and parameters across three realistic attack settings: full-output access, access to only the top-m class probability scores, and query-limited black-box access. On benchmark datasets including MNIST, the method attains over 99% accuracy in both structural identification and parameter reconstruction for diverse PReLU models. This study closes a gap in model extraction research by extending it from fixed activation functions (e.g., ReLU) to learnable ones, and provides an experimentally validated framework for the security evaluation of such networks.
📝 Abstract
The machine learning problem of model extraction was first introduced in 1991 and gained prominence as a cryptanalytic challenge starting with Crypto 2020. For over three decades, research in this field has primarily focused on ReLU-based neural networks. In this work, we take the first step towards the cryptanalytic extraction of PReLU neural networks, which employ more complex nonlinear activation functions than their ReLU counterparts. We propose a raw output-based parameter recovery attack for PReLU networks and extend it to more restrictive scenarios where only the top-m probability scores are accessible. Our attacks are rigorously evaluated through end-to-end experiments on diverse PReLU neural networks, including models trained on the MNIST dataset. To the best of our knowledge, this is the first practical demonstration of PReLU neural network extraction across three distinct attack scenarios.
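What makes PReLU extraction harder than the ReLU case is that the negative-side slope is itself a trained parameter, so the attacker must recover it alongside the weights. A minimal sketch of the two activations (plain Python, not the paper's attack code; the per-neuron slope `a` is illustrative):

```python
def relu(x):
    # ReLU: fixed activation, slope 0 on the negative side
    return x if x > 0 else 0.0

def prelu(x, a):
    # PReLU: the negative-side slope `a` is learned during training,
    # so it is part of what an extraction attack must reconstruct
    return x if x > 0 else a * x

print(relu(-2.0))         # → 0.0
print(prelu(-2.0, 0.25))  # → -0.5
print(prelu(3.0, 0.25))   # → 3.0
```

With `a = 0` PReLU reduces to ReLU, which is why attacks designed for ReLU networks do not directly carry over: they have no mechanism for estimating a nonzero `a`.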