🤖 AI Summary
Existing black-box weight extraction methods (e.g., Carlini et al., 2020) fail to recover layer-wise weight magnitudes ("signatures") for deep neural networks due to matrix rank deficiency and cross-layer noise accumulation. Method: This work systematically identifies these fundamental limitations and proposes a novel framework that refines the differential-cryptanalysis-style attack, integrating matrix rank recovery, adaptive noise suppression, and ReLU-aware hierarchical calibration. Contribution/Results: The approach overcomes layer-depth constraints, enabling the first complete signature extraction from an eight-layer ReLU network trained on CIFAR-10. It achieves at least 95% input-space matching accuracy, substantially surpassing prior state-of-the-art methods, which are limited to the first three layers.
📝 Abstract
Neural network model extraction has emerged in recent years as an important security concern, as adversaries attempt to recover a network's parameters via black-box queries. A key step in this process is signature extraction, which aims to recover the absolute values of the network's weights layer by layer. Prior work, notably by Carlini et al. (2020), introduced a technique inspired by differential cryptanalysis to extract neural network parameters. However, their method suffers from several limitations that restrict its applicability to networks with only a few layers. Later works focused on improving sign extraction, but largely assumed that signature extraction itself was feasible. In this work, we revisit and refine the signature extraction process by systematically identifying, and for the first time addressing, critical limitations of Carlini et al.'s signature extraction method, including rank deficiency and noise propagation from deeper layers. To overcome these challenges, we propose efficient algorithmic solutions for each of the identified issues, greatly improving the efficiency of signature extraction and permitting the extraction of much deeper networks than was previously possible. We validate our method through extensive experiments on ReLU-based neural networks, demonstrating significant improvements in extraction depth and accuracy. For instance, our extracted network matches the target network on at least 95% of the input space for each of the eight layers of a neural network trained on the CIFAR-10 dataset, whereas previous works could barely extract the first three layers. Our results represent a crucial step toward practical attacks on larger and more complex neural network architectures.
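To make the underlying idea concrete, the following is a minimal toy sketch (not the paper's algorithm) of the differential, critical-point trick that signature extraction builds on: near an input where one ReLU neuron switches on or off, the jump in the network's directional derivatives reveals that neuron's weight row up to a common scale. The one-hidden-layer setup and all names (`f`, `W`, `b`, `a`) are illustrative assumptions, and the critical point is solved for directly rather than located via binary search over black-box queries, as a real attack would do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy target: one hidden ReLU layer with scalar output.
# The attacker only sees the black-box function f, never W, b, or a.
d, h = 3, 4                       # input dim, hidden width (assumed)
W = rng.normal(size=(h, d))
b = rng.normal(size=h)
a = rng.normal(size=h)

def f(x):
    return a @ np.maximum(W @ x + b, 0.0)

# Step 1: locate a critical point of neuron 0 along a random line,
# i.e. the input where its pre-activation W[0] @ x + b[0] crosses zero.
# (Solved in closed form here; a black-box attack would find it by
# searching for the kink in the piecewise-linear f along the line.)
x0, u = rng.normal(size=d), rng.normal(size=d)
t_star = -(W[0] @ x0 + b[0]) / (W[0] @ u)
x_crit = x0 + t_star * u

# Step 2: the jump in the directional derivative of f across the
# critical point, measured along each coordinate e_i, equals
# +/- a[0] * W[0, i]. The common factor a[0] cancels across i,
# leaving neuron 0's weight row up to scale: its "signature".
eps, eps2 = 1e-3, 1e-8            # step along the line, coordinate step
def jump(i):
    e = np.zeros(d); e[i] = 1.0
    p_plus, p_minus = x_crit + eps * u, x_crit - eps * u
    d_plus = (f(p_plus + eps2 * e) - f(p_plus)) / eps2
    d_minus = (f(p_minus + eps2 * e) - f(p_minus)) / eps2
    return d_plus - d_minus       # exact up to roundoff: f is piecewise linear

sig = np.array([jump(i) for i in range(d)])

# The recovered signature matches the true weight row up to sign/scale.
cos = sig @ W[0] / (np.linalg.norm(sig) * np.linalg.norm(W[0]))
print(abs(cos))
```

Because `f` is piecewise linear, the finite differences are exact within each linear region, which is why tiny steps suffice; the deeper-network difficulties the abstract targets (rank deficiency, noise propagated through earlier recovered layers) arise precisely because this clean picture degrades past the first layer.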