NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the opacity of deep neural network inference, which hinders detection of adversarial examples. The authors propose NeuroTrace, a framework built around inference provenance: it instruments the model's forward pass to construct an Inference Provenance Graph (IPG), a heterogeneous graph that captures activation behavior and parameter-induced dataflow across layers. Detectors based on heterogeneous graph neural networks then classify inputs as adversarial or benign from their IPGs. Key contributions include a structured representation of cross-layer information flow, a reproducible extraction engine, and an open dataset and benchmark covering multiple adversarial attack families across vision and malware domains. Experiments under intra-attack, multi-attack, and cross-threat transfer settings report consistently high detection performance and improvements over prior graph-based baselines, and the paper also analyzes the associated runtime and storage trade-offs.

📝 Abstract

Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such as adversarial examples. Existing detection methods predominantly rely on layer-local signals (e.g., activations or attribution scores), leaving cross-layer information flow and execution structure under-explored. We introduce NeuroTrace, a framework and open dataset for analyzing inference provenance through Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that capture both activation behavior and parameter-induced dataflow during a model's forward pass, providing a structured representation of how information propagates through the network. NeuroTrace includes (i) a reproducible extraction engine that instruments model execution, (ii) a standardized graph representation compatible with heterogeneous GNNs, and (iii) a benchmark suite spanning multiple adversarial attack families across vision and malware domains. Using this framework, we evaluate IPG-based detectors for adversarial example detection under intra-attack, multi-attack, and cross-threat transfer settings. Our results show that inference provenance provides a strong and transferable signal for distinguishing adversarial and benign inputs, achieving consistently high detection performance and improving over prior graph-based baselines. We further analyze the conditions under which provenance-based detection generalizes across attack types, as well as the associated runtime and storage trade-offs. By releasing the dataset, extraction pipeline, and evaluation protocol, NeuroTrace enables systematic study of inference-time behavior and establishes inference provenance as a practical foundation for building more transparent and auditable machine learning systems.
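
To make the idea of an inference provenance graph concrete, the following is a minimal sketch, not the NeuroTrace extraction engine. It assumes a tiny bias-free ReLU network stored as nested lists, treats each unit as a node annotated with its activation, and records an edge for every weight-times-activation contribution whose magnitude exceeds a threshold. The function name `build_ipg`, the node and edge schema, and the `threshold` parameter are all hypothetical choices made for illustration; the abstract does not specify how the actual IPGs are encoded.

```python
from collections import defaultdict


def relu(x):
    return x if x > 0.0 else 0.0


def build_ipg(weights, x, threshold=0.0):
    """Forward pass of a bias-free ReLU MLP that records a toy provenance graph.

    weights[l][j][i] is the weight from unit i in layer l to unit j in layer l+1.
    Returns (nodes, edges):
      nodes: {(layer, index): activation}
      edges: {(src_layer, dst_layer): [(src_index, dst_index, contribution)]}
    The schema is an illustrative assumption, not the paper's format.
    """
    nodes = {}
    edges = defaultdict(list)
    acts = list(x)
    for i, a in enumerate(acts):
        nodes[(0, i)] = a  # input units are layer 0

    for l, W in enumerate(weights):
        nxt = []
        for j, row in enumerate(W):
            # Parameter-induced dataflow: how much each source unit contributes.
            contribs = [w * a for w, a in zip(row, acts)]
            for i, c in enumerate(contribs):
                if abs(c) > threshold:
                    edges[(l, l + 1)].append((i, j, c))
            pre = sum(contribs)
            # ReLU on hidden layers; the final layer is left linear.
            out = relu(pre) if l < len(weights) - 1 else pre
            nxt.append(out)
            nodes[(l + 1, j)] = out
        acts = nxt
    return nodes, dict(edges)


# Hypothetical 2-2-1 network and input, chosen only to keep the arithmetic simple.
weights = [
    [[1.0, -1.0], [0.5, 0.5]],  # layer 0 -> layer 1
    [[1.0, 2.0]],               # layer 1 -> layer 2
]
nodes, edges = build_ipg(weights, [1.0, 2.0])
```

Keying edges by the (source layer, destination layer) pair is one simple way to keep edge types separate, which is the kind of typed structure a heterogeneous GNN consumes. In this toy run the first hidden unit is zeroed by the ReLU, so it contributes no outgoing edge; differences of that kind in which paths carry signal are what a provenance-based detector would have to learn from.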

Problem

Research questions and friction points this paper is trying to address.

adversarial examples

inference provenance

deep neural networks

detection

transparency

Innovation

Methods, ideas, or system contributions that make the work stand out.

inference provenance

adversarial example detection

Inference Provenance Graphs (IPGs)