🤖 AI Summary
This work addresses the interpretability of ReLU feedforward neural networks (ReLU activations in hidden layers, truncated identity activations in the output layer) by proposing the first systematic, computationally tractable algorithm for exact conversion of such networks into explicit region-based piecewise-linear (PWL) representations. Methodologically, it combines structural decomposition of PWL functions, symbolic region enumeration, and network-topology-aware optimization to improve conversion efficiency. Key contributions include: (1) the first strictly computable transformation of ReLU networks into explicit region-wise linear forms; (2) a theoretical characterization of the exponential growth in the number of linear regions with network depth and width, validated empirically to quantify practical complexity bounds; and (3) empirical evidence that small- to medium-scale networks generally satisfy the lattice- and logic-representability conditions, thereby enabling subsequent formal explanation and verification.
📝 Abstract
A possible path to the interpretability of neural networks is to (approximately) represent them in the regional format of piecewise-linear functions, where each region of the input space is associated with a linear function computing the network's outputs. We present an algorithm that translates feedforward neural networks with ReLU activation functions in hidden layers and truncated identity activation functions in the output layer into such regional representations. We also empirically investigate the complexity of the regional representations produced by our method for neural networks of varying sizes. Lattice and logical representations of neural networks follow straightforwardly from regional representations provided these satisfy a specific property, so we also empirically investigate to what extent the translations produced by our algorithm satisfy that property.
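To make the regional format concrete, here is a minimal sketch (not the paper's algorithm) for a hypothetical one-hidden-layer ReLU network with scalar input and output. Each activation pattern of the hidden units fixes an affine map, and the set of inputs realizing that pattern is one linear region; the sketch recovers the realized regions by sampling the input line. All weights and helper names below are illustrative assumptions.

```python
# Toy one-hidden-layer ReLU net (hypothetical weights, not from the paper):
# f(x) = sum_i w2[i] * relu(W1[i] * x + b1[i]) + b2
W1 = [1.0, -1.0]   # hidden-layer weights
b1 = [0.0, 1.0]    # hidden-layer biases
w2 = [1.0, 2.0]    # output weights
b2 = 0.5           # output bias

def pattern_at(x):
    """Activation pattern: which hidden units have positive pre-activation."""
    return tuple(int(W1[i] * x + b1[i] > 0) for i in range(2))

def region_affine(pattern):
    """The (slope, intercept) affine map the net computes on the region
    where exactly the units marked 1 in `pattern` are active."""
    slope = sum(w2[i] * W1[i] for i in range(2) if pattern[i])
    intercept = b2 + sum(w2[i] * b1[i] for i in range(2) if pattern[i])
    return slope, intercept

# Enumerate realized activation patterns by sampling the input line;
# each realized pattern names one linear region of the PWL function.
regions = {}
for k in range(-50, 51):
    x = k / 10.0
    regions.setdefault(pattern_at(x), region_affine(pattern_at(x)))

for pat in sorted(regions):
    slope, intercept = regions[pat]
    print(pat, "->", slope, "* x +", intercept)
```

For this toy net the hinges sit at x = 0 and x = 1, giving three linear regions. Sampling is only a stand-in for the exact, exhaustive region enumeration the paper targets: an exact method must decide the feasibility of each candidate activation pattern (e.g. via linear programming) rather than hope a sample lands in every region.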