Causal Explanations from the Geometric Properties of ReLU Neural Networks

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Black-box neural networks are difficult to deploy in safety-critical autonomous systems because they lack interpretability. This work exploits the geometric properties of ReLU networks, which are piecewise linear functions whose linear regions are convex polytopes, to extract causal rules directly from the original model. Because the rules come from the exact geometric structure of the network rather than from an approximation or a distilled surrogate, the explanations faithfully reflect the network's decision behavior and avoid the performance degradation and behavioral discrepancies that surrogate-based techniques introduce. The resulting "why" and "why not" explanations are grounded in the precise geometry of the ReLU network, which makes them suitable as a basis for safety assurance of learned control policies.

📝 Abstract

Neural networks have proved an effective means of learning control policies for autonomous systems, but these learned policies are difficult to understand due to the black-box nature of neural networks. This lack of interpretability makes safety assurance for such autonomous systems challenging. The fields of eXplainable Artificial Intelligence (XAI) and eXplainable Reinforcement Learning (XRL) aim to interpret the decision-making processes of neural networks and autonomous agents, respectively. In particular, work on causal explanations aims to provide "why" and "why not" explanations for a model's decisions. However, most of the work on explainability to date utilises a distilled version of the original model. While this distilled policy is interpretable, it necessarily degrades significantly in performance when compared to the original model. It is also not guaranteed to be an accurate reflection of the decision-making processes in the original model, and as such cannot be used to guarantee the original model's safety. Recent work on understanding the geometry of ReLU neural networks shows that a ReLU network corresponds to a piecewise linear function whose domain is divided into regions, each of which is an n-dimensional convex polytope. Through this lens, a neural network can be understood as dividing the input space into distinct regions, within each of which every output neuron is computed by a single linear function. We show that this geometric representation can be used to generate causal explanations for the network's behaviour similar to those of previous work, but with rules extracted directly from the geometry of neural networks with the ReLU activation function, so that the explanations are an accurate reflection of the network's behaviour.
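
The geometric view described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm; it only demonstrates the underlying fact that, for a given input, the pattern of active ReLU units fixes a convex polytope (a set of linear inequalities on the input) and a single linear function that the network applies everywhere inside it. The function names (`affine_region`, `forward`) and the toy weights are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, biases); hypothetical representation


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def forward(layers: List[Layer], x: Vector) -> Vector:
    """Ordinary forward pass: ReLU after every layer except the last."""
    h = list(x)
    for k, (W, b) in enumerate(layers):
        h = [p + bi for p, bi in zip(matvec(W, h), b)]
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def affine_region(layers: List[Layer], x: Vector):
    """Return (A, c, constraints) for the linear region containing x.

    Inside the region the network output equals A z + c. Each constraint
    (g, h) is one face of the polytope and means g . z + h >= 0.
    """
    n = len(x)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = [0.0] * n
    constraints = []
    for k, (W, b) in enumerate(layers):
        # Compose the affine map from the input to this layer's pre-activations.
        A = matmul(W, A)
        c = [wc + bi for wc, bi in zip(matvec(W, c), b)]
        if k == len(layers) - 1:
            break
        pre = [p + ci for p, ci in zip(matvec(A, x), c)]
        for row, ci, p in zip(A, c, pre):
            # Active units must stay non-negative, inactive ones non-positive.
            sign = 1.0 if p > 0 else -1.0
            constraints.append(([sign * g for g in row], sign * ci))
        # Inactive units output zero, so their rows drop out of the map.
        A = [[g if p > 0 else 0.0 for g in row] for row, p in zip(A, pre)]
        c = [ci if p > 0 else 0.0 for ci, p in zip(c, pre)]
    return A, c, constraints


# Toy network (assumed weights): 2 inputs, 2 hidden ReLU units, 1 output.
layers = [([[1.0, -1.0], [1.0, 1.0]], [0.0, -1.0]), ([[1.0, 2.0]], [0.5])]
x = [2.0, 1.0]
A, c, constraints = affine_region(layers, x)
local_output = [p + ci for p, ci in zip(matvec(A, x), c)]
```

For the toy input, both hidden units are active, so the region is described by the two inequalities in `constraints` and the network reduces to the single linear function given by `A` and `c` there. A rule of the form "if the input satisfies these inequalities, the output is this linear function" is read directly off the network's own weights, which is why no surrogate model is involved.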

Problem

Research questions and friction points this paper is trying to address.

causal explanations

ReLU neural networks

interpretability

safety assurance

geometric properties

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Explanation

ReLU Neural Networks

Geometric Interpretability