Interpretable global minima of deep ReLU neural networks on sequentially separable data

📅 2024-05-11
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
This work constructs global minimizers with zero training loss for deep ReLU networks under two data configurations: (i) clusters with small intra-class diameter and sufficient inter-class separation, and (ii) sequentially linearly separable classes. The construction is explicit and analytical. The weights and biases are written in terms of cumulative parameters, which determine truncation maps acting recursively on the input space. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, the global minimizers are described by $Q(M+2)$ parameters. Each truncation map corresponds to one step of a recursive partition of the input space, so the role of every parameter can be read off geometrically, and this makes the solution fully interpretable. The resulting networks attain exact zero training loss on sequentially linearly separable data, which shows that these global minimizers of deep networks have an explicitly analyzable structure.
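Sequential linear separability can be made concrete with a minimal Python sketch. It assumes one natural reading of the definition: the $Q$ classes can be ordered so that class $j$ is strictly separated by a hyperplane $n_j \cdot x = b_j$ from the union of all later classes, which requires $Q-1$ hyperplanes. The helper names `is_sequentially_separable` and `classify` are hypothetical. The decision list in `classify` only mirrors the recursive partition of input space; it is not the paper's ReLU network.

```python
import numpy as np

def is_sequentially_separable(classes, normals, offsets):
    """Check hypothetical hyperplanes (n_j, b_j), j = 0..Q-2, against ordered classes.

    Assumed definition: every point of class j satisfies n_j . x > b_j, and every
    point of the later classes j+1, ..., Q-1 satisfies n_j . x < b_j.
    """
    Q = len(classes)
    for j in range(Q - 1):
        n, b = np.asarray(normals[j]), offsets[j]
        # Class j must lie strictly on the positive side of hyperplane j.
        if np.any(np.asarray(classes[j]) @ n <= b):
            return False
        # All later classes must lie strictly on the negative side.
        later = np.vstack(classes[j + 1:])
        if np.any(later @ n >= b):
            return False
    return True

def classify(x, normals, offsets):
    """Recursive partition: first j with n_j . x > b_j, otherwise the last class."""
    for j, (n, b) in enumerate(zip(normals, offsets)):
        if np.asarray(x) @ np.asarray(n) > b:
            return j
    return len(normals)
```

For example, three point sets in $\mathbb{R}^2$ whose first coordinates lie above $2.5$, in $(0, 2.5)$, and below $0$ are sequentially separated by the hyperplanes $x_1 = 2.5$ and $x_1 = 0$, in that order.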

📝 Abstract
We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.
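A truncation map of the kind the abstract mentions can be illustrated with a single ReLU layer. The sketch below assumes, for illustration only, a map of the form $\tau_{W,b}(x) = W^{-1}(\mathrm{ReLU}(Wx+b) - b)$ with $W$ invertible. It fixes every point with $Wx + b \ge 0$ componentwise and clips the remaining points onto the boundary of that region. The function names are hypothetical, and the exact maps and cumulative parameters used in the paper may differ.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def truncation_map(x, W, b):
    """Hypothetical truncation x -> W^{-1}(ReLU(W x + b) - b), W invertible.

    Points with W x + b >= 0 in every coordinate are returned unchanged;
    negative coordinates of W x + b are clipped to zero, which moves the
    point onto the boundary of the region {x : W x + b >= 0}.
    """
    z = relu(W @ x + b) - b
    # Solve W y = z instead of forming W^{-1} explicitly.
    return np.linalg.solve(W, z)

def apply_recursively(x, layers):
    """Compose truncations tau_L o ... o tau_1 over a list of (W, b) pairs."""
    for W, b in layers:
        x = truncation_map(x, W, b)
    return x
```

Because $W\tau_{W,b}(x) + b = \mathrm{ReLU}(Wx+b) \ge 0$, applying the same truncation twice gives the same result as applying it once.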
Problem

Research questions and friction points this paper is trying to address.

Construct zero-loss ReLU networks for classification
Define weights via cumulative truncation parameters
Achieve interpretability with as few as $Q(M+2)$ parameters (best case)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-loss neural network classifiers constructed explicitly
Weight matrices defined via cumulative parameters
Global minimizers described with $Q(M+2)$ parameters in the best case
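The best-case parameter count is simple arithmetic. The sketch below evaluates $Q(M+2)$ and, purely for scale, the $M^2 + M$ weights and biases of one generic dense $M \times M$ layer. That comparison is an illustration, not a claim from the paper, and both function names are hypothetical.

```python
def cumulative_param_count(Q, M):
    """Best-case count Q(M+2) from the abstract: Q classes of data in R^M."""
    return Q * (M + 2)

def dense_layer_param_count(M):
    """Weights plus biases of one generic M x M affine layer (illustrative comparison)."""
    return M * M + M

# Print (Q, M, best-case count, one dense layer) for two illustrative sizes.
for Q, M in [(3, 100), (10, 784)]:
    print(Q, M, cumulative_param_count(Q, M), dense_layer_param_count(M))
```

For $Q = 10$ classes in $\mathbb{R}^{784}$ this gives $10 \cdot 786 = 7860$ parameters, against $615440$ for a single dense $784 \times 784$ layer.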