A Gap Between Decision Trees and Neural Networks

๐Ÿ“… 2026-01-07
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work investigates a fundamental gap between axis-aligned decision trees and shallow neural networks, focusing on the tension between geometric parsimony (used as a notion of interpretability) and accurate approximation of tree decision boundaries. Analyzing infinitely wide, norm-bounded, single-hidden-layer ReLU networks through the Radon total variation (RTV) seminorm, the study separates two objectives that are often conflated: classification recovery after thresholding and calibrated score learning. It shows that the hard tree indicator and two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, all have infinite RTV in dimensions d > 1, while Gaussian convolution attains finite RTV at the cost of an explicit exponential dependence on the dimension. The authors then construct a smooth barrier score function with finite RTV that enables exact threshold-based classification, and they establish an L¹(P) calibration bound under a tube-mass condition near the boundary, proving that the calibration error decays polynomially in a sharpness parameter. Experiments on synthetic unions of rectangles validate the trade-offs among model complexity, accuracy, and threshold selection.
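For context, one common formalization of the RTV seminorm for single-hidden-layer ReLU classes (in the spirit of the R-norm literature; the normalization and the constant $c_d$ shown here are assumptions and may differ from the paper's exact definition) is

\[
\mathrm{RTV}(f) \;=\; c_d \,\bigl\| \partial_t^{2}\, \Lambda^{d-1}\, \mathscr{R} f \bigr\|_{\mathcal{M}},
\qquad
(\mathscr{R} f)(\omega, t) \;=\; \int_{\{x \,:\, \omega^\top x = t\}} f(x)\, \mathrm{d}s(x),
\]

where $\mathscr{R}$ is the Radon transform over hyperplanes, $\Lambda^{d-1}$ is a ramp filter (equal to $\partial_t^{d-1}$ for odd $d$), and $\|\cdot\|_{\mathcal{M}}$ is the total-variation norm in the sense of measures. Intuitively, RTV charges a function for second-order variation across hyperplanes in every direction, which is why sharp axis-aligned structure such as box indicators and their split-wise smoothings can be infinitely expensive in this seminorm when $d > 1$.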

๐Ÿ“ Abstract
We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($\mathrm{RTV}$) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator $1_A$ has infinite $\mathrm{RTV}$. Moreover, two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, also have infinite $\mathrm{RTV}$ in dimensions $d>1$, while Gaussian convolution yields finite $\mathrm{RTV}$ but with an explicit exponential dependence on $d$. We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to $1_A$). For classification, we construct a smooth barrier score $S_A$ with finite $\mathrm{RTV}$ whose fixed threshold $\tau=1$ exactly recovers the box. Under a mild tube-mass condition near $\partial A$, we prove an $L_1(P)$ calibration bound that decays polynomially in a sharpness parameter, along with an explicit $\mathrm{RTV}$ upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy-complexity tradeoff and how threshold selection shifts where training lands along it.
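As a concrete illustration of the classification-versus-score distinction, the sketch below builds a smooth score that equals 1 exactly on a box and decays outside it, so thresholding at $\tau = 1$ recovers the box. The exp-of-squared-distance form, the `barrier_score` name, and the `eps` sharpness knob are assumptions for illustration only; this is not the paper's $S_A$ construction, which additionally comes with finite-RTV guarantees and face-measure bounds.

```python
# Illustrative sketch only: a smooth barrier-style score whose superlevel
# set {S >= 1} equals an axis-aligned box A = prod_i [a_i, b_i]. This is
# NOT the paper's S_A; it only demonstrates exact recovery by thresholding.
import numpy as np

def barrier_score(x, a, b, eps=0.1):
    """Score S(x) = exp(-dist(x, A)^2 / eps^2).

    S(x) = 1 exactly on the box A and S(x) < 1 outside, so thresholding
    at tau = 1 recovers A; eps plays the role of a sharpness parameter
    (smaller eps means faster decay away from the box).
    """
    x, a, b = np.atleast_2d(x), np.asarray(a), np.asarray(b)
    # Per-coordinate distance to the interval [a_i, b_i] (zero inside).
    d = np.maximum.reduce([np.zeros_like(x), a - x, x - b])
    return np.exp(-np.sum(d**2, axis=-1) / eps**2)

if __name__ == "__main__":
    a, b = np.array([0.0, 0.0]), np.array([1.0, 2.0])
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1.0, 3.0, size=(100_000, 2))
    pred = barrier_score(pts, a, b) >= 1.0           # threshold tau = 1
    truth = np.all((pts >= a) & (pts <= b), axis=1)  # membership in A
    # Exact up to floating-point rounding for points within ~1e-8 of
    # the boundary, where exp(-d^2/eps^2) can round to 1.0.
    print("threshold recovery exact:", np.array_equal(pred, truth))
```

Note the design choice this makes visible: the thresholded classifier is exact for any `eps`, while the score's closeness to the indicator $1_A$ (the calibration question the paper bounds in $L_1(P)$) degrades as `eps` grows, which is the trade-off the sharpness parameter controls.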
Problem

Research questions and friction points this paper is trying to address.

decision trees
neural networks
geometric simplicity
interpretability
approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Radon total variation
shallow ReLU networks
decision trees
geometric complexity
score calibration
๐Ÿ”Ž Similar Papers
No similar papers found.