On the Statistical Optimality of Optimal Decision Trees

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of rigorous statistical guarantees for empirical risk minimization (ERM) decision trees by introducing the piecewise sparse heterogeneous anisotropic Besov (PSHAB) function class, the first unified framework to characterize sparsity, anisotropic smoothness, and spatial heterogeneity together. Building on empirically localized Rademacher complexities, the authors develop a uniform concentration framework that accommodates both sub-Gaussian and heavy-tailed noise. Within this framework, they establish sharp oracle inequalities for ERM decision trees in high-dimensional regression and classification, and prove that these estimators achieve minimax-optimal convergence rates over the PSHAB class. The analysis makes explicit the trade-off between interpretability and statistical accuracy inherent in decision tree methods.
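The summary's central technical tool is the empirical Rademacher complexity of the tree class. As an illustration only (not the paper's construction), the following sketch estimates it by Monte Carlo: draw random signs and fit a tree to them, using scikit-learn's greedy CART as a stand-in for the supremum over all trees with a given leaf budget, so the estimate is a heuristic lower bound.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def empirical_rademacher_trees(X, n_leaves=8, n_rounds=50, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    L-leaf regression trees on a fixed design X.

    Each round draws Rademacher signs sigma and fits a tree to them;
    the achieved correlation (1/n) * sum_i sigma_i * f_hat(x_i) lower
    bounds the supremum over L-leaf trees (CART is greedy, not exact).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    corrs = []
    for _ in range(n_rounds):
        sigma = rng.choice([-1.0, 1.0], size=n)
        tree = DecisionTreeRegressor(max_leaf_nodes=n_leaves).fit(X, sigma)
        corrs.append(np.mean(sigma * tree.predict(X)))
    return float(np.mean(corrs))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
print(empirical_rademacher_trees(X))  # shrinks as n grows, grows with n_leaves
```

Because each leaf value is the mean of the signs it contains, the per-round correlation is non-negative and at most one; localization, as used in the paper, would further restrict the supremum to trees in a small risk ball.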

📝 Abstract
While globally optimal empirical risk minimization (ERM) decision trees have become computationally feasible and empirically successful, rigorous theoretical guarantees for their statistical performance remain limited. In this work, we develop a comprehensive statistical theory for ERM trees under random design in both high-dimensional regression and classification. We first establish sharp oracle inequalities that bound the excess risk of the ERM estimator relative to the best possible approximation achievable by any tree with at most $L$ leaves, thereby characterizing the interpretability-accuracy trade-off. We derive these results using a novel uniform concentration framework based on empirically localized Rademacher complexity. Furthermore, we derive minimax optimal rates over a novel function class: the piecewise sparse heterogeneous anisotropic Besov (PSHAB) space. This space explicitly captures three key structural features encountered in practice: sparsity, anisotropic smoothness, and spatial heterogeneity. While our main results are established under sub-Gaussianity, we also provide robust guarantees that hold under heavy-tailed noise settings. Together, these findings provide a principled foundation for the optimality of ERM trees and introduce empirical process tools broadly applicable to other highly adaptive, data-driven procedures.
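The oracle inequality in the abstract compares the ERM tree against the best approximation by any tree with at most $L$ leaves, so excess risk decomposes into approximation error (falling in $L$) and estimation error (rising in $L$). A minimal simulation of that trade-off, using scikit-learn's greedy CART with a `max_leaf_nodes` budget as a stand-in for the globally optimal ERM tree, and a hypothetical piecewise-smooth, sparse target of our own choosing:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def f_star(X):
    # Hypothetical target: piecewise smooth, depends on only 2 coordinates
    return np.where(X[:, 0] > 0, np.sin(3 * X[:, 1]), X[:, 1] ** 2)

n, d = 2000, 10                       # 10 features, only 2 are relevant
X = rng.uniform(-1, 1, size=(n, d))
y = f_star(X) + 0.1 * rng.normal(size=n)
X_test = rng.uniform(-1, 1, size=(5000, d))
y_test_clean = f_star(X_test)         # noiseless targets => excess risk

for L in (2, 8, 32, 128):
    tree = DecisionTreeRegressor(max_leaf_nodes=L).fit(X, y)
    risk = np.mean((tree.predict(X_test) - y_test_clean) ** 2)
    print(f"L = {L:4d} leaves  ->  excess risk {risk:.4f}")
```

Small $L$ gives an interpretable but coarse partition (high approximation error); large $L$ drives the risk down until estimation error from the noise takes over, which is the interpretability-accuracy trade-off the oracle inequalities quantify.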
Problem

Research questions and friction points this paper is trying to address.

optimal decision trees
statistical optimality
empirical risk minimization
high-dimensional regression
classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

optimal decision trees
empirical risk minimization
Rademacher complexity
minimax optimality
Besov space
Zineng Xu
Department of Statistics and Data Science, National University of Singapore

Subhroshekhar Ghosh
Department of Mathematics, National University of Singapore

Yan Shuo Tan
Assistant Professor, National University of Singapore
decision trees · ensembles · interpretable machine learning · causality