Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

📅 2024-08-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Tree ensemble models suffer from large memory footprints, slow inference, and poor interpretability. Method: the paper proposes a functionally identical pruning technique that compresses a model with no accuracy loss, strictly preserving the original model's output for every input. The authors formally define functionally identical pruning, develop an exact optimization model, and give a fast iterative procedure based on adversarial example augmentation, combining combinatorial optimization, structural equivalence analysis of decision trees, and a greedy pruning strategy. Contribution/Results: experiments across multiple benchmark datasets demonstrate compression rates exceeding 90% while fully retaining the original model's predictive behavior, and hence every aggregated performance metric. The pruned models improve deployment efficiency and interpretability, offering a rigorous, zero-loss compression solution for tree ensembles.

📝 Abstract
Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes considering a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
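The abstract's iterate-prune-then-augment loop can be illustrated with a minimal sketch. Note the assumptions: this is not the paper's exact method (which uses an exact combinatorial optimization model and a true adversarial model); here the ensemble is a toy set of 1-D decision stumps, pruning is a greedy drop-and-refit via least squares on the current finite point set, and the "adversarial" step is plain random search for an input where the pruned model disagrees with the original. All names (`stump`, `prune_on_points`, `find_violation`) are hypothetical.

```python
import numpy as np

def stump(threshold):
    # Toy 1-D decision stump: outputs 1.0 when x > threshold, else 0.0.
    return lambda x: (x > threshold).astype(float)

def ensemble_predict(trees, weights, x):
    # Additive ensemble, as in boosting: weighted sum of tree outputs.
    return sum(w * t(x) for t, w in zip(trees, weights))

def prune_on_points(trees, weights, X, tol=1e-9):
    # Greedy pruning on a FINITE point set X: try dropping each tree and
    # re-fit the remaining weights (least squares) so that predictions on
    # X exactly match the full ensemble. A drop is kept only if the
    # residual on X is (numerically) zero.
    target = ensemble_predict(trees, weights, X)
    kept = list(range(len(trees)))
    for i in range(len(trees)):
        trial = [j for j in kept if j != i]
        if not trial:
            continue
        A = np.column_stack([trees[j](X) for j in trial])
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        if np.max(np.abs(A @ w - target)) < tol:
            kept = trial
    A = np.column_stack([trees[j](X) for j in kept])
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return kept, w

def find_violation(trees, weights, kept, w, rng, n=1000, tol=1e-6):
    # Stand-in for the paper's adversarial model: random search for an
    # input where pruned and original ensembles disagree.
    X = rng.uniform(-1.0, 2.0, n)
    full = ensemble_predict(trees, weights, X)
    pruned = ensemble_predict([trees[j] for j in kept], w, X)
    bad = np.abs(full - pruned) > tol
    return X[bad][:1] if bad.any() else None

# Deliberately redundant ensemble: repeated thresholds can be merged.
thresholds = [0.2, 0.2, 0.5, 0.8, 0.8, 0.8]
trees = [stump(t) for t in thresholds]
weights = np.ones(len(trees))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, 5)  # initial finite point set
while True:
    kept, w = prune_on_points(trees, weights, X)
    v = find_violation(trees, weights, kept, w, rng)
    if v is None:
        break          # no disagreement found: accept the pruned model
    X = np.concatenate([X, v])  # augment the point set and re-prune
```

The loop captures the "free lunch" idea: pruning only ever commits to a reduction that reproduces the original predictions, and the augmentation step repairs any reduction that held on the sample set but not everywhere, so the final pruned ensemble (here, one stump per distinct threshold, with re-fitted weights) agrees with the original on all tested inputs.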
Problem

Research questions and friction points this paper is trying to address.

Large Tree Ensembles
Interpretability
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Pruning
Complexity Reduction
Performance Preservation
Youssouf Emine
Department of Mathematics and Industrial Engineering, Polytechnique Montréal; Canada Excellence Research Chair in Data-Science for Real-time Decision-Making (CERC)
Alexandre Forel
Department of Mathematics and Industrial Engineering, Polytechnique Montréal; SCALE-AI Chair in Data-Driven Supply Chains; Centre Interuniversitaire de Recherche sur les Réseaux d’Entreprise, la Logistique et le Transport (CIRRELT)
Idriss Malek
MBZUAI
Thibaut Vidal
Professor, SCALE-AI Chair, MAGI, Polytechnique Montréal
Combinatorial Optimization · Machine Learning · Operations Research · Transportation and Logistics · Explainable AI