Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

📅 2024-08-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Tree ensemble models suffer from large memory footprints, slow inference, and poor interpretability. Method: the paper proposes a functionally identical pruning technique that compresses a model with no accuracy loss, strictly preserving the original model's output for every input. The authors formally define functionally identical pruning, develop an exact optimization model, and give a fast iterative procedure based on adversarial example augmentation, combining combinatorial optimization, structural equivalence analysis of decision trees, and a greedy pruning strategy. Contribution/Results: experiments across multiple benchmark datasets demonstrate compression rates exceeding 90% while fully retaining the original model's predictive behavior, and hence every aggregated performance metric. The pruned models improve deployment efficiency and interpretability, offering a rigorous, zero-loss compression solution for tree ensembles.

📝 Abstract
Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes considering a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
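The abstract's iterate-prune-then-augment loop can be illustrated with a minimal sketch. Note the assumptions: this is not the paper's exact method (which uses an exact combinatorial optimization model and a true adversarial model); here the ensemble is a toy set of 1-D decision stumps, pruning is a greedy drop-and-refit via least squares on the current finite point set, and the "adversarial" step is plain random search for an input where the pruned model disagrees with the original. All names (`stump`, `prune_on_points`, `find_violation`) are hypothetical.

```python
import numpy as np

def stump(threshold):
    # Toy 1-D decision stump: outputs 1.0 when x > threshold, else 0.0.
    return lambda x: (x > threshold).astype(float)

def ensemble_predict(trees, weights, x):
    # Additive ensemble, as in boosting: weighted sum of tree outputs.
    return sum(w * t(x) for t, w in zip(trees, weights))

def prune_on_points(trees, weights, X, tol=1e-9):
    # Greedy pruning on a FINITE point set X: try dropping each tree and
    # re-fit the remaining weights (least squares) so that predictions on
    # X exactly match the full ensemble. A drop is kept only if the
    # residual on X is (numerically) zero.
    target = ensemble_predict(trees, weights, X)
    kept = list(range(len(trees)))
    for i in range(len(trees)):
        trial = [j for j in kept if j != i]
        if not trial:
            continue
        A = np.column_stack([trees[j](X) for j in trial])
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        if np.max(np.abs(A @ w - target)) < tol:
            kept = trial
    A = np.column_stack([trees[j](X) for j in kept])
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return kept, w

def find_violation(trees, weights, kept, w, rng, n=1000, tol=1e-6):
    # Stand-in for the paper's adversarial model: random search for an
    # input where pruned and original ensembles disagree.
    X = rng.uniform(-1.0, 2.0, n)
    full = ensemble_predict(trees, weights, X)
    pruned = ensemble_predict([trees[j] for j in kept], w, X)
    bad = np.abs(full - pruned) > tol
    return X[bad][:1] if bad.any() else None

# Deliberately redundant ensemble: repeated thresholds can be merged.
thresholds = [0.2, 0.2, 0.5, 0.8, 0.8, 0.8]
trees = [stump(t) for t in thresholds]
weights = np.ones(len(trees))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, 5)  # initial finite point set
while True:
    kept, w = prune_on_points(trees, weights, X)
    v = find_violation(trees, weights, kept, w, rng)
    if v is None:
        break          # no disagreement found: accept the pruned model
    X = np.concatenate([X, v])  # augment the point set and re-prune
```

The loop captures the "free lunch" idea: pruning only ever commits to a reduction that reproduces the original predictions, and the augmentation step repairs any reduction that held on the sample set but not everywhere, so the final pruned ensemble (here, one stump per distinct threshold, with re-fitted weights) agrees with the original on all tested inputs.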
Problem

Research questions and friction points this paper is trying to address.

Large Tree Ensembles
Interpretability
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Pruning
Complexity Reduction
Performance Preservation
Youssouf Emine
Department of Mathematics and Industrial Engineering, Polytechnique Montréal; Canada Excellence Research Chair in Data-Science for Real-time Decision-Making (CERC)
Alexandre Forel
Department of Mathematics and Industrial Engineering, Polytechnique Montréal; SCALE-AI Chair in Data-Driven Supply Chains; Centre Interuniversitaire de Recherche sur les Réseaux d’Entreprise, la Logistique et le Transport (CIRRELT)
Idriss Malek
MBZUAI
Thibaut Vidal
Professor, SCALE-AI Chair, MAGI, Polytechnique Montréal
Combinatorial Optimization · Machine Learning · Operations Research · Transportation and Logistics · Explainable AI