🤖 AI Summary
To address a fundamental limitation of gradient-boosted decision trees (GBDTs), namely their inability to add or remove training instances after training, which renders them unsuitable for streaming settings, this paper proposes the first online GBDT framework supporting *in-place* incremental and decremental learning. The authors introduce a unified in-place tree-editing mechanism, gradient reweighting, node caching, and a theoretical model of the relationship between the optimizations' hyper-parameters, enabling a tunable trade-off between accuracy and computational cost. The framework additionally supports dynamic injection and removal of backdoor triggers, with removal efficacy validated empirically via backdoor-attack experiments. Experiments on public benchmarks demonstrate millisecond-scale update latency and under 1% accuracy degradation. Notably, this is the first work to achieve controllable dynamic management of backdoored GBDT models, significantly enhancing practicality and robustness in dynamic environments.
📝 Abstract
Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models across a wide range of applications. However, in the traditional setting, all data must be accessible simultaneously during training: the model does not allow adding or deleting any data instances after training. In this paper, we propose an efficient online learning framework for GBDT that supports both incremental and decremental learning. To the best of our knowledge, this is the first work to consider in-place, unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework so that it can add or delete a small fraction of data on the fly. We theoretically characterize the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy against cost in incremental and decremental learning. Backdoor-attack results show that our framework can successfully inject a backdoor into, and remove it from, a well-trained model via incremental and decremental learning, and empirical results on public datasets confirm the effectiveness and efficiency of our proposed online learning framework and optimizations.
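To make the idea of in-place incremental and decremental updates concrete, here is a minimal sketch (hypothetical; the `Leaf` class and its methods are not the paper's actual API). In second-order GBDT training (as in XGBoost-style objectives), a leaf's output is $w = -G/(H+\lambda)$, where $G$ and $H$ are the sums of the per-instance gradients and hessians routed to that leaf. If those sums are cached per node, adding or deleting an instance only requires adjusting the cached statistics of the leaves it falls into, rather than retraining the tree:

```python
# Hypothetical sketch of in-place leaf updates for incremental and
# decremental learning in a second-order GBDT. Not the paper's code:
# the class and method names are illustrative assumptions.

class Leaf:
    def __init__(self, lam=1.0):
        self.G = 0.0   # cached sum of gradients of instances in this leaf
        self.H = 0.0   # cached sum of hessians
        self.lam = lam # L2 regularization on the leaf weight

    def value(self):
        # Optimal leaf weight under a second-order objective.
        return -self.G / (self.H + self.lam)

    def add_instance(self, g, h):
        # Incremental learning: fold a new instance into the cache.
        self.G += g
        self.H += h

    def delete_instance(self, g, h):
        # Decremental learning: remove an instance's contribution in place.
        self.G -= g
        self.H -= h

leaf = Leaf()
for g, h in [(0.5, 1.0), (-0.3, 1.0), (0.8, 1.0)]:
    leaf.add_instance(g, h)
w_full = leaf.value()            # weight with all three instances
leaf.delete_instance(0.8, 1.0)   # "unlearn" the third instance
w_reduced = leaf.value()         # weight as if it had never been added
```

The same bookkeeping applies to internal nodes' split-gain statistics; the harder part, which the paper's optimizations address, is deciding when an accumulated change invalidates a split and the subtree must be restructured.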