Learning sparse optimal rule fit by safe screening

πŸ“… 2018-10-03
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of learning optimal sparse linear rule ensembles from an exponentially large candidate rule set, where each rule is an indicator function over a hyper-rectangle in the input space. To tackle the combinatorial explosion inherent in rule enumeration, we propose the first convex optimization framework that integrates safe screening into a tree-structured rule representation: theoretically guaranteed safe pruning efficiently eliminates irrelevant rules, drastically reducing the search space. Our method unifies hierarchical rule enumeration, ℓ₁-regularized convex optimization, and safe screening, enabling scalable yet provably accurate solutions. Experiments on multiple benchmark datasets demonstrate that our approach accelerates training by up to two orders of magnitude while maintaining predictive performance comparable to non-sparse models, thereby overcoming the scalability bottleneck that has long hindered symbolic rule learning.
πŸ“ Abstract
In this paper, we consider linear prediction models in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyperrectangle in the input space. Since the number of all possible rules generated from the training dataset becomes extremely large, it has been difficult to consider all of them when fitting a sparse model. In this paper, we propose Safe Optimal Rule Fit (SORF) as an approach to resolve this problem, which is formulated as a convex optimization problem with sparse regularization. The proposed SORF method utilizes the fact that the set of all possible rules can be represented as a tree. By extending a recently popularized convex optimization technique called safe screening, we develop a novel method for pruning the tree such that pruned nodes are guaranteed to be irrelevant to the prediction model. This approach allows us to efficiently learn a prediction model constructed from an exponentially large number of all possible rules. We demonstrate the usefulness of the proposed method by numerical experiments using several benchmark datasets.
Problem

Research questions and friction points this paper is trying to address.

Learning optimal sparse rule models efficiently
Handling the enormous number of candidate rules computationally
Extending safe screening to multiple features simultaneously
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta safe screening for multiple features
Sparse rule model optimization framework
Handles group and sparse regularizations
Hiroki Kato
Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, 466-8555, Japan; Center for Advanced Intelligence Project, RIKEN, Chuo, Tokyo, 103-0027, Japan
Hiroyuki Hanada
Nagoya University
machine learning · mathematical optimization
Ichiro Takeuchi
Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, 466-8555, Japan; Center for Materials Research by Information Integration, National Institute for Materials Science, Tsukuba, Ibaraki, 305-0047, Japan