Learning sparse optimal rule fit by safe screening

πŸ“… 2018-10-03
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of learning optimal sparse linear rule ensembles from an exponentially large candidate rule set, where each rule is an indicator function over a hyper-rectangle in the input space. To tackle the combinatorial explosion inherent in rule enumeration, we propose the first convex optimization framework that integrates safe screening into a tree-structured rule representation: theoretically guaranteed safe pruning efficiently eliminates irrelevant rules, drastically reducing the search space. Our method unifies hierarchical rule enumeration, ℓ₁-regularized convex optimization, and safe screening, enabling scalable yet provably accurate solutions. Experiments on multiple benchmark datasets demonstrate that our approach accelerates training by up to two orders of magnitude while maintaining predictive performance comparable to non-sparse models, thereby overcoming the scalability bottleneck that has long hindered symbolic rule learning.
πŸ“ Abstract
In this paper, we consider linear prediction models in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyperrectangle in the input space. Since the number of all possible rules generated from the training dataset becomes extremely large, it has been difficult to consider all of them when fitting a sparse model. In this paper, we propose Safe Optimal Rule Fit (SORF) as an approach to resolve this problem, which is formulated as a convex optimization problem with sparse regularization. The proposed SORF method utilizes the fact that the set of all possible rules can be represented as a tree. By extending a recently popularized convex optimization technique called safe screening, we develop a novel method for pruning the tree such that pruned nodes are guaranteed to be irrelevant to the prediction model. This approach allows us to efficiently learn a prediction model constructed from an exponentially large number of all possible rules. We demonstrate the usefulness of the proposed method by numerical experiments using several benchmark datasets.
Problem

Research questions and friction points this paper is trying to address.

Learning optimal sparse rule models efficiently
Handling the enormous number of candidate rules computationally
Extending safe screening to multiple features simultaneously
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta safe screening for multiple features
Sparse rule model optimization framework
Handles group and sparse regularizations
Hiroki Kato
Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, 466-8555, Japan; Center for Advanced Intelligence Project, RIKEN, Chuo, Tokyo, 103-0027, Japan
Hiroyuki Hanada
Nagoya University
machine learning · mathematical optimization
Ichiro Takeuchi
Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, 466-8555, Japan; Center for Materials Research by Information Integration, National Institute for Materials Science, Tsukuba, Ibaraki, 305-0047, Japan