🤖 AI Summary
Computing the Rashomon set of sparse decision trees (the collection of near-optimal models with comparable performance) demands prohibitive memory and runtime, which limits scalability on real-world datasets. This work proposes PRAXIS, the first algorithm enabling scalable approximate computation of decision tree Rashomon sets. By combining pruning with optimized search strategies, PRAXIS efficiently enumerates the space of near-optimal sparse models, achieving high coverage while sharply reducing resource consumption. Empirical evaluations show that PRAXIS recovers nearly the entire Rashomon set on multiple real datasets while running orders of magnitude faster and using substantially less memory than prior approaches. This lets practitioners explore diverse, high-performing models and incorporate domain knowledge in practical settings.
📝 Abstract
Standard machine learning pipelines often admit many near-optimal models. These "Rashomon sets" pose a range of challenges and opportunities for uncertainty-aware, robust decision making. They allow users to incorporate domain knowledge and preferences that would otherwise be difficult to specify directly in an objective, and they quantify diversity among valid models for a given training dataset and objective function. However, computation of Rashomon sets, even for simple, interpretable model classes such as sparse decision trees, continues to require immense memory and runtime resources. We present PRAXIS, an algorithm to approximate this Rashomon set with orders of magnitude improvement in runtime and memory usage. We validate that PRAXIS regularly recovers almost all of the full Rashomon set. PRAXIS allows researchers and practitioners to scalably model the Rashomon set for real-world datasets. Code for PRAXIS is available at https://github.com/zakk-h/PRAXIS
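To make the notion of a Rashomon set concrete, here is a minimal brute-force sketch on a toy problem. It is not PRAXIS and does not reflect the repository's API. It assumes a common formulation: the objective is misclassification rate plus a per-leaf sparsity penalty, and the Rashomon set holds every model whose objective is within an additive tolerance of the best one. The dataset, `LAMBDA`, `EPSILON`, and all function names are hypothetical, and the model class is restricted to constant predictors and depth-1 trees so that exhaustive enumeration is feasible.

```python
# Toy binary dataset (hypothetical): each row is (x0, x1, x2), labels in y.
X = [
    (0, 0, 0),
    (0, 0, 1),
    (0, 0, 0),
    (1, 0, 1),
    (1, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
    (1, 1, 1),
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

LAMBDA = 0.05   # assumed per-leaf sparsity penalty
EPSILON = 0.1   # assumed additive tolerance defining "near-optimal"


def objective(predict, n_leaves):
    """Misclassification rate plus a sparsity penalty on the leaf count."""
    errors = sum(predict(row) != label for row, label in zip(X, y))
    return errors / len(y) + LAMBDA * n_leaves


def candidate_models():
    """Yield (name, predict, n_leaves) for every tree of depth at most 1."""
    # Depth 0: a single leaf that always predicts the same class.
    for c in (0, 1):
        yield f"always {c}", (lambda row, c=c: c), 1
    # Depth 1: split on one feature, with either labeling of the two leaves.
    for j in range(len(X[0])):
        for left, right in ((0, 1), (1, 0)):
            name = f"x{j}==0 -> {left}, else {right}"
            yield name, (lambda row, j=j, l=left, r=right: l if row[j] == 0 else r), 2


def rashomon_set(epsilon):
    """Return the best objective and all models within epsilon of it."""
    scored = [(name, objective(predict, leaves))
              for name, predict, leaves in candidate_models()]
    best = min(score for _, score in scored)
    members = [(name, score) for name, score in scored if score <= best + epsilon]
    return best, members


best, members = rashomon_set(EPSILON)
print(f"best objective: {best:.3f}")
for name, score in members:
    print(f"  {score:.3f}  {name}")
```

On this toy data the splits on `x0` and `x1` each misclassify one of eight rows, so both land in the set at the default tolerance, and widening `epsilon` admits further models. Exhaustive enumeration of this kind grows rapidly with tree depth and feature count, which is the memory and runtime cost the abstract refers to.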