🤖 AI Summary
This work addresses the challenge that conventional feature selection methods may inadvertently discard critical sensors when models are deployed under covariate shift relative to their development environment, thereby degrading system performance. To mitigate this risk, the paper introduces safe-DRFS, the first safety-aware feature selection framework grounded in distributionally robust learning. By integrating distributionally robust optimization, explicit modeling of covariate shift, and a safe screening mechanism, safe-DRFS provides finite-sample theoretical guarantees that no potentially optimal feature is excluded (i.e., zero false negatives). The method identifies a "safe" feature set that encompasses all possible optimal subsets, substantially enhancing model robustness and reliability across diverse deployment environments, particularly in multi-user and sparsely sensed industrial settings.
📝 Abstract
In practical machine learning, the environments encountered during the model development and deployment phases often differ, especially when a model is used by many users in diverse settings. Learning models that maintain reliable performance across plausible deployment environments is known as distributionally robust (DR) learning. In this work, we study the problem of distributionally robust feature selection (DRFS), with a particular focus on sparse sensing applications motivated by industrial needs. In practical multi-sensor systems, a shared subset of sensors is typically selected prior to deployment based on performance evaluations using many available sensors. At deployment, individual users may further adapt or fine-tune models to their specific environments. When deployment environments differ from those anticipated during development, this strategy can result in systems lacking sensors required for optimal performance. To address this issue, we propose safe-DRFS, a novel approach that extends safe screening from conventional sparse modeling settings to a DR setting under covariate shift. Our method identifies a feature subset that encompasses all subsets that may become optimal across a specified range of input distribution shifts, with finite-sample theoretical guarantees of no false feature elimination.
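To make the safe screening idea concrete, the sketch below implements the classical SAFE rule for the Lasso (El Ghaoui et al.), which certifies that certain features have zero coefficients at a given regularization level and can be discarded without changing the solution. This is only an illustration of the underlying mechanism, not the paper's method: safe-DRFS extends this style of certificate to a distributionally robust setting, keeping every feature that could be active under some covariate shift in the specified uncertainty set.

```python
import numpy as np

def safe_screen_lasso(X, y, lam):
    """Classical SAFE screening rule for the Lasso (illustrative sketch).

    Returns a boolean mask over features: True means the feature is kept
    (it may be active at regularization level lam); False means it is
    provably inactive and can be safely removed. The rule never discards
    a feature that is active in the optimal solution (no false negatives),
    which is the same safety property safe-DRFS targets, but here only
    for a single fixed input distribution.
    """
    corr = np.abs(X.T @ y)                # |x_j^T y| for each feature j
    lam_max = corr.max()                  # smallest lam giving the all-zero solution
    col_norms = np.linalg.norm(X, axis=0)
    y_norm = np.linalg.norm(y)
    # Discard feature j if |x_j^T y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max
    threshold = lam - col_norms * y_norm * (lam_max - lam) / lam_max
    return corr >= threshold

# Toy example: feature 0 is strongly aligned with y, feature 1 is weak.
X = np.array([[1.0, 0.1],
              [0.0, 0.1]])
y = np.array([1.0, 0.0])
print(safe_screen_lasso(X, y, lam=0.9))   # strong feature kept, weak one screened out
```

At large lam (close to lam_max) the rule aggressively eliminates weakly correlated features; as lam shrinks, the threshold loosens and more features are retained, mirroring how a safe feature set grows to cover all subsets that might become optimal.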