🤖 AI Summary
Problem: Group Ordered Weighted L1 (Group OWL) regularization in high-dimensional sparse multi-task learning incurs high computational cost and memory consumption when the number of features is large. Method: This paper introduces, for the first time, safe feature screening rules tailored to the structured, non-separable Group OWL penalty. The rules identify and eliminate inactive features, i.e., features whose coefficients are zero across all tasks. They are derived via duality gap analysis and convex optimization theory, are proven safe, and integrate with existing proximal gradient solvers in both batch and stochastic settings. Contribution/Results: Experiments demonstrate that the proposed method achieves a significant computational speedup and memory savings without any loss of accuracy, providing the first provably safe screening tool for large-scale Group OWL optimization.
📝 Abstract
Group Ordered Weighted $L_{1}$-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are the standard approach to solving Group OWL models. However, these models usually incur high computational cost and memory usage when the number of features is large. To address this challenge, we propose the first safe screening rule for Group OWL models. By effectively handling the structured non-separable penalty, the rule quickly identifies inactive features, i.e., those that have zero coefficients across all tasks. Removing the inactive features during training can therefore yield substantial computational gains and memory savings. More importantly, the proposed screening rule can be directly integrated with existing solvers in both the batch and stochastic settings. Theoretically, we prove that our screening rule is safe and that it can be safely applied within existing iterative optimization algorithms. Our experimental results demonstrate that the screening rule effectively identifies inactive features and leads to a significant computational speedup without any loss of accuracy.
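To make the notion of an inactive feature concrete, the following is a minimal NumPy sketch, not the paper's screening rule. It assumes a squared loss, a coefficient matrix `B` with one row per feature and one column per task, and non-increasing weights `lam`. The function names and the boolean mask `inactive` are hypothetical; in practice the mask would come from a screening rule, which is not reproduced here. The sketch only illustrates why dropping features whose rows of `B` are entirely zero leaves the objective unchanged while shrinking the problem.

```python
import numpy as np


def group_owl_penalty(B, lam):
    """Group OWL penalty: sum_i lam[i] * (i-th largest row L2 norm of B).

    B   : (d, T) coefficients, one row per feature, one column per task.
    lam : (d,) non-negative, non-increasing weights.
    """
    row_norms = np.linalg.norm(B, axis=1)
    sorted_norms = np.sort(row_norms)[::-1]  # descending order
    return float(np.dot(lam, sorted_norms))


def objective(X, Y, B, lam):
    """Squared loss (an assumption of this sketch) plus the Group OWL penalty."""
    residual = Y - X @ B
    return 0.5 * float(np.sum(residual ** 2)) + group_owl_penalty(B, lam)


def reduce_problem(X, B, lam, inactive):
    """Drop features flagged as inactive (all-zero rows of B).

    Zero rows sort last and so pair with the smallest weights; the kept rows
    therefore use the first n_keep (largest) weights.
    """
    keep = ~inactive
    n_keep = int(keep.sum())
    return X[:, keep], B[keep], lam[:n_keep]


# Illustrative data: 20 samples, 6 features, 3 tasks.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 6))
Y = rng.standard_normal((20, 3))
B = rng.standard_normal((6, 3))
lam = np.linspace(1.0, 0.1, 6)

# Hypothetical screening output: features 1 and 4 are inactive.
inactive = np.array([False, True, False, False, True, False])
B[inactive] = 0.0

X_red, B_red, lam_red = reduce_problem(X, B, lam, inactive)
full_obj = objective(X, Y, B, lam)
reduced_obj = objective(X_red, Y, B_red, lam_red)
```

Here `full_obj` and `reduced_obj` agree up to floating-point error, while the reduced design matrix has four columns instead of six. That is the source of the computational and memory savings the abstract describes, provided the mask is guaranteed to flag only truly inactive features.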