🤖 AI Summary
This study addresses the lack of theoretical foundations for sparse support vector machines (SSVMs) by developing a local duality theory. It establishes, for the first time, that SSVM is equivalent to the dual formulation of the 0/1-loss SVM and theoretically connects it to both the hinge-loss and ramp-loss SVMs. By modeling SSVM as a non-convex optimization problem with ℓ₀ sparsity constraints, the work introduces a linear representer theorem for local solutions, revealing the intrinsic mechanism by which SSVM local solutions outperform those of conventional SVMs. Theoretical analysis shows that a sequence of global solutions of the hinge-loss SVM converges to a local solution of the 0/1-loss SVM, which is also a local solution of the ramp-loss SVM. Experiments on real-world datasets validate the superior performance of these high-quality local solutions, and the theory provides a principled basis for hyperparameter selection in SSVM.
📝 Abstract
Due to the rise of cardinality minimization in optimization, sparse support vector machines (SSVMs) have attracted much attention lately and show certain empirical advantages over convex SVMs. A common way to derive an SSVM is to add a cardinality function such as the $\ell_0$-norm to the dual problem of a convex SVM. However, this process lacks theoretical justification. This paper fills the gap by developing a local duality theory for such an SSVM formulation and exploring its relationship with the hinge-loss SVM (hSVM) and the ramp-loss SVM (rSVM). In particular, we prove that the derived SSVM is exactly the dual problem of the 0/1-loss SVM, and that the linear representer theorem holds for their local solutions. The local solutions of SSVM also provide guidelines for selecting the hyperparameters of hSVM and rSVM. Under specific conditions, we show that a sequence of global solutions of hSVM converges to a local solution of the 0/1-loss SVM; moreover, a local minimizer of the 0/1-loss SVM is also a local minimizer of rSVM. This explains why local solutions induced by SSVM outperform those of hSVM and rSVM in prior empirical studies. We further conduct numerical tests on real datasets and demonstrate the potential advantages of SSVM by working with the high-quality local solutions characterized in this paper.
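The abstract contrasts three loss functions: the convex hinge loss, the ramp loss (hinge loss truncated at 1), and the discontinuous 0/1 loss. As a minimal illustration (not from the paper), the sketch below evaluates all three as functions of the margin $t = y\,f(x)$, using one common set of conventions; in particular, it takes the 0/1 loss to count misclassifications ($t \le 0$), which varies across the literature:

```python
def hinge_loss(t: float) -> float:
    """Hinge loss: penalizes margin violations (t < 1) linearly; convex."""
    return max(0.0, 1.0 - t)

def ramp_loss(t: float) -> float:
    """Ramp loss: hinge loss truncated at 1, so any outlier costs at most 1."""
    return min(1.0, max(0.0, 1.0 - t))

def zero_one_loss(t: float) -> float:
    """0/1 loss: 1 for a misclassification (t <= 0), 0 otherwise."""
    return 1.0 if t <= 0.0 else 0.0

# A confidently correct point (t = 2) incurs no loss under any of the three;
# a badly misclassified point (t = -3) is penalized linearly by the hinge
# loss but capped at 1 by the ramp and 0/1 losses, which is why the latter
# two are more robust to outliers.
for t in (2.0, 0.5, -3.0):
    print(t, hinge_loss(t), ramp_loss(t), zero_one_loss(t))
```

The cap at 1 in the ramp and 0/1 losses is exactly what makes the corresponding SVM problems non-convex, which is why the paper studies their local, rather than global, solutions.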