🤖 AI Summary
This paper studies conditional classification under Gaussian distributions with halfspace selection rules: given a data subset selected by a homogeneous halfspace, learn a sparse linear classifier whose error on this subset approximates the optimal error. We present the first PAC learning algorithm for this setting, achieving a provable error bound of $O^*(\sqrt{\mathrm{opt}})$. We rigorously establish that the computational hardness of this problem matches that of agnostic learning—under standard cryptographic assumptions—thereby characterizing both the learnability boundary and the computational lower bound for conditional classification. Our key contributions are: (1) the first efficient PAC algorithm tailored to halfspace selection mechanisms; (2) an optimality characterization showing the $\sqrt{\mathrm{opt}}$-approximation error is tight; and (3) the first proof of computational equivalence between conditional classification and classical agnostic learning.
📝 Abstract
We study "selective" or "conditional" classification problems in an agnostic setting. Classification tasks commonly focus on modeling the relationship between features and categories that captures the vast majority of the data. In contrast to common machine learning frameworks, conditional classification aims to model such relationships only on a subset of the data defined by some selection rule. Most prior work on conditional classification either solves the problem in a realizable setting or does not guarantee that the error is bounded relative to an optimal solution. In this work, we consider selective/conditional classification by sparse linear classifiers on subsets defined by halfspaces, and give both positive and negative results for Gaussian feature distributions. On the positive side, we present the first PAC-learning algorithm for homogeneous halfspace selectors with error guarantee $O^*(\sqrt{\mathrm{opt}})$, where $\mathrm{opt}$ is the smallest conditional classification error over the given class of classifiers and homogeneous halfspaces. On the negative side, we show that, under cryptographic assumptions, approximating the conditional classification loss within a small additive error is computationally hard even under the Gaussian distribution. We prove that approximating conditional classification is at least as hard as approximating agnostic classification, in both additive and multiplicative form.
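For concreteness, the quantity $\mathrm{opt}$ referenced above can be written out as follows. This is a sketch in our own notation (the paper's exact definitions may differ): $\mathcal{H}$ denotes the class of sparse linear classifiers, $\mathcal{G}$ the class of homogeneous halfspace selectors, and $\mathcal{D}$ the Gaussian feature distribution assumed in the abstract.

```latex
% Conditional classification error of classifier h on the subset
% selected by the homogeneous halfspace g(x) = sign(<w_g, x>):
\[
  \mathrm{err}(h \mid g)
    \;=\;
  \Pr_{(x,y)\sim\mathcal{D}}
    \bigl[\, h(x) \neq y \;\big|\; \langle w_g, x \rangle \ge 0 \,\bigr],
\]
% and opt is the best achievable error over both the classifier
% class H and the selector class G:
\[
  \mathrm{opt}
    \;=\;
  \min_{h \in \mathcal{H},\; g \in \mathcal{G}} \mathrm{err}(h \mid g).
\]
```

The positive result then says the learner outputs a pair whose conditional error is at most $O^*(\sqrt{\mathrm{opt}})$, rather than $\mathrm{opt}$ itself; the negative results show that closing this gap would also resolve agnostic learning.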