🤖 AI Summary
This work addresses active positive-unlabeled (PU) learning, a weakly supervised setting in which only a subset of the positive instances are labeled and the rest of the data are unlabeled. The learner can query unlabeled samples but receives label feedback only if the queried instance is truly positive and an independent random trial succeeds, i.e., under a probabilistic labeling mechanism. For this setting, the paper presents the first theoretical analysis of label complexity in active PU learning, establishing both upper and lower bounds and thereby filling a critical gap in the theoretical understanding of this paradigm. By integrating an active querying strategy with a PU learning model, the proposed approach significantly improves query efficiency, offering both theoretical guarantees and practical algorithmic guidance for real-world applications such as online advertising and anomaly detection.
📝 Abstract
Learning from positive and unlabeled data (PU learning) is a weakly supervised variant of binary classification in which the learner receives labels for only some of the positive instances, while all other examples remain unlabeled. Motivated by applications such as advertising and anomaly detection, we study an active PU learning setting where the learner can adaptively query instances from an unlabeled pool, but a queried label is revealed only when the instance is positive and an independent coin flip succeeds; otherwise the learner receives no information. In this paper, we provide the first theoretical analysis of the label complexity of active PU learning.
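The one-sided feedback model described above can be sketched in a few lines; this is a minimal simulation, not the paper's algorithm, and the name `label_frequency` for the coin-flip success probability is an assumption (the paper's notation is not given here):

```python
import random

def query_oracle(is_positive: bool, label_frequency: float, rng: random.Random) -> bool:
    """Simulate one query under the PU feedback model from the abstract.

    The true label is revealed only if the queried instance is positive
    AND an independent coin flip (success probability `label_frequency`,
    a hypothetical name) succeeds. Returns True iff a positive label is
    observed; False is uninformative: the instance may be negative, or a
    positive whose coin flip failed.
    """
    return is_positive and (rng.random() < label_frequency)

# Toy illustration: query a pool of hidden labels and count observed positives.
rng = random.Random(42)
hidden_labels = [True, False, True, True, False]  # hypothetical ground truth
observed = [query_oracle(y, label_frequency=0.5, rng=rng) for y in hidden_labels]
```

Note that a `False` response never certifies a negative label, which is exactly why bounding the number of queries (the label complexity) is nontrivial in this setting.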