🤖 AI Summary
To address the underutilization of unlabeled data, particularly out-of-distribution (OOD) samples, under label scarcity, this paper proposes the Classifier-Noise-Invariant CGAN (CNI-CGAN), presented as the first method that robustly learns the true (clean) data distribution from noisy pseudo-labels produced by a positive-unlabeled (PU) classifier. Methodologically, CNI-CGAN couples classification and conditional generation in a bidirectional, jointly optimized framework: the PU classifier supplies pseudo-labels for the extra data, and the noise-robust CGAN in turn helps refine the classifier's decision boundary. The paper theoretically establishes the optimality condition of CNI-CGAN and formalizes a mutually beneficial training paradigm between classification and generation. Extensive experiments on multiple benchmark datasets show that CNI-CGAN improves PU classification accuracy and conditional generation fidelity at the same time, significantly outperforming single-task baselines as well as state-of-the-art PU learning and semi-supervised generative methods.
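Neither the summary nor the abstract says which PU estimator produces the pseudo-labels. As a minimal sketch, assume the non-negative PU (nnPU) risk estimator of Kiryo et al. (2017) with a sigmoid surrogate loss and a known class prior π. The function names (`sigmoid_loss`, `nnpu_risk`, `pseudo_labels`) are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}."""
    t = label * score
    # Numerically stable evaluation that avoids overflow in exp for large |t|
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def nnpu_risk(pos_scores: Sequence[float], unl_scores: Sequence[float],
              prior: float) -> float:
    """Non-negative PU risk estimate (assumed estimator, see lead-in).

    pos_scores: classifier outputs g(x) on labeled positive samples.
    unl_scores: outputs on unlabeled samples (which may include OOD data).
    prior: class prior pi = P(y = +1), assumed known here.
    """
    r_pos_plus = sum(sigmoid_loss(s, +1) for s in pos_scores) / len(pos_scores)
    r_pos_minus = sum(sigmoid_loss(s, -1) for s in pos_scores) / len(pos_scores)
    r_unl_minus = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Negative-class risk recovered from unlabeled data; clipping it at zero is
    # the "non-negative" correction that curbs overfitting of flexible models.
    negative_risk = r_unl_minus - prior * r_pos_minus
    return prior * r_pos_plus + max(0.0, negative_risk)


def pseudo_labels(unl_scores: Sequence[float]) -> List[int]:
    """Threshold classifier scores at 0 to obtain +1/-1 pseudo-labels for extra data."""
    return [1 if s > 0 else -1 for s in unl_scores]


# Usage with made-up scores (illustrative only)
print(nnpu_risk([2.0, 1.5], [0.5, -1.0, -2.0], prior=0.4))
print(pseudo_labels([0.5, -1.0, -2.0]))  # [1, -1, -1]
```

The pseudo-labels are what would condition the CGAN on the extra data. Because the classifier is imperfect, those labels are noisy, which is why the generator side needs to be robust to classifier noise.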
📝 Abstract
The scarcity of class-labeled data is a ubiquitous bottleneck in a wide range of machine learning problems. While abundant unlabeled data usually exist and offer a potential solution, exploiting them is extremely challenging. In this paper, we address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data simultaneously, both of which aim to make full use of agnostic unlabeled data to improve classification and generation performance. In particular, we present a novel training framework that jointly targets PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between them: 1) enhancing the performance of PU classifiers with the assistance of a novel Conditional Generative Adversarial Network (CGAN) that is robust to noisy labels, and 2) leveraging extra data with labels predicted by a PU classifier to help the generation. Our key contribution is a Classifier-Noise-Invariant Conditional GAN (CNI-CGAN) that can learn the clean data distribution from noisy labels predicted by a PU classifier. Theoretically, we prove the optimal condition of CNI-CGAN; experimentally, we conduct extensive evaluations on diverse datasets, verifying simultaneous improvements in both classification and generation.
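One standard way to make a conditional GAN invariant to label noise, used for example in the robust conditional GAN (rCGAN) of Thekumparampil et al. (2018), is to pass the generator's clean labels through the classifier's estimated confusion matrix before the discriminator sees them. Generated pairs then carry the same label noise as the real data labeled by the classifier. The sketch below shows only that label-corruption step in plain Python. It assumes the confusion matrix is estimated on a small labeled validation set. It illustrates the general idea and is not necessarily the exact CNI-CGAN construction; `estimate_confusion` and `labels_for_discriminator` are hypothetical helpers.

```python
import random
from typing import List, Sequence


def estimate_confusion(true_labels: Sequence[int], pred_labels: Sequence[int],
                       num_classes: int) -> List[List[float]]:
    """Row-normalized confusion matrix: C[i][j] ~= P(classifier predicts j | true class i)."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No validation samples for class i: fall back to "no noise" (identity row)
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def labels_for_discriminator(clean_labels: Sequence[int],
                             confusion: List[List[float]],
                             rng: random.Random) -> List[int]:
    """Corrupt the generator's clean labels with the classifier's noise model.

    Each clean label y is replaced by a label sampled from row confusion[y],
    so generated (sample, label) pairs carry the same label noise as real
    data labeled by the classifier.
    """
    num_classes = len(confusion)
    return [rng.choices(range(num_classes), weights=confusion[y], k=1)[0]
            for y in clean_labels]


# Usage with a hypothetical 2-class validation set
rng = random.Random(0)
C = estimate_confusion([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(C)  # [[0.5, 0.5], [0.0, 1.0]]
print(labels_for_discriminator([1, 1, 0], C, rng))
```

In training, generated samples paired with these corrupted labels would be scored by the discriminator alongside real extra data paired with the classifier's predicted labels. Since both sides share the same label noise, the generator can, under the assumption that the confusion matrix is invertible, still match the clean class-conditional distribution.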