🤖 AI Summary
This paper addresses positive-unlabeled (PU) learning, where the unlabeled data are a mixture of true negatives and unlabeled positives, and where conventional linear models are misspecified when the underlying relationships are nonlinear. It proposes a generalized additive density-ratio estimation framework that relaxes the linearity constraint of the exponential tilt model by adding smooth additive nonlinear components, preserving identifiability while balancing flexibility and interpretability. The framework supports mixture proportion estimation, classification, and uncertainty quantification. On the theoretical side, the paper establishes asymptotic properties that provide the basis for statistical inference. On the algorithmic side, the fitting procedure combines generalized additive modeling with density-ratio optimization. Experiments show performance comparable to the classical exponential tilt method in linear settings and substantial gains in accuracy and robustness in nonlinear settings, combining expressive modeling with statistical rigor.
📝 Abstract
We address learning from positive and unlabeled (PU) data, a common setting in which only some positives are labeled and the rest are mixed with negatives. Classical exponential tilting models guarantee identifiability by assuming a linear structure, but they can be badly misspecified when relationships are nonlinear. We propose a generalized additive density-ratio framework that retains identifiability while allowing smooth, feature-specific effects. The approach comes with a practical fitting algorithm and supporting theory that enables estimation and inference for the mixture proportion and other quantities of interest. In simulations and analyses of benchmark datasets, the proposed method matches the standard exponential tilting method when the linear model is correct and delivers clear gains when it is not. Overall, the framework strikes a useful balance between flexibility and interpretability for PU learning and provides principled tools for estimation, prediction, and uncertainty assessment.
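For orientation, the model structure the abstract describes can be sketched as follows. This is an illustrative formulation under assumed notation, not the paper's own: $f_1$ and $f_0$ denote the positive and negative feature densities, $h$ the unlabeled density, $\pi$ the mixture proportion, and $s_j$ smooth feature-specific functions. None of these symbols appear in the abstract, and the paper's exact parameterization and identifiability constraints may differ.

```latex
% Unlabeled data as a mixture of positives and negatives (assumed notation)
h(x) = \pi f_1(x) + (1 - \pi) f_0(x), \qquad 0 < \pi < 1

% Classical exponential tilt: the log density ratio is linear in x
\log \frac{f_1(x)}{f_0(x)} = \alpha + \beta^\top x

% Generalized additive version: smooth feature-specific effects s_j
\log \frac{f_1(x)}{f_0(x)} = \eta(x) = \alpha + \sum_{j=1}^{p} s_j(x_j)

% Implied probability that an unlabeled point x is positive (Bayes' rule)
P(\text{positive} \mid x, \text{unlabeled})
  = \frac{\pi \exp\{\eta(x)\}}{1 - \pi + \pi \exp\{\eta(x)\}}
```

Setting each $s_j(x_j) = \beta_j x_j$ recovers the linear exponential tilt model, which is consistent with the abstract's report that the two methods perform similarly when the linear model is correct.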