Model positive and unlabeled data with a generalized additive density ratio model

📅 2025-08-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses positive-unlabeled (PU) learning, in which the unlabeled data are a mixture of true negatives and unlabeled positives, and in which conventional linear models are misspecified when relationships are nonlinear. It proposes a generalized additive density-ratio framework that relaxes the linear constraint of the exponential tilt model by adding smooth, feature-specific nonlinear components while preserving identifiability, balancing flexibility and interpretability. The framework supports mixture proportion estimation, classification, and uncertainty quantification. On the theory side, the paper establishes asymptotic properties that underpin statistical inference. On the algorithmic side, the fitting procedure combines generalized additive modeling with density-ratio estimation. In experiments, the method performs comparably to classical methods in linear settings and delivers substantial gains in accuracy and robustness in nonlinear ones.

📝 Abstract

We address learning from positive and unlabeled (PU) data, a common setting in which only some positives are labeled and the rest are mixed with negatives. Classical exponential tilting models guarantee identifiability by assuming a linear structure, but they can be badly misspecified when relationships are nonlinear. We propose a generalized additive density-ratio framework that retains identifiability while allowing smooth, feature-specific effects. The approach comes with a practical fitting algorithm and supporting theory that enables estimation and inference for the mixture proportion and other quantities of interest. In simulations and analyses of benchmark datasets, the proposed method matches the standard exponential tilting method when the linear model is correct and delivers clear gains when it is not. Overall, the framework strikes a useful balance between flexibility and interpretability for PU learning and provides principled tools for estimation, prediction, and uncertainty assessment.
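To make the contrast between a linear exponential tilt and an additive density ratio concrete, here is a minimal sketch on a toy problem. It is not the paper's fitting algorithm: the Gaussian densities, the function names `log_density_ratio` and `posterior_positive`, and the mixture proportion `pi` are all assumptions chosen for illustration. Negatives are taken to be standard normal in both features, while positives are N(1, 1) in the first feature and N(0, 4) in the second. The resulting log density ratio is linear in the first feature and quadratic in the second, so a purely linear tilt would be misspecified even though the ratio is still additive across features.

```python
import numpy as np


def log_density_ratio(x):
    """Toy additive log density ratio log g1(x) / g0(x).

    Assumed setup (illustrative only, not from the paper):
      negatives g0: x1 ~ N(0, 1), x2 ~ N(0, 1), independent
      positives g1: x1 ~ N(1, 1), x2 ~ N(0, 4), independent
    """
    x = np.asarray(x, dtype=float)
    # Feature 1: N(1, 1) vs N(0, 1) gives a linear term, x1 - 1/2.
    f1 = x[..., 0] - 0.5
    # Feature 2: N(0, 4) vs N(0, 1) gives a quadratic term, 3/8 * x2^2 - log 2.
    f2 = 0.375 * x[..., 1] ** 2 - np.log(2.0)
    return f1 + f2


def posterior_positive(x, pi):
    """P(positive | x) inside the unlabeled mixture pi * g1 + (1 - pi) * g0."""
    r = np.exp(log_density_ratio(x))
    return pi * r / (pi * r + 1.0 - pi)


if __name__ == "__main__":
    pi = 0.3  # hypothetical mixture proportion of positives among the unlabeled
    points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
    for p, prob in zip(points, posterior_positive(points, pi)):
        print(p, round(float(prob), 4))
```

In this sketch the density ratio and `pi` are known by construction; in the PU setting described above both must be estimated from the labeled positives and the unlabeled sample, which is where identifiability of the model matters.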

Problem

Research questions and friction points this paper is trying to address.

Learning from positive and unlabeled data with nonlinear relationships

Ensuring identifiability while allowing flexible feature-specific effects

Providing estimation and inference for mixture proportion and other quantities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized additive density-ratio framework

Feature-specific smooth nonlinear effects

Practical fitting algorithm with theory
