Model positive and unlabeled data with a generalized additive density ratio model

📅 2025-08-17
🤖 AI Summary
This paper addresses positive-unlabeled (PU) learning, in which the unlabeled sample is a mixture of true negatives and unlabeled positives, and in which linear models can be badly misspecified when the underlying relationships are nonlinear. It proposes a generalized additive density-ratio framework that relaxes the linearity assumption of the exponential tilting model by adding smooth, feature-specific nonlinear components while retaining identifiability, balancing flexibility and interpretability. The framework supports mixture proportion estimation, classification, and uncertainty quantification. On the theory side, the authors establish asymptotic properties that underpin statistical inference; on the algorithmic side, the fitting procedure combines generalized additive modeling with density-ratio estimation. In simulations and analyses of benchmark datasets, the method matches the standard exponential tilting method when the linear model is correct and delivers clear gains when it is not.

📝 Abstract
We address learning from positive and unlabeled (PU) data, a common setting in which only some positives are labeled and the rest are mixed with negatives. Classical exponential tilting models guarantee identifiability by assuming a linear structure, but they can be badly misspecified when relationships are nonlinear. We propose a generalized additive density-ratio framework that retains identifiability while allowing smooth, feature-specific effects. The approach comes with a practical fitting algorithm and supporting theory that enables estimation and inference for the mixture proportion and other quantities of interest. In simulations and analyses of benchmark datasets, the proposed method matches the standard exponential tilting method when the linear model is correct and delivers clear gains when it is not. Overall, the framework strikes a useful balance between flexibility and interpretability for PU learning and provides principled tools for estimation, prediction, and uncertainty assessment.
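For orientation, the two model classes contrasted in the abstract can be written in the usual density-ratio form. This is a sketch under assumed notation, not the paper's own: the symbols $p_1$, $p_0$, $h$, $\pi$, $\alpha$, $\beta$, and $f_j$ are introduced here for illustration, and the paper may orient the ratio differently or impose specific identifiability constraints on the $f_j$ (for example, centering).

```latex
% Unlabeled density as a mixture of positive (p_1) and negative (p_0) densities,
% with mixture proportion \pi (assumed notation)
h(x) = \pi\, p_1(x) + (1 - \pi)\, p_0(x), \qquad 0 < \pi < 1 .

% Exponential tilting model: the log density ratio is linear in x
\log \frac{p_1(x)}{p_0(x)} = \alpha + \beta^{\top} x .

% Generalized additive density-ratio model: one smooth effect f_j per feature
\log \frac{p_1(x)}{p_0(x)} = \alpha + \sum_{j=1}^{d} f_j(x_j) .
```

Taking $f_j(x_j) = \beta_j x_j$ recovers the exponential tilting model, which is consistent with the abstract's report that the two methods agree when the linear model is correct.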
Problem

Research questions and friction points this paper is trying to address.

Learning from positive and unlabeled data with nonlinear relationships
Ensuring identifiability while allowing flexible feature-specific effects
Providing estimation and inference for mixture proportion and other quantities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized additive density-ratio framework
Feature-specific smooth nonlinear effects
Practical fitting algorithm with theory
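As a minimal illustration of how an additive log density ratio and a mixture proportion combine into a classification rule, the sketch below applies Bayes' rule to an unlabeled point. It is not the paper's fitting algorithm: the function names, the hand-picked effect functions, and the values of `pi` and `intercept` are all hypothetical, and in practice these quantities would be estimated from data.

```python
import math
from typing import Callable, Sequence

# One smooth effect per feature (hypothetical stand-ins for fitted components).
Effect = Callable[[float], float]


def additive_log_ratio(x: Sequence[float], intercept: float,
                       effects: Sequence[Effect]) -> float:
    """Additive log density ratio: intercept + sum_j f_j(x_j)."""
    if len(x) != len(effects):
        raise ValueError("need exactly one effect function per feature")
    return intercept + sum(f(xj) for f, xj in zip(effects, x))


def positive_posterior(x: Sequence[float], pi: float, intercept: float,
                       effects: Sequence[Effect]) -> float:
    """Probability that an unlabeled point x is positive.

    Bayes' rule with ratio r(x) = p1(x) / p0(x) gives
        pi * r(x) / (pi * r(x) + 1 - pi) = sigmoid(log r(x) + logit(pi)).
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    logit = (additive_log_ratio(x, intercept, effects)
             + math.log(pi) - math.log1p(-pi))
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Linear effects correspond to the exponential tilting special case.
    linear = [lambda t: 0.8 * t, lambda t: -0.5 * t]
    # Nonlinear, feature-specific effects (illustrative only).
    additive = [lambda t: math.sin(t), lambda t: t * t - 1.0]
    x = [0.5, 1.2]
    print(positive_posterior(x, 0.3, 0.0, linear))
    print(positive_posterior(x, 0.3, 0.0, additive))
```

When the log ratio is zero, the posterior equals `pi`, so the mixture proportion acts as the prior probability that an unlabeled point is positive.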
Peijun Sang
Department of Statistics and Actuarial Science, University of Waterloo
Yifan Sun
Department of Statistics and Actuarial Science, University of Waterloo
Qinglong Tian
University of Waterloo
statistics
Donglin Zeng
Professor of Biostatistics, University of Michigan
statistics, biostatistics, precision medicine, machine learning, semiparametric models
Pengfei Li
Department of Statistics and Actuarial Science, University of Waterloo