Learning from M-Tuple Dominant Positive and Unlabeled Data

📅 2025-05-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF

career value

160K/year
🤖 AI Summary
In label-proportion learning (LLP), a key practical bottleneck is the difficulty of obtaining precise class proportions within bags. To address this, we propose MDPU—a novel framework that introduces the M-tuple dominant positive assumption (i.e., each bag contains at least as many positive as negative instances), enabling the construction of an unbiased, risk-consistent empirical risk estimator. We further design a risk correction mechanism to mitigate overfitting and theoretically derive a generalization error upper bound to guarantee consistency. Extensive experiments on multiple benchmark datasets demonstrate that MDPU significantly outperforms state-of-the-art LLP and PU learning methods in terms of accuracy, robustness, and real-world applicability.

Technology Category

Application Category

📝 Abstract
Label Proportion Learning (LLP) addresses the classification problem where multiple instances are grouped into bags and each bag contains information about the proportion of each class. However, in practical applications, obtaining precise supervisory information regarding the proportion of instances in a specific class is challenging. To better align with real-world application scenarios and effectively leverage the proportional constraints of instances within tuples, this paper proposes a generalized learning framework emph{MDPU}. Specifically, we first mathematically model the distribution of instances within tuples of arbitrary size, under the constraint that the number of positive instances is no less than that of negative instances. Then we derive an unbiased risk estimator that satisfies risk consistency based on the empirical risk minimization (ERM) method. To mitigate the inevitable overfitting issue during training, a risk correction method is introduced, leading to the development of a corrected risk estimator. The generalization error bounds of the unbiased risk estimator theoretically demonstrate the consistency of the proposed method. Extensive experiments on multiple datasets and comparisons with other relevant baseline methods comprehensively validate the effectiveness of the proposed learning framework.
Problem

Research questions and friction points this paper is trying to address.

Classifying instances in bags with imprecise label proportions
Modeling instance distribution under tuple constraints
Developing unbiased risk estimator for LLP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized MDPU framework for tuple-based learning
Unbiased risk estimator with empirical minimization
Risk correction method to prevent overfitting
🔎 Similar Papers
No similar papers found.
J
Jiahe Qin
Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, China
J
Junpeng Li
Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, China
Changchun Hua
Changchun Hua
Yanshan University
control and systemsnetworked systemsteleoperation systems
Y
Yana Yang
Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, China