Learning from M-Tuple Dominant Positive and Unlabeled Data

📅 2025-05-25

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In learning from label proportions (LLP), a key practical bottleneck is that precise class proportions within bags are hard to obtain. To address this, MDPU introduces the M-tuple dominant positive assumption (each tuple, or bag, contains at least as many positive as negative instances), which enables the construction of an unbiased, risk-consistent empirical risk estimator. A risk correction mechanism is added to mitigate overfitting, and a generalization error upper bound is derived to guarantee consistency. Experiments on multiple benchmark datasets show that MDPU outperforms state-of-the-art LLP and positive-unlabeled (PU) learning methods in accuracy, robustness, and real-world applicability.
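To make the M-tuple dominant positive assumption concrete, the following sketch draws synthetic training tuples from a fully labeled dataset by rejection sampling and keeps only tuples whose positives are not outnumbered. The helper name `sample_dominant_tuples`, the 0/1 label encoding, and uniform sampling without replacement are assumptions for illustration; the summary does not say how the benchmark tuples were built.

```python
import random
from typing import List, Sequence, Tuple


def sample_dominant_tuples(
    labels: Sequence[int], m: int, n_tuples: int, seed: int = 0
) -> List[Tuple[int, ...]]:
    """Hypothetical helper: draw index tuples of size m with #positives >= #negatives.

    labels uses 1 for positive and 0 for negative (an assumed encoding).
    Each candidate tuple is m distinct indices drawn uniformly at random;
    candidates that violate the dominant-positive constraint are rejected.
    """
    if not 1 <= m <= len(labels):
        raise ValueError("tuple size must be between 1 and the dataset size")
    if sum(labels) < (m + 1) // 2:
        # ceil(m / 2) positives is the minimum a dominant tuple can hold
        raise ValueError("too few positives to form any dominant tuple")
    rng = random.Random(seed)
    tuples: List[Tuple[int, ...]] = []
    while len(tuples) < n_tuples:
        idx = tuple(rng.sample(range(len(labels)), m))
        n_pos = sum(labels[i] for i in idx)
        if n_pos >= m - n_pos:  # M-tuple dominant positive constraint
            tuples.append(idx)
    return tuples


# Toy usage: a learner would see only these index tuples, not the labels.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
tuples = sample_dominant_tuples(labels, m=3, n_tuples=5)
```

Rejection sampling keeps the sketch short; it becomes slow when positives are rare, because most candidate tuples are then rejected.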

📝 Abstract

Label Proportion Learning (LLP) addresses the classification problem where multiple instances are grouped into bags and each bag contains information about the proportion of each class. However, in practical applications, obtaining precise supervisory information regarding the proportion of instances in a specific class is challenging. To better align with real-world application scenarios and effectively leverage the proportional constraints of instances within tuples, this paper proposes a generalized learning framework MDPU. Specifically, we first mathematically model the distribution of instances within tuples of arbitrary size, under the constraint that the number of positive instances is no less than that of negative instances. Then we derive an unbiased risk estimator that satisfies risk consistency based on the empirical risk minimization (ERM) method. To mitigate the inevitable overfitting issue during training, a risk correction method is introduced, leading to the development of a corrected risk estimator. The generalization error bounds of the unbiased risk estimator theoretically demonstrate the consistency of the proposed method. Extensive experiments on multiple datasets and comparisons with other relevant baseline methods comprehensively validate the effectiveness of the proposed learning framework.
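One simple way to model the distribution of instances within a tuple of size m is sketched below. It is an assumption, not the paper's derivation: each instance is taken to be independently positive with class prior π (`prior`), and a tuple is observed only when its positives are not outnumbered. The number of positives k then follows a Binomial(m, π) truncated to k ≥ ceil(m/2). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import List, Union

Number = Union[float, Fraction]


def dominant_tuple_pmf(m: int, prior: Number) -> List[Number]:
    """P(k positives | k >= m - k) for k = 0..m.

    Assumes (hypothetically) that each of the m instances is independently
    positive with probability `prior`, and that a tuple is observed only if
    its positives are not outnumbered: a truncated Binomial(m, prior).
    """
    if m < 1 or not 0 < prior <= 1:
        raise ValueError("need m >= 1 and 0 < prior <= 1")
    k_min = (m + 1) // 2  # ceil(m / 2), the fewest positives allowed
    weights = [
        comb(m, k) * prior**k * (1 - prior) ** (m - k) if k >= k_min else 0
        for k in range(m + 1)
    ]
    total = sum(weights)  # probability that a random tuple is dominant
    return [w / total for w in weights]


def expected_positive_fraction(m: int, prior: Number) -> Number:
    """Expected share of positives inside a dominant tuple under the same model."""
    pmf = dominant_tuple_pmf(m, prior)
    return sum(k * p for k, p in enumerate(pmf)) / m


pmf = dominant_tuple_pmf(3, Fraction(1, 2))
share = expected_positive_fraction(3, Fraction(1, 2))
```

With π = 1/2 and m = 3, the pmf is [0, 0, 3/4, 1/4] and the expected positive share is 3/4, so conditioning on dominance raises the expected share of positives above the prior. Passing `Fraction` values keeps the arithmetic exact; plain floats also work.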

Problem

Research questions and friction points this paper is trying to address.

Classifying instances in bags with imprecise label proportions

Modeling instance distribution under tuple constraints

Developing unbiased risk estimator for LLP

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized MDPU framework for tuple-based learning

Unbiased risk estimator via empirical risk minimization

Risk correction method to prevent overfitting
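The summary does not spell out MDPU's corrected estimator. As a point of reference only, the sketch below shows the widely used non-negative correction from nnPU learning (Kiryo et al., 2017) applied to the standard unbiased PU risk with a sigmoid loss: the estimated negative-class term, which can turn negative when a flexible model overfits, is clamped at zero. MDPU's own estimator is derived from its tuple model and may differ; the function names, the known class prior `prior`, and the choice of sigmoid loss are assumptions.

```python
import math
from typing import Sequence, Tuple


def sigmoid_loss(score: float, y: int) -> float:
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}, computed stably."""
    t = y * score
    if t >= 0:
        e = math.exp(-t)  # avoids overflow for large positive t
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def pu_risks(
    pos_scores: Sequence[float], unl_scores: Sequence[float], prior: float
) -> Tuple[float, float]:
    """Return (unbiased, non-negative corrected) PU risk estimates.

    pos_scores: classifier outputs on labeled positives.
    unl_scores: classifier outputs on unlabeled data.
    prior: class prior of positives (assumed known).
    """
    r_pos = sum(sigmoid_loss(s, +1) for s in pos_scores) / len(pos_scores)
    r_pos_neg = sum(sigmoid_loss(s, -1) for s in pos_scores) / len(pos_scores)
    r_unl_neg = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Estimated risk on the negative class; can dip below zero when the model overfits.
    neg_part = r_unl_neg - prior * r_pos_neg
    unbiased = prior * r_pos + neg_part
    corrected = prior * r_pos + max(0.0, neg_part)  # nnPU-style clamp at zero
    return unbiased, corrected


# An overfit scorer: positives pushed to +10, unlabeled data to -10.
u, c = pu_risks([10.0], [-10.0], prior=0.5)
```

On the overfit example the unbiased estimate goes negative, even though the true risk under a non-negative loss cannot, while the corrected estimate stays non-negative. When the negative-class term is already non-negative, the two estimates coincide.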