🤖 AI Summary
Problem: Existing n-tuple learning methods rely heavily on task-specific designs and lack a unified theoretical foundation. Method: This paper proposes a general weakly supervised n-tuple learning framework grounded in empirical risk minimization (ERM). It models the generation of n-tuple comparisons and pointwise unlabeled data under a shared probabilistic formulation, derives an unbiased empirical risk estimator that generalizes a broad class of existing n-tuple models, and establishes a generalization error bound. The framework is instantiated in four representative weakly supervised settings, each recoverable as a special case of the general model, and a correction function is applied to the empirical risk to counter the overfitting caused by negative risk terms. Results: Extensive experiments on multiple benchmark datasets validate the framework and show that incorporating pointwise unlabeled data consistently improves generalization across n-tuple learning tasks.
📝 Abstract
To alleviate the annotation burden in supervised learning, N-tuples learning has recently emerged as a powerful weakly supervised method. While existing N-tuples learning approaches extend pairwise learning to higher-order comparisons and accommodate various real-world scenarios, they often rely on task-specific designs and lack a unified theoretical foundation. In this paper, we propose a general N-tuples learning framework based on empirical risk minimization, which systematically integrates pointwise unlabeled data to enhance learning performance. We first unify the data generation processes of N-tuples and pointwise unlabeled data under a shared probabilistic formulation. Based on this unified view, we derive an unbiased empirical risk estimator that generalizes a broad class of existing N-tuples models. We further establish a generalization error bound as theoretical support. To demonstrate the flexibility of the framework, we instantiate it in four representative weakly supervised scenarios, each recoverable as a special case of our general model. Additionally, to address the overfitting caused by negative risk terms, we adopt correction functions to adjust the empirical risk. Extensive experiments on benchmark datasets validate the effectiveness of the proposed framework and demonstrate that leveraging pointwise unlabeled data consistently improves generalization across various N-tuples learning tasks.
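
The role of a correction function can be illustrated with a small sketch. The abstract does not give the paper's estimator, so the code below is not the authors' method: it assumes a much simpler binary positive-unlabeled setting, where an unbiased risk contains a subtracted term that can drop below zero on finite samples. The names `corrected_risk`, `sigmoid_loss`, and `relu`, the sigmoid loss, and the class prior value are all hypothetical choices made for illustration.

```python
import math


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)) on the margin z = y * g(x)."""
    return 1.0 / (1.0 + math.exp(margin))


def mean(values):
    return sum(values) / len(values)


def relu(term):
    """Correction that clips a negative partial risk to zero."""
    return max(0.0, term)


def corrected_risk(pos_scores, unl_scores, prior, correction=abs):
    """Illustrative risk with a correction applied to the term that can go negative.

    pos_scores / unl_scores are classifier outputs g(x) on positive and
    unlabeled samples; prior is the assumed positive class prior.
    Returns (corrected risk, uncorrected partial term).
    """
    # Risk of positives treated as positive (always non-negative).
    pos_term = prior * mean([sigmoid_loss(s) for s in pos_scores])
    # Estimated risk of negatives: unlabeled treated as negative, minus the
    # positive contribution. Non-negative in expectation, but its empirical
    # value can be negative when the model overfits.
    partial = (mean([sigmoid_loss(-s) for s in unl_scores])
               - prior * mean([sigmoid_loss(-s) for s in pos_scores]))
    return pos_term + correction(partial), partial


# An overfitted model: positives scored high, all unlabeled points scored low.
pos = [10.0, 10.0]
unl = [-10.0, -10.0, -10.0]

raw_risk, partial = corrected_risk(pos, unl, prior=0.5, correction=lambda t: t)
relu_risk, _ = corrected_risk(pos, unl, prior=0.5, correction=relu)
abs_risk, _ = corrected_risk(pos, unl, prior=0.5, correction=abs)
```

In this toy case the uncorrected partial term is negative, so the raw empirical risk falls below zero even though a true risk cannot. Clipping with `relu` removes the negative part, while `abs` turns it into a penalty, which is one way to discourage the optimizer from driving the term further down.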