AI Summary
To address label noise and ambiguity in crowdsourcing and related settings, this paper investigates partial-label learning (PLL), where each instance is associated with a candidate label set containing exactly one ground-truth label. To mitigate label ambiguity, we propose a probabilistic framework based on amortized variational inference, enabling efficient end-to-end approximation of the true label posterior distribution. Our approach integrates the representational power of deep neural networks with the statistical rigor of Bayesian modeling: a neural network parameterizes the variational distribution, while a maximum mutual information criterion jointly guides label disambiguation and model optimization. The method is architecture-agnostic and highly flexible for deployment. Extensive experiments on multiple synthetic and real-world benchmarks demonstrate substantial improvements in classification accuracy and inference efficiency, achieving state-of-the-art performance. These results validate the method's effectiveness, robustness to label noise, and scalability across diverse domains.
Abstract
Real-world data is frequently noisy and ambiguous. In crowdsourcing, for example, human annotators may assign conflicting class labels to the same instances. Partial-label learning (PLL) addresses this challenge by training classifiers when each instance is associated with a set of candidate labels, only one of which is correct. While early PLL methods approximate the true label posterior, they are often computationally intensive. Recent deep learning approaches improve scalability but rely on surrogate losses and heuristic label refinement. We introduce a novel probabilistic framework that directly approximates the posterior distribution over true labels using amortized variational inference. Our method employs neural networks to predict variational parameters from input data, enabling efficient inference. This approach combines the expressiveness of deep learning with the rigor of probabilistic modeling, while remaining architecture-agnostic. Theoretical analysis and extensive experiments on synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance in both accuracy and efficiency.
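To make the partial-label setting concrete, the following is a minimal sketch, not the paper's actual objective or architecture. It assumes a classifier has already produced per-class logits for one instance (hard-coded here as a stand-in for a neural network's output, which is what makes the inference "amortized"), and it forms an approximate posterior over the true label by renormalizing the predicted probabilities over the candidate set. The function names `candidate_posterior` and `candidate_nll` and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_posterior(logits, candidates):
    """Approximate posterior over the true label, restricted to the candidate set.

    Classes outside the candidate set receive probability zero, because the
    PLL assumption is that the true label is always among the candidates.
    """
    probs = softmax(logits)
    mass = sum(probs[k] for k in candidates)
    return [probs[k] / mass if k in candidates else 0.0
            for k in range(len(logits))]


def candidate_nll(logits, candidates):
    """Negative log of the probability mass the model puts on the candidate set.

    This is a common generic PLL training signal, used here as an
    illustrative assumption rather than the loss proposed in the paper.
    """
    probs = softmax(logits)
    return -math.log(sum(probs[k] for k in candidates))


# Hypothetical example: 4 classes, annotators disagreed between classes 0 and 2.
logits = [2.0, 0.5, 1.0, -1.0]   # stand-in for a network's output on one input
candidates = {0, 2}

q = candidate_posterior(logits, candidates)
loss = candidate_nll(logits, candidates)
print("posterior over labels:", q)
print("candidate-set loss:", loss)
```

In a full implementation the logits would come from a trainable network evaluated on each input, so a single forward pass yields the variational parameters for any instance instead of a separate optimization per example.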