🤖 AI Summary
Multi-label active learning faces two intertwined challenges: modeling label correlations and handling imbalanced data. Existing approaches either incur prohibitive computational overhead, neglect dependency structures among labels, or fail to mitigate long-tail bias. This paper proposes the first Bayesian active learning framework to integrate a dynamically updated bidirectional label correlation matrix, ensemble-based pseudo-labeling, and Beta-calibrated uncertainty scoring. It maintains progressively updated positive and negative correlation matrices that separately capture label co-occurrence and mutual exclusivity, and jointly optimizes diversity-aware sampling, pseudo-label guidance, and Beta-distribution-driven uncertainty calibration to alleviate class bias. Extensive experiments on four real-world multi-label datasets show significant gains over state-of-the-art methods: labeling efficiency improves by up to 23.6%, average F1-score on long-tail labels rises by 18.4%, and both generalization and robustness are substantially enhanced.
📝 Abstract
The primary challenge of multi-label active learning, distinguishing it from multi-class active learning, lies in assessing the informativeness of an indefinite number of labels while also accounting for the inherent label correlations. Existing studies either require substantial computational resources to leverage these correlations or fail to fully explore label dependencies. Additionally, real-world scenarios often require addressing intrinsic biases stemming from imbalanced data distributions. In this paper, we propose a new multi-label active learning strategy that addresses both challenges. Our method maintains progressively updated positive and negative correlation matrices that capture co-occurrence and disjoint relationships within the label space of annotated samples, enabling a holistic assessment of uncertainty rather than treating labels as isolated elements. Furthermore, alongside diversity, our model employs ensemble pseudo-labeling and Beta scoring rules to counteract data imbalance. Extensive experiments on four real-world datasets demonstrate that our strategy consistently achieves more reliable and superior performance than several established methods.