🤖 AI Summary
Real-world image and text data often suffer from both class imbalance and label noise, which jointly degrade model performance, especially on minority classes. This work presents the first systematic study of active learning under this dual challenge. It also introduces a collaborative active learning framework that incorporates priors from foundation models. Through imbalance-aware co-decisions between a foundation model and a lightweight model, the approach queries samples for annotation efficiently and robustly. Extensive experiments on multiple imbalanced image and text datasets show that the method reduces annotation cost by more than 50% relative to the best active learning baseline. It does so while preserving model performance and robustness to label noise.
📝 Abstract
Real-world datasets in both image and text domains are often characterized by skewed class distributions and noisy annotations, which jointly degrade model performance, particularly on minority classes. Among existing solutions, active learning offers an effective and efficient paradigm: it selectively queries the most informative and class-balanced samples for annotation. We present the first study to systematically explore active learning under the dual challenges of label noise and class imbalance across image and text domains. We propose an active learning framework that mitigates class imbalance while selecting the most informative samples to annotate. Leveraging foundation model priors, our algorithm makes imbalance-aware co-decisions between a foundation model and a small model to handle noisy and imbalanced labels across domains. Extensive experiments on imbalanced datasets show that our method reduces annotation cost by more than 50% compared with the best active learning baseline, while preserving performance and robustness to label noise.
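The abstract does not specify how the co-decision between the two models is computed. As a rough illustration only, the Python sketch below shows one generic way an imbalance-aware query score could combine a foundation model's class probabilities with a small model's. It is a hypothetical sketch under stated assumptions, not the authors' algorithm. The function names (`co_decision_scores`, `select_batch`), the mixing weight `alpha`, the disagreement bonus, and the smoothed inverse-frequency class weighting are all illustrative choices.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def co_decision_scores(
    fm_probs: List[List[float]],  # hypothetical: foundation-model class probabilities per unlabeled sample
    sm_probs: List[List[float]],  # hypothetical: small-model class probabilities per unlabeled sample
    labeled_counts: List[int],    # number of labeled samples per class so far
    alpha: float = 0.5,           # hypothetical mixing weight between the two models
) -> List[float]:
    """Score each unlabeled sample; a higher score means it is more worth annotating."""
    num_classes = len(labeled_counts)
    total = sum(labeled_counts)
    # Add-one smoothed inverse-frequency weights: rarer classes get larger weights,
    # and a perfectly balanced labeled set gives every class weight 1.0.
    weights = [(total + num_classes) / (num_classes * (c + 1)) for c in labeled_counts]
    scores = []
    for fm, sm in zip(fm_probs, sm_probs):
        mixed = [alpha * a + (1 - alpha) * b for a, b in zip(fm, sm)]
        pred = max(range(num_classes), key=lambda k: mixed[k])
        fm_pred = max(range(num_classes), key=lambda k: fm[k])
        sm_pred = max(range(num_classes), key=lambda k: sm[k])
        disagree = 1.0 if fm_pred != sm_pred else 0.0
        # Uncertainty of the combined prediction plus a bonus when the two models
        # disagree, scaled up when the predicted class is under-represented.
        scores.append((entropy(mixed) + disagree) * weights[pred])
    return scores


def select_batch(scores: List[float], budget: int) -> List[int]:
    """Return the indices of the `budget` highest-scoring samples."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]


# Toy example: class 0 is the labeled majority (90 samples), class 1 the minority (10).
fm = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
sm = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
scores = co_decision_scores(fm, sm, labeled_counts=[90, 10])
print(select_batch(scores, budget=1))  # → [2]
```

In this toy run, sample 2 is selected because the two models disagree on it and the combined prediction favors the minority class. Both factors raise its score well above the confidently predicted majority-class samples. A real system would also need to account for label noise in the queried annotations, which this sketch does not model.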