CGMatch: A Different Perspective of Semi-supervised Learning

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In semi-supervised learning (SSL) under extremely low-label regimes (e.g., only 40 labels on CIFAR-10), confidence scores become unreliable, making it difficult to identify high-quality unlabeled samples for pseudo-labeling. Method: This paper proposes a dynamic sample selection mechanism based on Count-Gap—a novel metric quantifying the gap between class-wise prediction counts across multiple augmented views. Integrated with confidence, Count-Gap enables a fine-grained dynamic selection (FDS) strategy that adaptively partitions unlabeled data into “easy”, “ambiguous”, and “hard” subsets, each receiving tailored consistency regularization and pseudo-label refinement. Contribution/Results: The approach effectively suppresses error propagation from erroneous pseudo-labels. It achieves state-of-the-art performance on standard SSL benchmarks, with particularly pronounced gains under extreme label scarcity—establishing a new paradigm for low-resource semi-supervised learning.
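As a rough illustration of the Count-Gap idea described above — the gap between class-wise prediction counts across multiple augmented views — here is a minimal sketch. The function name, the choice of the top-two counts, and the NumPy formulation are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def count_gap(view_logits):
    """Sketch of a Count-Gap-style score for one unlabeled example.

    view_logits: array of shape (K, C) — model logits for K augmented
    views over C classes. We count how often each class wins the argmax
    across views, then return the gap between the two largest counts.
    A large gap suggests the views agree on one class (hypothetical
    reading of the metric; the paper's definition may differ).
    """
    preds = np.argmax(view_logits, axis=1)               # winning class per view
    counts = np.bincount(preds, minlength=view_logits.shape[1])
    top_two = np.sort(counts)[-2:]                       # two largest class counts
    return int(top_two[1] - top_two[0])
```

For example, if three of four augmented views predict class 0 and one predicts class 1, the counts are (3, 1) and the gap is 2, signaling relatively consistent predictions.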

📝 Abstract
Semi-supervised learning (SSL) has garnered significant attention due to its ability to leverage limited labeled data and a large amount of unlabeled data to improve model generalization performance. Recent approaches achieve impressive successes by combining ideas from both consistency regularization and pseudo-labeling. However, these methods tend to underperform in more realistic situations where labeled data are relatively scarce. We argue that this issue arises because existing methods rely solely on the model's confidence, making it challenging to accurately assess the model's state and to identify unlabeled examples that contribute to training when supervision is limited, especially during the early stages of model training. In this paper, we propose a novel SSL model called CGMatch, which, for the first time, incorporates a new metric known as Count-Gap (CG). We demonstrate that CG is effective in discovering unlabeled examples beneficial for model training. Together with confidence, a commonly used metric in SSL, we propose a fine-grained dynamic selection (FDS) strategy that dynamically divides the unlabeled dataset into three subsets with different characteristics: an easy-to-learn set, an ambiguous set, and a hard-to-learn set. By selectively filtering these subsets and applying the corresponding regularization to each, we mitigate the negative impact of incorrect pseudo-labels on model optimization and generalization. Extensive experimental results on several common SSL benchmarks demonstrate the effectiveness of CGMatch, especially when labeled data are particularly limited. Source code is available at https://github.com/BoCheng-96/CGMatch.
Problem

Research questions and friction points this paper is trying to address.

Improves semi-supervised learning with limited labeled data.
Introduces Count-Gap metric to identify beneficial unlabeled examples.
Proposes dynamic selection strategy to enhance model generalization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Count-Gap metric for SSL
Dynamic selection strategy divides unlabeled data
Selective filtering improves model generalization
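The three-way split named in these bullets can be sketched as a simple thresholding rule over per-example confidence and Count-Gap scores. The thresholds and membership criteria below are hypothetical placeholders, not the paper's actual FDS rules.

```python
import numpy as np

def fds_partition(confidences, count_gaps, conf_thresh=0.95, cg_thresh=1):
    """Hypothetical FDS-style partition of unlabeled examples.

    easy      : high confidence AND views agree (large Count-Gap)
    hard      : low confidence
    ambiguous : everything else
    All thresholds here are illustrative assumptions.
    """
    conf = np.asarray(confidences)
    cg = np.asarray(count_gaps)
    easy = (conf >= conf_thresh) & (cg >= cg_thresh)
    hard = conf < 0.5
    ambiguous = ~easy & ~hard
    return easy, ambiguous, hard
```

Each subset could then receive its own treatment — e.g. standard pseudo-label loss on the easy set, weaker consistency regularization on the ambiguous set, and exclusion (or refinement) of the hard set — mirroring the selective-filtering idea in the summary.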
Bo Cheng
School of Artificial Intelligence, Jilin University, China; International Center of Future Science, Jilin University, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
Jueqing Lu
Monash University
Yuan Tian
School of Artificial Intelligence, Jilin University, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
Haifeng Zhao
Department of Computer Science, Jinling Institute of Technology, China
Yi Chang
School of Artificial Intelligence, Jilin University, China; International Center of Future Science, Jilin University, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
Lan Du
Faculty of Information Technology, Monash University, Australia