🤖 AI Summary
Traditional early stopping operates uniformly at the global epoch level, ignoring inter-sample variations in learning progress, causing already-mastered samples to persistently undergo backpropagation and incur computational redundancy. This paper proposes an instance-level adaptive early stopping method, the first to refine stopping granularity to individual training samples. Leveraging second-order loss differences, we formulate a stable mastery criterion and dynamically determine, under a unified threshold, whether each sample has been sufficiently learned. Gradient masking is then applied to selectively prune backpropagation for mastered samples. The approach requires no auxiliary models or external supervision, significantly reducing computational overhead: it decreases the number of backpropagated samples by 10%-50% across multiple benchmark datasets, yielding substantial training speedup while preserving or marginally improving test accuracy and transfer performance.
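The mastery criterion above can be made concrete with a small sketch. This is our illustrative reading of the summary, not the paper's code: the function name, argument layout, and threshold `eps` are assumptions. Given the last three recorded loss values of one sample, it checks whether the second-order difference lies within a small band around zero.

```python
def is_mastered(losses, eps=1e-3):
    """Second-order-difference mastery check (illustrative sketch).

    losses: the last three recorded loss values for one sample,
    oldest first: [l_{t-2}, l_{t-1}, l_t].
    Returns True if the second-order difference
    l_t - 2*l_{t-1} + l_{t-2} lies within [-eps, eps].
    """
    l0, l1, l2 = losses
    return abs(l2 - 2 * l1 + l0) <= eps
```

For example, a sample whose loss has flattened out (e.g. three consecutive values of 0.5) passes the check, while one whose loss still oscillates (e.g. 1.0, 0.3, 0.9) does not.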
📖 Abstract
In machine learning practice, early stopping has been widely used to regularize models and can save computational costs by halting the training process when the model's performance on a validation set stops improving. However, conventional early stopping applies the same stopping criterion to all instances without considering their individual learning statuses, which leads to redundant computations on instances that are already well-learned. To further improve efficiency, we propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level, based on the core principle that once the model has mastered an instance, training on it should stop. IES considers an instance mastered if the second-order differences of its loss value remain within a small range around zero. This offers a more consistent measure of an instance's learning status than directly using the loss value, and thus allows a unified threshold to determine when an instance can be excluded from further backpropagation. We show that excluding mastered instances from backpropagation can increase the gradient norms, thereby accelerating the decrease of the training loss and speeding up the training process. Extensive experiments on benchmarks demonstrate that the IES method can reduce the number of backpropagated instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of the model.
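The per-instance mechanism described in the abstract can be sketched end to end. This is a minimal illustration under our own assumptions (the class name, `eps`, and the `patience` count of consecutive in-range checks are our choices; the paper's actual thresholding may differ): a tracker records each instance's recent losses, flags it as mastered once the second-order difference of its loss stays near zero for several consecutive epochs, and a helper then averages the loss only over non-mastered instances so that backpropagation skips the rest.

```python
from collections import deque

class InstanceEarlyStopper:
    """Per-instance mastery tracker (illustrative sketch, not the paper's code).

    An instance is flagged as mastered once the second-order difference of
    its loss, d2 = l_t - 2*l_{t-1} + l_{t-2}, stays within [-eps, eps]
    for `patience` consecutive epochs.
    """

    def __init__(self, eps=1e-3, patience=3):
        self.eps = eps
        self.patience = patience
        self.history = {}  # instance id -> deque of its last 3 loss values
        self.streak = {}   # instance id -> consecutive in-range checks

    def update(self, idx, loss):
        """Record this epoch's loss for instance `idx`; return True once mastered."""
        h = self.history.setdefault(idx, deque(maxlen=3))
        h.append(loss)
        if len(h) < 3:
            return False  # need three points for a second-order difference
        d2 = h[2] - 2 * h[1] + h[0]
        if abs(d2) <= self.eps:
            self.streak[idx] = self.streak.get(idx, 0) + 1
        else:
            self.streak[idx] = 0
        return self.streak[idx] >= self.patience

def masked_mean_loss(per_sample_losses, mastered):
    """Mean loss over instances still in training; mastered instances are
    excluded, so no gradient would flow back through them."""
    kept = [l for l, m in zip(per_sample_losses, mastered) if not m]
    return sum(kept) / len(kept) if kept else 0.0
```

In a real training loop, the same effect is typically achieved by computing per-sample losses (e.g. with an unreduced loss) and zeroing or dropping the mastered ones before the backward pass; the helper above just makes the masking explicit on plain Python numbers.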