🤖 AI Summary
Robust loss functions for label-noise scenarios often require extensive dataset-specific hyperparameter tuning, hindering practical deployment.
Method: This paper proposes the Fractional Classification Loss (FCL), the first loss function incorporating fractional-order derivatives into classification loss design. FCL establishes an active-passive loss framework wherein the fractional-order parameter μ is treated as a learnable variable, enabling adaptive balancing between robustness and convergence speed. Specifically, FCL fuses the fractional-order derivative of cross-entropy with mean absolute error (MAE) and jointly optimizes both model parameters and μ in an end-to-end manner.
Contribution/Results: On multiple benchmark datasets under label noise, FCL achieves state-of-the-art (SOTA) performance without manual hyperparameter tuning, demonstrating superior convergence stability and high classification accuracy.
📝 Abstract
Robust loss functions are crucial for training deep neural networks in the presence of label noise, yet existing approaches require extensive, dataset-specific hyperparameter tuning. In this work, we introduce Fractional Classification Loss (FCL), an adaptive robust loss that automatically calibrates its robustness to label noise during training. Built within the active-passive loss framework, FCL employs the fractional derivative of the Cross-Entropy (CE) loss as its active component and the Mean Absolute Error (MAE) as its passive loss component. With this formulation, we demonstrate that the fractional derivative order $μ$ spans a family of loss functions that interpolate between MAE-like robustness and CE-like fast convergence. Furthermore, we integrate $μ$ into the gradient-based optimization as a learnable parameter and automatically adjust it to optimize the trade-off between robustness and convergence speed. We reveal that FCL's unique property establishes a critical trade-off that enables the stable learning of $μ$: lower log penalties on difficult or mislabeled examples improve robustness but impose higher penalties on easy or clean data, reducing model confidence in them. Consequently, FCL can dynamically reshape its loss landscape to achieve effective classification performance under label noise. Extensive experiments on benchmark datasets show that FCL achieves state-of-the-art results without the need for manual hyperparameter tuning.