AI Summary
This study addresses class imbalance and interference among semantically adjacent labels in Chinese scholarly text classification. The authors propose AutoTail-BSFGM, a fine-tuning method that changes only the training objective and procedure and leaves the model's inference architecture unchanged. It combines an automatically gated tail-prior adjustment, a weak Balanced Softmax auxiliary loss, and Fast Gradient Method (FGM) adversarial regularization to make the model more sensitive to long-tailed classes. Evaluated on two CSL-based tasks with Chinese RoBERTa-WWM and MacBERT-base encoders, the method improves validation accuracy by 0.83 percentage points and lockbox accuracy by 0.49 points on the 67-class abstract classification task (with MacBERT-base), and improves validation balanced accuracy by 2.64 points on the 13-class title classification task. The gains are consistent on the abstract task but not uniform across every metric and split: lockbox accuracy on the title task is approximately neutral.
Abstract
Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose AutoTail-BSFGM, a class-balance-aware fine-tuning method that combines an automatically gated tail-prior adjustment, a weak Balanced Softmax auxiliary loss, and Fast Gradient Method adversarial regularization. The method changes only the training objective and procedure; inference uses the same single base-size encoder and linear classifier as the corresponding label-smoothed baseline. We evaluate the method on two CSL-based tasks: an abstract-to-discipline task with 67 labels and a title-to-category task with 13 categories. On the primary abstract task, AutoTail-BSFGM improves validation and lockbox accuracy under both Chinese RoBERTa-WWM and MacBERT-base. With MacBERT-base, validation accuracy increases by 0.83 percentage points and lockbox accuracy by 0.49 points, with a pooled paired McNemar signal on validation (p = 0.023). On the title task, the method improves validation accuracy by 0.70 points and validation balanced accuracy by 2.64 points; lockbox accuracy is approximately neutral while lockbox balanced accuracy improves by 1.22 points. The results support a bounded contribution: AutoTail-BSFGM improves class-balance-sensitive behavior and yields consistent gains for abstract-based scholarly classification, without uniformly improving every metric on every split.
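To make the Balanced Softmax auxiliary term concrete, the following is a minimal sketch of how such a loss could be added, with a small weight, to a label-smoothed cross-entropy objective. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `aux_weight` and `smoothing` values, and the toy class counts are all hypothetical, and the gated tail-prior adjustment and the FGM adversarial step are omitted because the abstract does not specify them.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, target, smoothing=0.0):
    """Cross-entropy with optional uniform label smoothing."""
    logp = log_softmax(logits)
    nll = -logp[target]                # loss against the true class
    uniform = -sum(logp) / len(logp)   # loss against a uniform target
    return (1.0 - smoothing) * nll + smoothing * uniform


def balanced_softmax_loss(logits, target, class_counts):
    """Balanced Softmax: shift each logit by the log class prior.

    Frequent classes get a larger shift, so a rare (tail) class must
    earn a larger raw logit margin to reach the same loss.
    """
    total = sum(class_counts)
    shifted = [z + math.log(n / total) for z, n in zip(logits, class_counts)]
    return cross_entropy(shifted, target)


def combined_loss(logits, target, class_counts, aux_weight=0.1, smoothing=0.1):
    """Label-smoothed CE plus a weakly weighted Balanced Softmax term.

    aux_weight and smoothing are hypothetical values chosen for
    illustration only; the abstract does not report them.
    """
    main = cross_entropy(logits, target, smoothing)
    aux = balanced_softmax_loss(logits, target, class_counts)
    return main + aux_weight * aux


if __name__ == "__main__":
    counts = [90, 9, 1]            # toy long-tailed class counts (assumed)
    logits = [0.0, 0.0, 0.0]       # an uninformative prediction
    print(cross_entropy(logits, 2))                  # plain CE on the tail class
    print(balanced_softmax_loss(logits, 2, counts))  # larger: prior-shifted
    print(combined_loss(logits, 2, counts))
```

Because the prior shift appears only in the training loss, inference in this sketch still uses the raw logits, which matches the abstract's statement that the method changes the training objective and procedure but not the inference-time model.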