BoostTransformer: Enhancing Transformer Models with Subgrid Selection and Importance Sampling

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and complex hyperparameter tuning of Transformer models, this paper proposes an efficient training framework that integrates boosting mechanisms. The method introduces (1) a least-squares boosting objective—replacing standard cross-entropy—to concentrate gradient updates on hard-to-classify samples; (2) a subgrid token selection strategy that dynamically identifies information-dense local token subsets; and (3) importance-weighted sampling to suppress redundant computation. These components are jointly embedded into the Transformer training pipeline. Empirical evaluation across multiple fine-grained text classification benchmarks demonstrates that the approach accelerates convergence and improves generalization: it reduces training time by 32%–47% while improving accuracy by 1.8–3.4 percentage points on average. Moreover, it significantly lowers architecture search overhead.
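The summary's third component, importance-weighted sampling, can be illustrated with a minimal sketch: examples with larger least-squares residuals (i.e., harder samples) are drawn more often for the next training step. The helper name `residual_weighted_sample` and the exact weighting (squared residuals) are illustrative assumptions, not the paper's confirmed scheme.

```python
import random

def residual_weighted_sample(residuals, k, seed=0):
    """Draw k distinct example indices with probability proportional to
    squared residuals, so hard-to-fit samples are revisited more often.
    (Hypothetical sketch; the paper's exact weighting may differ.)"""
    rng = random.Random(seed)
    weights = [r * r for r in residuals]
    pool = list(range(len(residuals)))
    chosen = []
    while len(chosen) < k:
        # random.choices normalizes the weights of the remaining pool
        i = rng.choices(pool, weights=[weights[j] for j in pool])[0]
        pool.remove(i)
        chosen.append(i)
    return chosen
```

In practice such a sampler would feed a data loader each epoch, with residuals refreshed after every boosting round.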

📝 Abstract
Transformer architectures dominate modern NLP but often demand heavy computational resources and intricate hyperparameter tuning. To mitigate these challenges, we propose a novel framework, BoostTransformer, that augments transformers with boosting principles through subgrid token selection and importance-weighted sampling. Our method incorporates a least-squares boosting objective directly into the transformer pipeline, enabling more efficient training and improved performance. Across multiple fine-grained text classification benchmarks, BoostTransformer demonstrates both faster convergence and higher accuracy, surpassing standard transformers while minimizing architecture search overhead.
Problem

Research questions and friction points this paper is trying to address.

Reduce computational demands of transformer models
Simplify hyperparameter tuning in transformers
Improve training efficiency and model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Subgrid token selection for efficiency
Importance-weighted sampling optimization
Least square boosting objective integration
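The first innovation above, subgrid token selection, can be sketched as picking the contiguous token span with the highest total importance score. The function name `select_subgrid` and the max-sum-window criterion are assumptions for illustration; the paper's actual selection rule may be more elaborate.

```python
def select_subgrid(scores, window):
    """Return (start, end) of the contiguous token window with the
    highest total importance score, via a sliding-window sum.
    (Illustrative sketch; the paper's subgrid criterion is assumed.)"""
    best_start = 0
    best_sum = sum(scores[:window])
    cur = best_sum
    for start in range(1, len(scores) - window + 1):
        # slide the window: add the entering token, drop the leaving one
        cur += scores[start + window - 1] - scores[start - 1]
        if cur > best_sum:
            best_start, best_sum = start, cur
    return best_start, best_start + window
```

Restricting attention to such a subgrid shrinks the effective sequence length, which is where the claimed efficiency gains would come from.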
Biyi Fang
Northwestern University
Jean Utke
Allstate
deep learning, optimization
Truong Vo
Northwestern University
Diego Klabjan
Northwestern University
Machine learning