🤖 AI Summary
Standard supervised fine-tuning (SFT) uniformly penalizes all tokens, degrading output diversity and generalization in mathematical reasoning. Method: We propose selective critical-token fine-tuning, which identifies sparse, causally critical tokens—those whose perturbation alters reasoning correctness—via counterfactual analysis, and applies gradient updates exclusively at these positions while preserving the original token distributions elsewhere to maintain diversity and robustness. The method integrates seamlessly into standard SFT and supports test-time sampling extensions and reinforcement learning initialization. Results: Experiments across three model families (5 models) and 11 mathematical reasoning benchmarks show that fine-tuning fewer than 12% of tokens consistently outperforms full SFT, while increasing output entropy and improving training stability.
📝 Abstract
Large language models (LLMs) rely on supervised fine-tuning (SFT) as a key method for adapting pre-trained models to domain-specific tasks such as mathematical reasoning. However, standard SFT uniformly penalizes all tokens, neglecting the fact that only a small subset of critical tokens determines reasoning correctness. This uniform supervision often reduces output diversity and limits generalization. We propose Critical Token Fine-tuning (CFT), a simple yet effective approach that updates only tokens identified as functionally indispensable via counterfactual perturbations. By focusing gradient signals on these decisive reasoning steps while preserving the distributions of non-critical tokens, CFT enhances both generation quality and diversity. Extensive experiments on five models across three families (Qwen, OLMo, LLaMA) and eleven mathematical reasoning benchmarks show that CFT, despite fine-tuning fewer than 12% of tokens, consistently outperforms standard SFT. Moreover, CFT enables test-time scaling through improved sampling diversity and provides a stronger initialization for reinforcement learning, sustaining performance gains in later training stages while maintaining higher entropy for better exploration. These results highlight CFT as a practical and general framework for efficient and robust LLM fine-tuning.
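The core mechanism described above is a masked training loss: gradient updates apply only at positions flagged as causally critical, while non-critical positions contribute no signal and thus leave the model's original distribution untouched. A minimal sketch of such a selective loss is below; the function name and the per-token log-probability inputs are illustrative assumptions, not the paper's implementation.

```python
import math

def critical_token_loss(token_logprobs, critical_mask):
    """Cross-entropy-style loss restricted to critical tokens (illustrative sketch).

    token_logprobs: log-probabilities the model assigns to each reference token.
    critical_mask:  1 where counterfactual perturbation of the token was found to
                    flip reasoning correctness, else 0.

    Only masked-in positions contribute to the loss, so non-critical tokens
    receive no gradient and their original distributions are preserved.
    """
    selected = [-lp for lp, m in zip(token_logprobs, critical_mask) if m]
    if not selected:
        return 0.0  # no critical tokens in this sequence -> no update
    return sum(selected) / len(selected)

# Toy usage: only the second token is critical, so the loss equals its
# negative log-probability alone.
loss = critical_token_loss([-0.1, -2.0, -0.5], [0, 1, 0])
```

In practice this would be a mask applied to the per-token negative log-likelihoods inside a standard SFT training loop, so the method slots into existing pipelines with minimal change.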