Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
In supervised fine-tuning (SFT), large language models (LLMs) often overemphasize lengthy chain-of-thought (CoT) reasoning, which weakens the modeling of the critical answer tokens and degrades answer accuracy. To address this, the authors propose SFTKey, a two-stage token-level fine-tuning method: Stage 1 ensures CoT format compliance via standard SFT; Stage 2 applies a weighted loss mask exclusively to answer tokens, explicitly decoupling and strengthening their optimization objective. This is the first work to introduce CoT-aware, token-level weighting and a phased answer-focusing mechanism in SFT. Experiments across multiple benchmarks (e.g., GSM8K, MATH, SVAMP) and model families (e.g., LLaMA-3, Qwen, Phi-3) demonstrate an average accuracy improvement of over 5%, with substantial gains in final-answer correctness, while fully preserving CoT formatting fidelity and generation capability.
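The Stage 2 objective described above, a loss that up-weights only the answer tokens, can be sketched as a token-level weighted cross-entropy. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the `answer_weight` value, the mask convention, and the function name are illustrative.

```python
import torch
import torch.nn.functional as F

def weighted_token_loss(logits, labels, answer_mask, answer_weight=2.0):
    """Cross-entropy over all target tokens, up-weighting answer tokens.

    logits:      (batch, seq, vocab) model outputs
    labels:      (batch, seq) target token ids, -100 for ignored positions
    answer_mask: (batch, seq) 1.0 where the token belongs to the final answer
    """
    vocab = logits.size(-1)
    # Per-token loss; ignored positions (-100) contribute zero.
    per_token = F.cross_entropy(
        logits.view(-1, vocab), labels.view(-1),
        ignore_index=-100, reduction="none",
    ).view(labels.shape)
    # Answer tokens get answer_weight; CoT tokens keep weight 1.0.
    weights = 1.0 + (answer_weight - 1.0) * answer_mask
    valid = (labels != -100).float()
    return (per_token * weights * valid).sum() / (weights * valid).sum()
```

With `answer_weight=1.0` this reduces to standard SFT cross-entropy; larger values shift the gradient budget toward the short answer span without zeroing out the CoT tokens.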

📝 Abstract
With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model may allocate disproportionately more attention to CoT sequences of excessive length. This reduces focus on the much shorter but essential Key portion, the final answer, whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme. In the first stage, conventional SFT is applied to ensure proper output format; in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT while preserving the ability to generate correctly formatted outputs. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens.
Problem

Research questions and friction points this paper is trying to address.

Improves LLM accuracy by focusing on key answer tokens
Addresses attention imbalance between CoT sequences and final answers
Proposes a two-stage fine-tuning method to enhance answer correctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage fine-tuning balances CoT and key answer tokens
Second stage focuses only on essential answer portion
Method improves accuracy while preserving output format
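Focusing the second stage on "only the essential answer portion" presupposes a way to locate answer tokens in each target sequence. One plausible approach, sketched below, marks every token from the last occurrence of an answer delimiter onward. The single-token `delim_id` and the function name are assumptions for illustration; a real marker such as GSM8K's "####" may tokenize into multiple ids.

```python
import torch

def mask_after_last_delimiter(labels, delim_id):
    """Return a (batch, seq) float mask that is 1.0 from the last
    occurrence of delim_id to the end of each sequence, 0.0 elsewhere.
    Sequences without the delimiter get an all-zero mask.
    """
    mask = torch.zeros_like(labels, dtype=torch.float)
    for b in range(labels.size(0)):
        hits = (labels[b] == delim_id).nonzero(as_tuple=True)[0]
        if hits.numel() > 0:
            mask[b, int(hits[-1]):] = 1.0
    return mask
```

The resulting mask can then gate which positions receive the extra loss weight in the answer-focused stage.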
🔎 Similar Papers
2024-09-03 · International Conference on Machine Learning · Citations: 3
Xiaofeng Shi (Beijing Academy of Artificial Intelligence (BAAI))
Qian Kou (Beijing Academy of Artificial Intelligence (BAAI))
Yuduo Li (Beijing Academy of Artificial Intelligence (BAAI); Beijing Jiaotong University (BJTU))
Hua Zhou (Advanced Photon Source, Argonne National Laboratory)