🤖 AI Summary
In supervised fine-tuning (SFT), large language models (LLMs) often overemphasize lengthy chain-of-thought (CoT) reasoning, which weakens the modeling of the critical answer tokens and degrades answer accuracy. To address this, we propose SFTKey, a two-stage token-level fine-tuning method: Stage 1 ensures CoT format compliance via standard SFT; Stage 2 applies a weighted loss mask exclusively to answer tokens, explicitly decoupling and strengthening their optimization objective. This is the first work to introduce CoT-aware, token-level weighting and a phased answer-focusing mechanism in SFT. Experiments across multiple benchmarks (e.g., GSM8K, MATH, SVAMP) and model families (e.g., LLaMA-3, Qwen, Phi-3) demonstrate an average accuracy improvement of over 5%, with substantial gains in final answer correctness, while fully preserving CoT formatting fidelity and generation capability.
📝 Abstract
With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become essential for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model can allocate disproportionate attention to CoT sequences of excessive length. This reduces focus on the much shorter but essential Key portion: the final answer, whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme. In the first stage, conventional SFT is applied to ensure proper output format; in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT, while preserving the ability to generate correctly formatted outputs. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens.
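The two-stage idea above can be illustrated with a toy token-level loss. The sketch below is not from the paper; the function name, the `answer_weight` parameter, and the exact masking scheme are illustrative assumptions. It shows the core mechanism: Stage 1 weights every target token uniformly (conventional SFT), while Stage 2 zeroes out CoT tokens and up-weights the answer ("Key") tokens.

```python
import math

def sft_key_loss(token_logprobs, answer_mask, stage, answer_weight=2.0):
    """Toy per-token loss for a two-stage SFTKey-style scheme (illustrative only).

    token_logprobs: log-probabilities the model assigns to each target token.
    answer_mask:    1 for answer ("Key") tokens, 0 for CoT tokens.
    stage:          1 = uniform SFT loss over all tokens;
                    2 = loss restricted to answer tokens, up-weighted.
    Returns the weighted mean negative log-likelihood.
    """
    weights = []
    for is_answer in answer_mask:
        if stage == 1:
            # Stage 1: every token (CoT and answer) contributes equally,
            # so the model learns the output format.
            weights.append(1.0)
        else:
            # Stage 2: only answer tokens contribute, with extra weight
            # (assumed hyperparameter), decoupling their objective from CoT.
            weights.append(answer_weight if is_answer else 0.0)

    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    nll = sum(-lp * w for lp, w in zip(token_logprobs, weights))
    return nll / total_weight

# Example: two CoT tokens the model fits well, one answer token it fits poorly.
logprobs = [-0.5, -0.5, -2.0]
mask = [0, 0, 1]
stage1 = sft_key_loss(logprobs, mask, stage=1)  # mean over all 3 tokens -> 1.0
stage2 = sft_key_loss(logprobs, mask, stage=2)  # answer token only -> 2.0
```

In this example the Stage 2 loss is dominated by the poorly fit answer token, so its gradient signal is no longer diluted by the many well-fit CoT tokens, which is the imbalance the abstract describes.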