🤖 AI Summary
Existing data pruning methods treat sample-level and token-level redundancy in isolation, which discards critical signals from high-value samples while leaving redundant tokens in place, limiting the efficiency of supervised fine-tuning (SFT). This paper proposes Q-Tuning, the first unified framework to jointly optimize sample- and token-level pruning. It models data heterogeneity via an Error-Uncertainty (EU) plane, introduces a three-way sample classification mechanism and a context-aware asymmetric token scoring strategy, and adds a calibration-signal preservation mechanism that retains pedagogically informative corrective feedback, enabling dynamic, synergistic dual-granularity pruning. Evaluated on five benchmarks, Q-Tuning achieves an average +38% performance gain on SmolLM2-1.7B using only 12.5% of the training data, marking the first instance where fine-tuning on pruned data surpasses full-data fine-tuning.
📝 Abstract
As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies: high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.
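To make the two-stage strategy concrete, here is a minimal Python sketch of a Q-Tuning-style pipeline. It assumes per-token training loss as the error proxy and per-token predictive entropy as the uncertainty proxy; the thresholds, the quadrant assignment, and the loss-based token score are illustrative placeholders, not the paper's actual scoring functions.

```python
# Illustrative sketch of dual-granularity (sample + token) pruning.
# All thresholds and scoring rules below are placeholder assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    tokens: List[str]
    token_losses: List[float]     # per-token loss (error proxy)
    token_entropies: List[float]  # per-token entropy (uncertainty proxy)

def classify(sample: Sample, err_thr: float = 1.0, unc_thr: float = 1.0) -> str:
    """Three-way triage on the Error-Uncertainty plane (illustrative)."""
    err = sum(sample.token_losses) / len(sample.token_losses)
    unc = sum(sample.token_entropies) / len(sample.token_entropies)
    if err >= err_thr and unc >= unc_thr:
        return "misconception"  # informative mistakes: keep, then token-prune
    if err < err_thr and unc >= unc_thr:
        return "calibration"    # corrective/calibration signal: keep whole
    return "discard"            # low-utility: drop at the sample level

def prune(dataset: List[Sample], keep_ratio: float = 0.5) -> List[List[str]]:
    """Stage 1: sample triage. Stage 2: asymmetric token pruning."""
    kept = []
    for s in dataset:
        kind = classify(s)
        if kind == "discard":
            continue
        if kind == "calibration":
            kept.append(s.tokens)  # calibration samples preserved in full
        else:
            # Token-prune misconception samples only: keep the highest-loss
            # tokens (a stand-in for the paper's context-aware score).
            ranked = sorted(range(len(s.tokens)),
                            key=lambda i: s.token_losses[i], reverse=True)
            keep = sorted(ranked[:max(1, int(keep_ratio * len(s.tokens)))])
            kept.append([s.tokens[i] for i in keep])
    return kept
```

The asymmetry is the key design choice: only misconception samples lose tokens, while calibration samples pass through untouched, mirroring the abstract's claim that token pruning must not destroy corrective signals.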