AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

📅 2026-05-31
📈 Citations: 0
Influential: 0
📄 PDF

career value

182K/year
🤖 AI Summary
This work addresses the limitations of existing token selection strategies in large language model post-training, which often rely on local heuristics and struggle to balance task adaptation with retention of pre-trained knowledge. The authors propose AlphaToken, a novel framework that formulates token selection as a path-aware, dual-objective valuation problem. By integrating causal path signals from autoregressive generation with local gradients, AlphaToken separately quantifies each token’s adaptability—its contribution to target task learning—and stability—its role in preserving pre-trained knowledge. Stability is efficiently approximated via Fisher information matrix drift and scaled to token-level valuation through Ghost Dot-Product. Leveraging this dual metric, the method dynamically masks low-value tokens during both fine-tuning and preference optimization. Experiments demonstrate that AlphaToken significantly enhances post-training performance, effectively mitigates catastrophic forgetting, and exhibits strong effectiveness and robustness across multiple tasks.
📝 Abstract
Token selection is pivotal for effective LLM post-training. However, existing methods mostly rely on local heuristics and rarely formulate token selection as a principled valuation of individual response tokens. We introduce $\textbf{AlphaToken}$, a response token valuation framework that decouples valuation into $\textbf{adaptation}$ (promoting target-task learning) and $\textbf{stability}$ (preserving pre-trained capabilities), and makes each objective $\textbf{path-aware}$ by combining the direct-path signal from local token gradients with the downstream causal-path signal in autoregressive generation. Since retention data are typically unavailable, AlphaToken approximates stability via a $\textbf{Fisher-drift proxy}$ anchored at the pre-trained reference model. For efficient computation, we extend Ghost Dot-Product to token-level valuation. AlphaToken masks low-value response tokens during fine-tuning and preference optimization, concentrating training signals on more valuable positions. Experiments show that AlphaToken improves post-training performance and mitigates catastrophic forgetting.
Problem

Research questions and friction points this paper is trying to address.

token selection
LLM post-training
catastrophic forgetting
adaptation
stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

AlphaToken
token valuation
path-aware
adaptation-stability decoupling
Fisher-drift proxy