Representation Collapse in Sequential Post-Training of Large Language Models

📅 2026-05-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study investigates whether multi-stage post-training of large language models causes representation collapse, and whether that collapse is associated with degraded adaptability, weaker out-of-distribution generalization, and poorer calibration. The authors construct a measurement framework covering hidden states, logits, token trajectories, and LoRA updates, and use it to test the hypothesis that representation collapse predicts declines in model plasticity, generalization, and calibration. They also evaluate lightweight interventions, including mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation, as ways to preserve future learnability without giving up the behavioral gains of post-training.
📝 Abstract
Large language models are now adapted through chains of post-training stages rather than through a single instruction-tuning pass. This paper studies whether such sequential post-training gradually compresses internal representations into low-rank, anisotropic, and homogeneous feature spaces. We define a measurement suite for hidden states, logits, token trajectories, and LoRA updates, and we use it to analyze supervised fine-tuning, preference optimization, safety/refusal tuning, math and code specialization, and long chain-of-thought tuning under controlled stage orderings. The central hypothesis is that excessive representation concentration is not merely a geometric curiosity: it predicts reduced plasticity during later adaptation, weaker out-of-domain generalization, and poorer calibration. We further evaluate lightweight interventions, including mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation, as ways to preserve future learnability without giving up the behavioral gains of post-training.
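To make "low-rank" and "anisotropic" concrete, the sketch below computes two common proxies on a matrix of hidden states: an entropy-based effective rank of the singular value spectrum, and the mean pairwise cosine similarity between token vectors. These are illustrative assumptions, not the paper's stated metrics; the abstract does not specify its measurement suite, and the function names `effective_rank` and `mean_cosine_similarity` are hypothetical. The synthetic "collapsed" matrix is made up for demonstration only.

```python
import numpy as np


def effective_rank(hidden: np.ndarray) -> float:
    """Entropy-based effective rank of a (n_tokens, d_model) matrix.

    Assumed proxy for "low-rank" structure: exp of the Shannon entropy
    of the normalized singular values of the mean-centered matrix.
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def mean_cosine_similarity(hidden: np.ndarray) -> float:
    """Average cosine similarity over all pairs of distinct rows.

    Assumed proxy for anisotropy: values near 0 suggest directions are
    spread out, values near 1 suggest vectors share a dominant direction.
    """
    unit = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = sims.shape[0]
    # Exclude the diagonal (self-similarity is always 1).
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for hidden states (not real model activations).
    isotropic = rng.normal(size=(512, 64))
    # Rank-4 features plus a shared offset mimic a concentrated space.
    collapsed = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 64)) + 5.0

    for name, h in [("isotropic", isotropic), ("collapsed", collapsed)]:
        print(name, effective_rank(h), mean_cosine_similarity(h))
```

In a study like the one described, such quantities would presumably be tracked per layer and per post-training stage, so that a drop in effective rank or a rise in mean cosine similarity can be compared against later plasticity, generalization, and calibration results.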
Problem

Research questions and friction points this paper is trying to address.

representation collapse
sequential post-training
large language models
representation homogeneity
adaptation plasticity
Innovation

Methods, ideas, or system contributions that make the work stand out.

representation collapse
sequential post-training
feature space anisotropy
LoRA decorrelation
representation diversity regularization