Representation Collapse in Sequential Post-Training of Large Language Models

📅 2026-05-28

🤖 AI Summary

This study investigates whether representation collapse during multi-stage post-training of large language models is associated with reduced adaptability, weaker out-of-distribution generalization, and worse calibration. To this end, the authors build a measurement framework covering hidden states, logits, token trajectories, and LoRA updates, and use it to test the hypothesis that representation collapse predicts declines in model plasticity, generalization, and calibration. Building on this analysis, they evaluate lightweight interventions (mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation) as ways to preserve the model's capacity for later adaptation while retaining the behavioral gains of post-training.
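The summary does not give formulas for the measurement framework. The Python sketch below uses common definitions, which may differ from the paper's own: entropy-based effective rank (for low-rank collapse), mean pairwise cosine similarity between token vectors (for anisotropy), the overlap between the top singular subspaces of two weight updates (for whether successive LoRA stages reuse the same directions), and standard binned expected calibration error. All function names are hypothetical and the inputs are synthetic.

```python
import numpy as np


def effective_rank(h: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_tokens, d_model) hidden-state matrix.

    exp(Shannon entropy of the normalized singular values): close to d_model
    for spread-out features, close to 1 when a single direction dominates.
    """
    s = np.linalg.svd(h, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > 0]  # drop exact zeros so the log is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def anisotropy(h: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cosine similarity over all pairs of distinct rows (token vectors)."""
    u = h / (np.linalg.norm(h, axis=1, keepdims=True) + eps)
    sim = u @ u.T
    n = sim.shape[0]
    # Exclude the diagonal (each vector's similarity with itself).
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


def subspace_overlap(dw_a: np.ndarray, dw_b: np.ndarray, k: int) -> float:
    """Overlap in [0, 1] between the top-k left singular subspaces of two
    weight updates (e.g. LoRA products B @ A from two post-training stages)."""
    u_a = np.linalg.svd(dw_a, full_matrices=False)[0][:, :k]
    u_b = np.linalg.svd(dw_b, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(u_a.T @ u_b, "fro") ** 2 / k)


def expected_calibration_error(conf: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """Binned ECE: bin-weighted mean of |accuracy - mean confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return float(ece)


# Usage on synthetic data (illustrative only, not results from the paper).
rng = np.random.default_rng(0)
diverse = rng.standard_normal((256, 64))
# Rank-2 features plus a large shared offset: a toy "collapsed" representation.
collapsed = rng.standard_normal((256, 2)) @ rng.standard_normal((2, 64)) + 5.0

er_diverse, er_collapsed = effective_rank(diverse), effective_rank(collapsed)
aniso_diverse, aniso_collapsed = anisotropy(diverse), anisotropy(collapsed)

dw = rng.standard_normal((32, 8)) @ rng.standard_normal((8, 16))  # rank-8 update
self_overlap = subspace_overlap(dw, dw, k=4)

ece_calibrated = expected_calibration_error(np.full(10, 0.8), np.array([1] * 8 + [0] * 2))
ece_overconfident = expected_calibration_error(np.full(10, 0.95), np.array([1] * 5 + [0] * 5))
```

On this toy data the Gaussian matrix has a high effective rank and near-zero anisotropy, while the offset low-rank matrix has an effective rank of at most 3 and anisotropy well above 0.5; a model answering with 95% confidence but 50% accuracy gets an ECE of 0.45.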

📝 Abstract

Large language models are now adapted through chains of post-training stages rather than through a single instruction-tuning pass. This paper studies whether such sequential post-training gradually compresses internal representations into low-rank, anisotropic, and homogeneous feature spaces. We define a measurement suite for hidden states, logits, token trajectories, and LoRA updates, and we use it to analyze supervised fine-tuning, preference optimization, safety/refusal tuning, math and code specialization, and long chain-of-thought tuning under controlled stage orderings. The central hypothesis is that excessive representation concentration is not merely a geometric curiosity: it predicts reduced plasticity during later adaptation, weaker out-of-domain generalization, and poorer calibration. We further evaluate lightweight interventions, including mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation, as ways to preserve future learnability without giving up the behavioral gains of post-training.
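The abstract names representation diversity regularization and LoRA update decorrelation without specifying their form. As one plausible sketch, assuming a VICReg-style variance/covariance penalty on hidden states and an orthogonality penalty between the LoRA `A` matrices of the current stage and earlier stages, the helpers below compute each penalty value in NumPy so the example runs standalone. In training, the same arithmetic would be written in an autodiff framework and added to the task loss with tunable weights. Function names and defaults such as `gamma` are assumptions, not the authors' code.

```python
import numpy as np


def diversity_penalty(h: np.ndarray, gamma: float = 1.0, eps: float = 1e-4) -> float:
    """VICReg-style variance + covariance penalty on a (batch, d) feature matrix.

    Variance term: hinge pushing each feature's std up to at least gamma.
    Covariance term: squared off-diagonal covariances, discouraging features
    from collapsing onto a few shared directions.
    """
    n, d = h.shape
    centered = h - h.mean(axis=0, keepdims=True)
    std = np.sqrt(centered.var(axis=0, ddof=1) + eps)
    variance_term = np.maximum(0.0, gamma - std).mean()
    cov = centered.T @ centered / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covariance_term = (off_diag ** 2).sum() / d
    return float(variance_term + covariance_term)


def lora_decorrelation_penalty(a_current: np.ndarray, a_previous: list[np.ndarray]) -> float:
    """Sum over earlier stages of ||A_prev @ A_cur.T||_F^2.

    Each A has shape (rank, d_in). The penalty is zero when the current
    stage's LoRA input directions are orthogonal to every earlier stage's.
    """
    return float(sum(np.linalg.norm(a_prev @ a_current.T, "fro") ** 2 for a_prev in a_previous))


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
spread = rng.standard_normal((256, 64))
rank_one = rng.standard_normal((256, 1)) @ rng.standard_normal((1, 64))  # collapsed batch
div_spread, div_rank_one = diversity_penalty(spread), diversity_penalty(rank_one)

eye = np.eye(8)
dec_orthogonal = lora_decorrelation_penalty(eye[4:], [eye[:4]])  # disjoint directions
dec_identical = lora_decorrelation_penalty(eye[:4], [eye[:4]])   # same directions reused
```

A collapsed rank-one batch receives a much larger diversity penalty than an isotropic Gaussian batch, and the decorrelation penalty is zero when the two stages use orthogonal LoRA input directions but 4.0 when a rank-4 stage reuses exactly the previous stage's directions.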

Problem

Research questions and friction points this paper is trying to address.

representation collapse

sequential post-training

large language models

representation homogeneity

adaptation plasticity

Innovation

Methods, ideas, or system contributions that make the work stand out.

representation collapse

sequential post-training

feature space anisotropy