🤖 AI Summary
This paper systematically investigates knowledge forgetting and backward transfer during large language model (LLM) post-training. Methodologically, it introduces a sample-level dynamic analysis paradigm centered on state transitions, specifically 1→0 (knowledge loss) and 0→1 (backward transfer), and incorporates a chance-correction mechanism to remove the bias introduced by random guessing, thereby uncovering fine-grained evolutionary patterns that task-averaged metrics obscure. Through experiments on multiple-choice benchmarks, sample-level accuracy tracking, and cross-model/cross-scale data ablations, the study reveals: (i) domain-specific continual pretraining induces moderate forgetting; (ii) RL and supervised fine-tuning (SFT) trigger significant backward transfer in mathematical reasoning; (iii) the outcome of applying RL/SFT after instruction tuning depends strongly on data scale; and (iv) model merging fails to reliably mitigate forgetting. This work provides the first fine-grained, interpretable, and quantifiable characterization of knowledge dynamics during LLM post-training.
📝 Abstract
Scaled post-training now drives many of the largest capability gains in language models (LMs), yet its effect on pretrained knowledge remains poorly understood. Not all forgetting is equal: forgetting one fact (e.g., a U.S. president or an API call) does not "average out" by recalling another. Hence, we propose a sample-wise paradigm to measure what is forgotten and when backward transfer occurs. Our metric counts 1→0 transitions (correct before post-training, incorrect after) to quantify forgetting and 0→1 transitions (incorrect before, correct after) to quantify backward transfer. Traditional task averages conflate these effects and obscure large changes. For multiple-choice benchmarks, we add chance-adjusted variants that subtract the expected contribution of random guessing from pre- and post-training accuracies. We apply this framework across post-training stages, model sizes, and data scales. Our large-scale analysis shows that: (1) domain-continual pretraining induces moderate forgetting with low-to-moderate backward transfer; (2) RL/SFT post-training applied to base models and instruction tuning yield moderate-to-large backward transfer on math and logic with overall low-to-moderate forgetting; (3) applying RL/SFT to instruction-tuned models is sensitive to data scale: at small scales, both forgetting and backward transfer are small; at larger scales, effects are mixed and warrant further study with better controls; (4) model merging does not reliably mitigate forgetting. Overall, our framework offers a practical yardstick for mapping how post-training alters pretrained knowledge at scale -- enabling progress towards generally capable AI systems.
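As an illustrative sketch of the metric described above (not the paper's exact formulation), the 1→0 and 0→1 transition rates can be computed from parallel per-sample correctness indicators, and a standard chance correction for k-way multiple choice rescales accuracy so that random guessing maps to 0. The sample values below are hypothetical.

```python
def transition_rates(pre_correct, post_correct):
    """Sample-level transition rates between two evaluation snapshots.

    pre_correct / post_correct are parallel 0/1 lists over the same
    benchmark samples: 1->0 transitions quantify forgetting,
    0->1 transitions quantify backward transfer.
    """
    assert len(pre_correct) == len(post_correct) and pre_correct
    n = len(pre_correct)
    forgot = sum(1 for p, q in zip(pre_correct, post_correct) if p == 1 and q == 0)
    gained = sum(1 for p, q in zip(pre_correct, post_correct) if p == 0 and q == 1)
    return forgot / n, gained / n  # forgetting rate, backward-transfer rate


def chance_adjusted(accuracy, num_choices):
    """Rescale a k-way multiple-choice accuracy so that random guessing
    maps to 0 and perfect performance to 1 (one common chance correction;
    the paper's exact adjustment may differ)."""
    chance = 1.0 / num_choices
    return (accuracy - chance) / (1.0 - chance)


# Hypothetical per-sample correctness before and after post-training:
pre = [1, 1, 0, 0, 1]
post = [1, 0, 1, 0, 1]
forgetting, backward = transition_rates(pre, post)  # 0.2, 0.2
```

Note how a task-average view would report no change here (both snapshots score 3/5), while the sample-level view exposes one forgotten item and one newly acquired one.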