Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses catastrophic forgetting in fine-tuning foundation models, where adaptation to a target task often degrades non-target capabilities acquired during pretraining. To balance task-specific adaptation against knowledge retention during training, the authors propose FoLoRA, a framework built on a first-order preservation-aware optimization mechanism. FoLoRA scores each update direction by its task utility per unit forgetting penalty via a generalized Rayleigh quotient, and uses the resulting spectral coordinate system to apply direction-wise gated Adam updates. To estimate the forgetting penalty, it samples proxy calibration data from the pretrained model itself rather than relying on a single proxy dataset. Built on LoRA-based fine-tuning, FoLoRA is reported to give the strongest preservation-adaptation balance among the compared baselines on mathematical reasoning, code generation, and instruction-following tasks, improving target-task performance with the best aggregate preservation of non-target capabilities.

📝 Abstract

While fine-tuning effectively adapts foundation models to specialized downstream tasks, it can degrade non-target capabilities acquired during pretraining. Existing forgetting-aware methods typically seek safer updates through specialized initialization or fixed constraints, but do not regulate the adaptation-preservation trade-off during training. We propose Foundation-Preserving LoRA (FoLoRA), a forgetting-aware optimization framework. Guided by a first-order preservation condition, FoLoRA defines a forgetting penalty over pretraining-proxy activations and a task utility over downstream-task activations. It then scores update directions by task utility per unit forgetting penalty via a generalized Rayleigh quotient. The resulting spectral coordinate system enables direction-wise gated Adam updates, attenuating directions with a low utility-to-penalty ratio during training. To estimate the forgetting penalty, FoLoRA constructs pretraining-proxy calibration data by sampling from the pretrained model rather than relying on a single proxy dataset. Experiments on math, code, and instruction-following adaptation show that FoLoRA achieves the strongest preservation-adaptation balance over baselines, improving target-task performance with the best aggregate preservation of non-target capabilities.
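
The scoring and gating steps described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the exact form of the utility and penalty, so the sketch assumes both are second-moment matrices of activations (A from downstream-task activations, B from pretraining-proxy activations), uses a plain gradient in place of Adam moments, and uses a hypothetical soft gate score / (score + tau). The function names and the parameters eps and tau are likewise illustrative.

```python
import numpy as np

def rayleigh_directions(X_task, X_proxy, eps=1e-6):
    """Directions v maximizing (v^T A v) / (v^T B v), sorted by score.

    X_task:  (n_task, d) downstream-task activations  (assumed input form)
    X_proxy: (n_proxy, d) pretraining-proxy activations (assumed input form)
    """
    d = X_task.shape[1]
    A = X_task.T @ X_task / X_task.shape[0]      # assumed task-utility matrix
    B = X_proxy.T @ X_proxy / X_proxy.shape[0]   # assumed forgetting-penalty matrix
    B = B + eps * np.eye(d)                      # keep B positive definite

    # Whitening with the Cholesky factor B = L L^T turns the generalized
    # problem A v = s B v into a standard symmetric eigenproblem.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T
    M = (M + M.T) / 2                            # remove round-off asymmetry
    scores, W = np.linalg.eigh(M)
    V = L_inv.T @ W                              # columns satisfy A v = s B v

    order = np.argsort(scores)[::-1]             # highest utility per penalty first
    return scores[order], V[:, order]

def gated_update(grad, scores, V, tau=1.0):
    """Attenuate gradient components along low-score directions."""
    coeffs = np.linalg.solve(V, grad)            # gradient in spectral coordinates
    s = np.clip(scores, 0.0, None)
    gates = s / (s + tau)                        # hypothetical soft gate in [0, 1)
    return V @ (gates * coeffs)                  # back to parameter coordinates
```

In this sketch, a direction whose utility-to-penalty score is far below tau is almost fully suppressed, while a direction with a score far above tau passes nearly unchanged. A caller would apply the returned vector as the descent direction in place of the raw gradient.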

Problem

Research questions and friction points this paper is trying to address.

foundation models

catastrophic forgetting

fine-tuning

capability preservation

adaptation trade-off

Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model preservation

generalized Rayleigh quotient

forgetting-aware adaptation