Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses catastrophic forgetting when fine-tuning pretrained language models by re-implementing Sparse Memory Finetuning (SMF). Key-value memory layers are added to the Qwen-2.5-0.5B-Instruct model, and during adaptation only the memory rows that the current batch reads most heavily are updated; these rows are ranked per batch with either a KL-divergence or a TF-IDF criterion. On the MedMCQA benchmark, the method improves accuracy by 2.5 percentage points while keeping WikiText perplexity and TriviaQA accuracy within roughly one point of the original model. LoRA and full-parameter fine-tuning reach larger MedMCQA gains but drift clearly on both forgetting probes, so SMF trades some task improvement for substantially better retention of pre-existing knowledge.
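The summary names the two row-selection criteria but not their formulas. The sketch below shows one plausible way to turn per-batch memory-read counts into row scores, assuming that TF-IDF treats each background batch as a "document" and that KL scoring ranks rows by their pointwise contribution to KL(p_batch || q_background). The function names, the add-alpha smoothing, and the background statistics are hypothetical illustrations, not the paper's exact definitions.

```python
import math
from collections import Counter


def tfidf_scores(batch_counts, background_df, num_background_batches):
    """Favor rows the current batch reads often but background batches rarely read.

    batch_counts: row index -> number of reads of that row in the current batch.
    background_df: row index -> number of background batches that read the row
        at least once (a hypothetical precomputed statistic).
    """
    total = sum(batch_counts.values())
    scores = {}
    for row, count in batch_counts.items():
        tf = count / total  # share of this batch's reads that hit the row
        idf = math.log((num_background_batches + 1) / (background_df.get(row, 0) + 1))
        scores[row] = tf * idf
    return scores


def kl_scores(batch_counts, background_counts, num_rows, alpha=1.0):
    """Score each row by its pointwise term p * log(p / q) in KL(p || q).

    p is the batch's read distribution over rows; q is a background read
    distribution with add-alpha smoothing so that unseen rows still have q > 0.
    """
    p_total = sum(batch_counts.values())
    q_total = sum(background_counts.values()) + alpha * num_rows
    scores = {}
    for row, count in batch_counts.items():
        p = count / p_total
        q = (background_counts.get(row, 0) + alpha) / q_total
        scores[row] = p * math.log(p / q)
    return scores


def select_rows(scores, t):
    """Return the t highest-scoring rows (ties broken by row index)."""
    return sorted(scores, key=lambda row: (-scores[row], row))[:t]


batch = Counter({0: 5, 1: 3, 2: 2})
tfidf = tfidf_scores(batch, Counter({0: 90, 1: 2}), num_background_batches=100)
kl = kl_scores(batch, Counter({0: 900, 1: 20}), num_rows=4)
print(select_rows(tfidf, 2))  # [1, 2]
print(select_rows(kl, 2))     # [2, 1]
```

In this toy example, row 0 is read most in the batch but is also read almost everywhere in the background, so both rules leave it out of the top 2. The two rules then order rows 1 and 2 differently, a small illustration of how the choice of criterion changes which rows get trained.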
📝 Abstract
Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updating only the small set of memory rows that the current batch reads most heavily. We re-implement SMF on Qwen-2.5-0.5B-Instruct and compare it with LoRA and full finetuning on MedMCQA, a 4-choice medical exam task, using WikiText perplexity and TriviaQA accuracy as forgetting probes. SMF improves MedMCQA by 2.5 percentage points while keeping both forgetting probes within roughly 1 point of the base model, whereas LoRA and full finetuning achieve larger gains but with clear drift on both. We also compare two row-selection rules (KL-divergence and TF-IDF), which balance the two forgetting metrics differently.
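The abstract describes a loop in which the memory layer reads a few rows per query, the batch's reads are tallied, and only the most-read rows receive a gradient update. The pure-Python sketch below illustrates that loop under simplifying assumptions: a flat key table with brute-force dot-product top-k retrieval, raw read counts in place of the KL-divergence or TF-IDF ranking, and a precomputed placeholder gradient for the value table. All names here are hypothetical and do not come from the paper's code.

```python
import math
from collections import Counter


def topk_lookup(query, keys, values, k):
    """Read from a flat key-value memory.

    Every key is scored by dot product with the query; only the top-k rows
    contribute, mixed by a softmax over their scores.
    Returns (output vector, indices of the rows that were read).
    """
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: (-scores[i], i))[:k]
    m = max(scores[i] for i in top)  # subtract max for numerical stability
    weights = [math.exp(scores[i] - m) for i in top]
    z = sum(weights)
    out = [sum(w / z * values[i][d] for w, i in zip(weights, top))
           for d in range(len(values[0]))]
    return out, top


def count_reads(queries, keys, values, k):
    """Tally how often each memory row is read across a batch of queries."""
    counts = Counter()
    for query in queries:
        _, rows = topk_lookup(query, keys, values, k)
        counts.update(rows)
    return counts


def sparse_sgd_step(values, grads, trainable_rows, lr):
    """Apply SGD only to trainable_rows; all other value rows stay frozen."""
    trainable = set(trainable_rows)
    for row, grad in grads.items():
        if row in trainable:
            values[row] = [v - lr * g for v, g in zip(values[row], grad)]
    return values


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
queries = [[1.0, 0.2], [0.9, 0.1], [-0.1, 1.0]]

counts = count_reads(queries, keys, values, k=2)  # rows 0, 1, 2 are read
trainable = [row for row, _ in counts.most_common(2)]  # simplified: raw counts
grads = {row: [1.0, 1.0] for row in range(4)}  # placeholder dense gradient
sparse_sgd_step(values, grads, trainable, lr=0.5)
print(values)  # [[0.5, 0.5], [1.5, 1.5], [3.0, 3.0], [4.0, 4.0]]
```

Row 2 is read once but falls outside the top 2, so it stays frozen along with the unread row 3. Restricting updates to a handful of rows per step is what keeps most of the stored parameters untouched during adaptation.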
Problem

catastrophic forgetting
language model adaptation
memory retention
model finetuning
Innovation

Sparse Memory Finetuning
catastrophic forgetting
LoRA
memory layers
parameter-efficient finetuning