MAAT: Multi-phase Adapter-Aware Targeted Unlearning

📅 2026-05-28

🤖 AI Summary
Existing machine unlearning methods perform poorly on causal (Why-type) knowledge, and because current benchmarks contain almost no Why-type questions, aggregate scores overstate how well these methods work. This work proposes MAAT, a three-phase framework that performs targeted unlearning on LoRA adapter weights by combining gradient-projected ascent, SVD-based rank pruning, task vector negation, and hybrid KL/hidden-state retain repair. It also introduces 5WBENCH, a balanced benchmark with equal numbers of Who, What, When, Where, and Why questions. On Why-type questions, MAAT is reported as the first method to achieve high forgetting and high retention simultaneously, reaching a new operating point on the forget-retain Pareto frontier. The code is publicly released.
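The inflation effect can be illustrated with a small arithmetic sketch. The 0.6% Why share is the ZSRE figure quoted in the abstract; the per-category forget rates (0% on Why, 100% elsewhere) are hypothetical values chosen only to show the mechanism, and `aggregate_forget_rate` is an illustrative helper, not part of the paper's code.

```python
def aggregate_forget_rate(shares, rates):
    """Share-weighted mean of per-category forget rates."""
    total = sum(shares.values())
    return sum(shares[c] * rates[c] for c in shares) / total


# Skewed benchmark: Why-type questions are 0.6% of the data (ZSRE-like share).
skewed_shares = {"Why": 0.006, "Other": 0.994}
# Hypothetical method: forgets nothing on Why, everything elsewhere.
skewed_rates = {"Why": 0.0, "Other": 1.0}
skewed_score = aggregate_forget_rate(skewed_shares, skewed_rates)

# Balanced benchmark: five categories with equal shares, same hypothetical method.
categories = ["Who", "What", "When", "Where", "Why"]
balanced_shares = {c: 0.2 for c in categories}
balanced_rates = {c: (0.0 if c == "Why" else 1.0) for c in categories}
balanced_score = aggregate_forget_rate(balanced_shares, balanced_rates)

print(f"skewed aggregate:   {skewed_score:.3f}")
print(f"balanced aggregate: {balanced_score:.3f}")
```

Under these assumed rates, the skewed aggregate stays close to 1 even though the method fails completely on Why, while the balanced aggregate drops by a full fifth, which makes the failure visible.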
📝 Abstract
Machine unlearning evaluation is structurally skewed: Why-type questions, which probe causal and relational knowledge, comprise less than 0.06% of CounterFact, 0.6% of ZSRE, and less than 1.3% of TOFU, MUSE, and WMDP-Cyber. This near-zero representation means that methods that fail on causal knowledge can still score highly in aggregate, and the failure is undetectable without balanced evaluation. We present 5WBENCH, a balanced 5,000-sample benchmark with 1,000 examples per 5W category (Who, What, When, Where, Why), making causal unlearning failures quantifiable for the first time. Using 5WBENCH, we show that no existing baseline simultaneously achieves high forgetting and high retention on Why-type questions: aggressive forgetting degrades retained knowledge, while conservative methods fail to forget causal facts. Why-type difficulty stems from multi-hop reasoning chains (44% of Why entries vs. at most 2% for the other categories) and from gradient dilution over answer spans averaging 40.1 tokens. We propose MAAT (Multi-phase Adapter-Aware Targeted Unlearning), a three-phase framework operating on LoRA adapter weights that combines gradient-projected ascent, SVD rank-dimension pruning, task vector negation, and hybrid KL-hidden-state retain repair. MAAT is the first method to simultaneously achieve high forgetting and high retention on Why-type causal knowledge, reaching a new operating point on the forget-retain Pareto frontier. We make our code publicly available.
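As a rough illustration of three of the ingredients named above, the following NumPy sketch shows generic textbook forms of gradient projection, SVD rank pruning of a LoRA update, and task vector negation. It is not the authors' implementation: the function names, the choice to project the forget gradient orthogonally to a retain gradient, and the choice to drop the smallest singular directions are all assumptions made for illustration, and the abstract does not specify how MAAT selects directions or orders its phases. The KL/hidden-state retain repair step is omitted.

```python
import numpy as np


def project_out(grad_forget, grad_retain):
    """Remove from the forget gradient its component along the retain gradient.

    Assumption: a single retain direction; the result is orthogonal to it,
    so a step along the result leaves the retain loss unchanged to first order.
    """
    g_f = grad_forget.ravel()
    g_r = grad_retain.ravel()
    denom = g_r @ g_r
    if denom == 0.0:
        return grad_forget.copy()
    projected = g_f - (g_f @ g_r) / denom * g_r
    return projected.reshape(grad_forget.shape)


def prune_rank(lora_B, lora_A, keep):
    """Return the LoRA update B @ A truncated to its top-`keep` singular directions.

    Assumption: the smallest singular values are the ones dropped.
    """
    U, S, Vt = np.linalg.svd(lora_B @ lora_A, full_matrices=False)
    S_pruned = S.copy()
    S_pruned[keep:] = 0.0
    return (U * S_pruned) @ Vt


def negate_task_vector(w_base, w_finetuned, alpha=1.0):
    """Task vector negation: move from the base weights away from the fine-tuned ones."""
    task_vector = w_finetuned - w_base
    return w_base - alpha * task_vector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g_forget = rng.normal(size=(4, 3))
    g_retain = rng.normal(size=(4, 3))
    g_proj = project_out(g_forget, g_retain)
    print("dot with retain gradient:", float(g_proj.ravel() @ g_retain.ravel()))

    B = rng.normal(size=(6, 4))  # hypothetical LoRA factors with rank 4
    A = rng.normal(size=(4, 5))
    delta = prune_rank(B, A, keep=2)
    print("rank after pruning:", np.linalg.matrix_rank(delta))
```

In a LoRA setting these operations would apply to the adapter factors or their product rather than to the full model weights, which keeps each step cheap relative to full-parameter unlearning.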
Problem

Research questions and friction points this paper is trying to address.

machine unlearning
causal knowledge
evaluation benchmark
Why-type questions
forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning
causal reasoning
adapter-based unlearning
balanced benchmark
Pareto frontier