🤖 AI Summary
This work addresses the under-regularization of forget-critical experts during machine unlearning in Mixture-of-Experts (MoE) language models, a problem caused by a routing mismatch between forget and retain data. To tackle it, the authors propose TRACE, the first method to explicitly incorporate routing behavior into the unlearning process. TRACE identifies experts critical to the forget set from offline activation statistics. It then reweights token-level losses on retain data so that each selected expert's activation frequency on the retain set better matches its frequency on the forget set, which calibrates retain regularization in a routing-aware way. Evaluated on the WMDP and MUSE-BOOKS benchmarks across multiple MoE LLMs, TRACE yields a 9% relative utility improvement over the strongest baseline at comparable forgetting quality and achieves the best performance on three of the four MUSE-BOOKS metrics.
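The expert-detection step can be illustrated with a small sketch. It assumes that routing decisions for one MoE layer have been logged offline as lists of top-k expert indices per token, and it defines an expert's activation frequency as the fraction of tokens routed to it. The ranking criterion (forget-side frequency minus retain-side frequency) and the `top_m` cutoff are illustrative assumptions, not the selection rule published for TRACE.

```python
from collections import Counter
from typing import List, Sequence

Route = Sequence[int]  # top-k expert indices chosen by the router for one token


def activation_frequency(routes: Sequence[Route], num_experts: int) -> List[float]:
    """Fraction of tokens whose routing includes each expert (a single MoE layer)."""
    # set(route) counts an expert at most once per token, so values stay in [0, 1].
    counts = Counter(e for route in routes for e in set(route))
    n = len(routes)
    return [counts[e] / n if n else 0.0 for e in range(num_experts)]


def forget_critical_experts(forget_routes, retain_routes, num_experts, top_m=2) -> List[int]:
    """Hypothetical rule: rank experts by forget-minus-retain activation frequency
    and keep up to `top_m` experts whose gap is positive."""
    f = activation_frequency(forget_routes, num_experts)
    r = activation_frequency(retain_routes, num_experts)
    gap = [f[e] - r[e] for e in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda e: gap[e], reverse=True)
    return [e for e in ranked[:top_m] if gap[e] > 0]


# Toy routing logs: 4 experts, top-2 routing, 4 tokens per set.
forget_routes = [[0, 1], [0, 2], [0, 1], [0, 3]]
retain_routes = [[2, 3], [1, 3], [2, 3], [1, 2]]
print(activation_frequency(forget_routes, 4))  # [1.0, 0.5, 0.25, 0.25]
print(activation_frequency(retain_routes, 4))  # [0.0, 0.5, 0.75, 0.75]
print(forget_critical_experts(forget_routes, retain_routes, 4))  # [0]
```

In this toy log, expert 0 handles every forget token but no retain token, which is exactly the kind of mismatch that would leave it unconstrained by a plain retain loss.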
📝 Abstract
Machine unlearning is increasingly important for large language models, yet unlearning in Mixture-of-Experts (MoE) architectures remains underexplored. Unlike dense models, MoE architectures employ a router at each layer to assign each token to a sparse subset of experts. In this work, we observe that forget data often activates a small subset of experts disproportionately, while these experts may receive much weaker activation from retain data. This forget–retain routing mismatch can leave forget-critical experts under-regularized during unlearning. To address this, we propose TRACE, Targeted Routing-Aware Calibration of Experts, for MoE unlearning. TRACE first detects forget-critical experts from offline activation statistics, and then calibrates retain regularization by reweighting token-level retain losses so that each selected expert's retain-side activation frequency better matches its forget-side counterpart. Experiments on WMDP and MUSE-BOOKS across multiple MoE LLMs show that TRACE consistently improves the forget-utility trade-off, yielding a 9% relative utility improvement over the strongest baseline under comparable forgetting quality and the best performance on three out of four MUSE-BOOKS metrics.
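To make the retain-side reweighting concrete, here is a minimal sketch under stated assumptions. Suppose a single selected expert has retain-side frequency r and forget-side target frequency f. If every retain token routed to that expert gets weight a while all other tokens keep weight 1, the weighted retain frequency becomes a·r / (a·r + 1 − r), and solving for a gives a = f(1 − r) / (r(1 − f)). This closed form, the `max_weight` cap, and the rule of taking the largest weight when a token hits several selected experts (which makes the match only approximate) are hypothetical choices for illustration, not TRACE's published weighting scheme.

```python
from typing import Dict, List, Sequence

Route = Sequence[int]  # top-k expert indices chosen by the router for one token


def retain_frequency(routes: Sequence[Route], expert: int) -> float:
    """Unweighted fraction of retain tokens routed to `expert`."""
    return sum(1 for route in routes if expert in route) / len(routes)


def expert_weights(
    targets: Dict[int, float], retain_routes: Sequence[Route], max_weight: float = 10.0
) -> Dict[int, float]:
    """Hypothetical closed-form weight per selected expert.

    `targets` maps each selected expert to its forget-side activation frequency f.
    Weighting tokens routed to the expert by a = f(1 - r) / (r(1 - f)) makes its
    weighted retain frequency equal f when it is the only reweighted expert.
    """
    weights = {}
    for expert, f in targets.items():
        r = retain_frequency(retain_routes, expert)
        if r in (0.0, 1.0):
            # Reweighting cannot move a frequency of exactly 0 or 1.
            weights[expert] = 1.0
            continue
        f = min(max(f, 1e-6), 1.0 - 1e-6)  # keep the ratio finite
        weights[expert] = min(f * (1.0 - r) / (r * (1.0 - f)), max_weight)
    return weights


def token_weights(retain_routes: Sequence[Route], weights: Dict[int, float]) -> List[float]:
    """Assumption: a token hitting several selected experts takes the largest weight."""
    return [max((weights[e] for e in route if e in weights), default=1.0) for route in retain_routes]


def weighted_frequency(retain_routes: Sequence[Route], token_w: List[float], expert: int) -> float:
    """Retain-side activation frequency of `expert` under the token weights."""
    hit = sum(w for w, route in zip(token_w, retain_routes) if expert in route)
    return hit / sum(token_w)


def weighted_retain_loss(token_losses: List[float], token_w: List[float]) -> float:
    """Normalized weighted average of per-token retain losses."""
    return sum(w * loss for w, loss in zip(token_w, token_losses)) / sum(token_w)


# Toy example: 4 experts, top-2 routing, expert 0 selected as forget-critical.
retain_routes = [[0, 1], [1, 2], [2, 3], [1, 3]]
w_expert = expert_weights({0: 0.5}, retain_routes)  # r = 0.25, target f = 0.5
w_tok = token_weights(retain_routes, w_expert)
print(w_expert, w_tok)  # {0: 3.0} [3.0, 1.0, 1.0, 1.0]
print(weighted_frequency(retain_routes, w_tok, 0))  # 0.5
print(weighted_retain_loss([1.0, 2.0, 3.0, 4.0], w_tok))  # 2.0
```

Normalizing by the total weight keeps the retain loss on the same scale as an unweighted mean, so the calibration shifts emphasis toward tokens that exercise the selected expert without inflating the overall regularization strength.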