Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the under-regularization of critical experts in Mixture-of-Experts (MoE) language models during machine unlearning, a problem that arises when forget and retain data are routed to different experts. The authors propose TRACE, which they describe as the first method to explicitly incorporate routing behavior into the unlearning process. TRACE identifies the experts most important to the forget data from offline activation statistics, then reweights token-level losses on the retain data so that each selected expert's activation frequency on the retain set better matches its frequency on the forget set. On the WMDP and MUSE-BOOKS benchmarks, TRACE yields a 9% relative utility improvement over the strongest baseline at comparable forgetting quality and achieves the best result on three of the four MUSE-BOOKS metrics.
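The detection step described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the use of the forget-minus-retain frequency gap as the selection score, and the `top_m` cutoff are all assumptions made for illustration. It only assumes that per-token top-k routing decisions have been logged offline for both data sets.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def activation_frequency(topk_ids: Sequence[Sequence[int]], num_experts: int) -> List[float]:
    """Fraction of tokens routed to each expert.

    topk_ids[t] holds the expert indices selected for token t (assumed distinct).
    """
    counts = Counter(e for token in topk_ids for e in token)
    n_tokens = len(topk_ids)
    return [counts[e] / n_tokens for e in range(num_experts)]


def forget_critical_experts(
    forget_ids: Sequence[Sequence[int]],
    retain_ids: Sequence[Sequence[int]],
    num_experts: int,
    top_m: int = 2,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[int], List[float], List[float]]:
    """Rank experts by how much more often forget data activates them.

    The gap score (forget frequency minus retain frequency) is an assumed
    criterion; the source only says detection uses offline activation statistics.
    """
    f = activation_frequency(forget_ids, num_experts)
    r = activation_frequency(retain_ids, num_experts)
    gap = [fi - ri for fi, ri in zip(f, r)]
    ranked = sorted(range(num_experts), key=lambda e: gap[e], reverse=True)
    return ranked[:top_m], f, r


# Toy routing logs for one layer with 4 experts and top-2 routing.
forget_routes = [[0, 1], [0, 1], [0, 2], [0, 1]]
retain_routes = [[2, 3], [2, 3], [1, 3], [0, 2]]
critical, f, r = forget_critical_experts(forget_routes, retain_routes, num_experts=4)
print(critical)  # [0, 1]
```

In the toy logs, experts 0 and 1 are hit by most forget tokens but by only a quarter of retain tokens, which is the kind of forget-retain mismatch the summary refers to.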

📝 Abstract

Machine unlearning is increasingly important for large language models, yet unlearning in Mixture-of-Experts (MoE) architectures remains underexplored. Unlike dense models, MoE architectures employ a router at each layer to assign each token to a sparse subset of experts. In this work, we observe that forget data often activates a small subset of experts disproportionately, while these experts may receive much weaker activation from retain data. This forget-retain routing mismatch can leave forget-critical experts under-regularized during unlearning. To address this, we propose TRACE, Targeted Routing-Aware Calibration of Experts, for MoE unlearning. TRACE first detects forget-critical experts from offline activation statistics, and then calibrates retain regularization by reweighting token-level retain losses so that each selected expert's retain-side activation frequency better matches its forget-side counterpart. Experiments on WMDP and MUSE-BOOKS across multiple MoE LLMs show that TRACE consistently improves the forget-utility trade-off, yielding a 9% relative utility improvement over the strongest baseline under comparable forgetting quality and the best performance on three out of four MUSE-BOOKS metrics.
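To make the reweighting idea concrete, here is a minimal sketch for a single selected expert. The abstract does not give the weighting formula, so this uses a standard importance-style ratio as an assumption: retain tokens that route to the expert are scaled by forget frequency over retain frequency, and the remaining tokens by the complementary ratio. All function names are hypothetical, and how weights are combined across several experts and layers is left open here.

```python
from typing import List, Sequence


def retain_token_weights(routes_to_expert: Sequence[bool], forget_freq: float) -> List[float]:
    """Per-token retain weights for one expert (assumed importance-ratio scheme).

    routes_to_expert[t] is True if retain token t activates the selected expert.
    forget_freq is that expert's activation frequency on the forget set.
    """
    n = len(routes_to_expert)
    retain_freq = sum(routes_to_expert) / n
    # Upweight tokens that hit the expert, downweight the rest (or vice versa).
    w_on = forget_freq / retain_freq if retain_freq > 0 else 0.0
    w_off = (1 - forget_freq) / (1 - retain_freq) if retain_freq < 1 else 0.0
    weights = [w_on if hit else w_off for hit in routes_to_expert]
    mean_w = sum(weights) / n
    if mean_w == 0:
        # Degenerate case: the target cannot be reached, fall back to uniform.
        return [1.0] * n
    # Normalize to mean 1 so the overall retain loss scale is unchanged.
    return [w / mean_w for w in weights]


def weighted_activation_frequency(weights: Sequence[float], routes_to_expert: Sequence[bool]) -> float:
    """Activation frequency of the expert under the reweighted retain tokens."""
    hit_mass = sum(w for w, hit in zip(weights, routes_to_expert) if hit)
    return hit_mass / sum(weights)


def weighted_retain_loss(token_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of per-token retain losses after applying the weights."""
    return sum(w * l for w, l in zip(weights, token_losses)) / len(token_losses)


# One of four retain tokens activates the expert; the forget-side frequency is 0.5.
hits = [True, False, False, False]
w = retain_token_weights(hits, forget_freq=0.5)
loss = weighted_retain_loss([0.9, 1.1, 1.0, 1.2], w)
```

With these weights, the token that activates the expert carries half of the total retain weight, so the weighted retain-side activation frequency equals the forget-side value of 0.5. A real training loop would apply such weights to per-token losses from the model rather than to fixed numbers.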

Problem

Research questions and friction points this paper is trying to address.

Machine Unlearning

Mixture-of-Experts

Routing Mismatch

Expert Calibration

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Unlearning

Mixture-of-Experts

Routing-Aware Calibration