🤖 AI Summary
This study addresses the problem of identifying which experts are responsible for factual recall in sparse Mixture-of-Experts (MoE) language models. It formulates an expert-aware causal tracing framework for sparse MoE architectures: subject-token embeddings are perturbed with noise, clean MoE-block outputs or clean expert-level updates are patched back in, and the restored true-vs-foil logit contrast quantifies each expert's contribution on CounterFact facts. For Qwen3-30B-A3B-Base, a layer sweep selects layer 44, and expert L44E069 is identified as a repeatedly selected expert whose held-out patch outperforms the other active experts in that layer. For Mixtral-8x7B-v0.1, layer-level tracing validates a mid-layer signal, but no single selected expert carries it; patching routed multi-expert updates together recovers it. These findings suggest that factual recall can be traced at the expert level, but that the degree of localization depends on the model and the protocol.
📝 Abstract
Causal tracing of factual recall has been studied predominantly in dense transformer language models, where interventions localize information flow to layers or feed-forward modules. Sparse mixture-of-experts (MoE) language models introduce a sharper question: when a factual prediction is mediated by a routed MoE block, which routed expert contributions matter? We formulate expert-aware causal tracing for sparse MoE language models. Using CounterFact facts, we first corrupt the model's factual preference by adding noise to subject-token embeddings, and then test whether clean MoE-block outputs or clean expert-level updates restore the true-vs-foil logit contrast. For Qwen3-30B-A3B-Base, a layer sweep selects and validates layer 44, and expert-level tracing identifies L44E069 as an expert repeatedly selected in the clean run whose held-out patch outperforms other active same-layer expert patches. For Mixtral-8x7B-v0.1, layer-level tracing validates a mid-layer signal, but the signal is not localized to the selected singleton expert; a coalition check instead recovers it with routed multi-expert updates. These results suggest that MoE factual tracing can be made expert-aware, while also showing that expert-level localization is model- and protocol-dependent rather than universal.
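The corrupt-then-patch procedure described in the abstract can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a single residual MoE layer with three linear experts, top-2 softmax routing, hand-picked weights, a fixed noise vector standing in for sampled embedding noise, and a recovery score defined as the fraction of the clean-minus-corrupted logit contrast that a patch restores.

```python
import math

# Hypothetical toy weights, hand-picked for illustration (not from any real model).
ROUTER = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # one router row per expert
EXPERTS = [
    [[2.0, 0.0], [0.0, 0.0]],
    [[0.5, 0.0], [0.0, 0.5]],
    [[0.0, 0.0], [0.0, 2.0]],
]
UNEMBED = {"true": [1.0, 0.0], "foil": [0.0, 1.0]}
TOP_K = 2


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def expert_updates(x):
    """Return {expert_id: gate * expert(x)} for the top-k routed experts."""
    scores = matvec(ROUTER, x)
    chosen = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:TOP_K]
    z = sum(math.exp(scores[e]) for e in chosen)  # softmax over routed experts only
    return {e: [math.exp(scores[e]) / z * h for h in matvec(EXPERTS[e], x)]
            for e in chosen}


def contrast(x, updates):
    """True-vs-foil logit difference after adding MoE updates to the residual."""
    hidden = list(x)
    for upd in updates.values():
        hidden = [h + u for h, u in zip(hidden, upd)]
    return dot(UNEMBED["true"], hidden) - dot(UNEMBED["foil"], hidden)


def patch(corrupt_updates, clean_updates, experts):
    """Overwrite the given experts' updates with their clean-run values.

    An expert that was not routed in the clean run contributes nothing after patching.
    """
    patched = dict(corrupt_updates)
    for e in experts:
        patched.pop(e, None)
        if e in clean_updates:
            patched[e] = clean_updates[e]
    return patched


clean_x = [1.0, 0.0]
noise = [-0.8, 0.9]  # fixed for reproducibility; a stand-in for sampled noise
corrupt_x = [a + b for a, b in zip(clean_x, noise)]

clean_upd = expert_updates(clean_x)
corrupt_upd = expert_updates(corrupt_x)
clean_c = contrast(clean_x, clean_upd)
corrupt_c = contrast(corrupt_x, corrupt_upd)


def recovery(experts):
    """Fraction of the lost contrast restored by patching the given experts."""
    patched_c = contrast(corrupt_x, patch(corrupt_upd, clean_upd, experts))
    return (patched_c - corrupt_c) / (clean_c - corrupt_c)


block_recovery = recovery(range(len(EXPERTS)))             # whole MoE block
per_expert = {e: recovery([e]) for e in sorted(clean_upd)}  # clean-active experts
best_expert = max(per_expert, key=per_expert.get)

print(f"block recovery: {block_recovery:.3f}")
for e, r in per_expert.items():
    print(f"expert {e} recovery: {r:.3f}")
print(f"best clean-active expert: {best_expert}")
```

In this toy, patching the whole block restores only part of the contrast because the residual stream still carries the corrupted embedding, and expert 0 recovers the most among the experts active in the clean run. The single-expert recoveries add up to the block recovery here only because the readout is linear and there is one layer; that additivity should not be expected in a real model, where patching would typically be done with forward hooks on the MoE modules.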