ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation

📅 2025-07-01
🤖 AI Summary
Existing continual test-time adaptation (CTTA) methods share model parameters across all domains, which leaves them prone to feature entanglement and catastrophic forgetting under large or non-stationary domain shifts. To address this, we propose ExPaMoE, an online adaptation framework built on an Expandable Parallel Mixture-of-Experts architecture. Its core contributions are: (1) a dual-branch expert design with token-guided feature separation that decouples domain-general from domain-specific knowledge; (2) a Spectral-Aware Online Domain Discriminator (SODD) that detects distribution changes in real time from frequency-domain cues; and (3) a dynamic expansion mechanism that grows the expert pool when SODD detects a change. Experiments on CIFAR-10C, CIFAR-100C, ImageNet-C, Cityscapes-to-ACDC, and the newly introduced large-scale ImageNet++ benchmark show that ExPaMoE consistently outperforms prior methods, with strong robustness, scalability, and resistance to forgetting.

📝 Abstract
Continual Test-Time Adaptation (CTTA) aims to enable models to adapt on-the-fly to a stream of unlabeled data under evolving distribution shifts. However, existing CTTA methods typically rely on shared model parameters across all domains, making them vulnerable to feature entanglement and catastrophic forgetting in the presence of large or non-stationary domain shifts. To address this limitation, we propose ExPaMoE, a novel framework based on an Expandable Parallel Mixture-of-Experts architecture. ExPaMoE decouples domain-general and domain-specific knowledge via a dual-branch expert design with token-guided feature separation, and dynamically expands its expert pool based on a Spectral-Aware Online Domain Discriminator (SODD) that detects distribution changes in real time using frequency-domain cues. Extensive experiments demonstrate the superiority of ExPaMoE across diverse CTTA scenarios. We evaluate our method on standard benchmarks including CIFAR-10C, CIFAR-100C, ImageNet-C, and Cityscapes-to-ACDC for semantic segmentation. Additionally, we introduce ImageNet++, a large-scale and realistic CTTA benchmark built from multiple ImageNet-derived datasets, to better reflect long-term adaptation under complex domain evolution. ExPaMoE consistently outperforms prior methods, showing strong robustness, scalability, and resistance to forgetting.
Problem

Research questions and friction points this paper is trying to address.

Adapt models to evolving data distributions without relying on parameters shared across all domains
Prevent feature entanglement and catastrophic forgetting under large or non-stationary domain shifts
Detect distribution changes in real time using frequency-domain cues
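
To make the last point concrete, here is a minimal Python sketch of how a frequency-domain shift detector could work. It is not the paper's SODD, whose internals are not described in this summary. The names `spectral_signature` and `SpectralShiftDetector`, the log-amplitude signature, the running-mean reference, and the fixed threshold are all assumptions chosen for illustration.

```python
import numpy as np


def spectral_signature(batch: np.ndarray) -> np.ndarray:
    """Mean log-amplitude spectrum of a batch shaped (N, H, W).

    Assumption: grayscale inputs and a plain 2-D FFT; the paper may use
    different frequency-domain features.
    """
    amplitude = np.abs(np.fft.fft2(batch, axes=(-2, -1)))
    return np.log1p(amplitude).mean(axis=0).ravel()


class SpectralShiftDetector:
    """Hypothetical detector: flags a shift when a batch's spectral
    signature moves far from a running reference."""

    def __init__(self, threshold: float, momentum: float = 0.9):
        self.threshold = threshold  # assumed hyperparameter
        self.momentum = momentum    # smoothing factor for the reference
        self.reference = None       # signature of the current domain

    def update(self, batch: np.ndarray) -> bool:
        signature = spectral_signature(batch)
        if self.reference is None:
            # First batch defines the initial domain.
            self.reference = signature
            return False
        # Root-mean-square distance between signatures.
        diff = signature - self.reference
        distance = float(np.linalg.norm(diff) / np.sqrt(diff.size))
        if distance > self.threshold:
            # Treat as a new domain and restart the reference from it.
            self.reference = signature
            return True
        # Same domain: smooth the reference with an exponential average.
        self.reference = (
            self.momentum * self.reference + (1.0 - self.momentum) * signature
        )
        return False
```

In this sketch, each unlabeled test batch is passed to `update`, and a `True` return is the signal that would trigger whatever adaptation step follows, such as adding an expert. Comparing amplitude spectra rather than raw pixels is one plausible reading of "frequency-domain cues", since many corruptions change the spectral profile of an image more consistently than its pixel values.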
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expandable Parallel Mixture-of-Experts architecture
Dual-branch expert design with token-guided feature separation
Spectral-Aware Online Domain Discriminator for real-time detection
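
The items above can be illustrated with a toy Python sketch of an expandable parallel expert layer. It is a simplification under stated assumptions, not the authors' architecture: the class `ExpandableParallelExperts`, the linear experts, the residual sum, and the "one active domain expert" rule are hypothetical, and token-guided feature separation is omitted because this summary does not describe it.

```python
import numpy as np


class ExpandableParallelExperts:
    """Toy layer: a shared (domain-general) expert runs in parallel with
    one active expert drawn from a growing domain-specific pool."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.shared = self._new_expert()            # domain-general branch
        self.domain_experts = [self._new_expert()]  # domain-specific branch
        self.active = 0                             # index of current expert

    def _new_expert(self) -> np.ndarray:
        # Assumption: each expert is a single small linear map.
        return self.rng.normal(scale=0.02, size=(self.dim, self.dim))

    def expand(self) -> int:
        """Add an expert for a newly detected domain and activate it.

        Earlier experts are left untouched, so adapting to the new domain
        cannot overwrite what they learned.
        """
        self.domain_experts.append(self._new_expert())
        self.active = len(self.domain_experts) - 1
        return self.active

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Residual sum of the two parallel branches.
        general = x @ self.shared
        specific = x @ self.domain_experts[self.active]
        return x + general + specific


layer = ExpandableParallelExperts(dim=4)
features = np.ones((2, 4))
before = layer.forward(features)
layer.expand()  # would be called when a domain discriminator signals a shift
after = layer.forward(features)
```

Keeping per-domain parameters in separate experts is what lets such a design avoid the shared-parameter interference described in the abstract: only the active expert would be updated for the current domain, while the shared branch carries knowledge common to all of them.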