Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework

📅 2025-03-12

📈 Citations: 0

✨ Influential: 0

career value

190K/year

🤖 AI Summary

Hybrid expert (MoE) models for image classification suffer from sensitivity to input noise and distribution shifts, insufficient expert specialization, and the inability of existing unsupervised clustering methods to leverage scarce labeled data. To address these issues, this paper proposes a two-stage framework integrating feature-level clustering with few-shot guided pseudo-labeling. It enhances K-means++-based clustering in feature space and introduces conditional pseudo-label generation, enabling joint end-to-end optimization of clustering quality and expert specialization. Crucially, it is the first to incorporate limited supervision into unsupervised clustering to improve model robustness. Evaluated on three standard image classification benchmarks, the method significantly outperforms conventional MoE and dense baselines, achieving substantial gains in classification accuracy and improving noise tolerance by 23.6%.

Technology Category

Application Category

📝 Abstract

The Mixture-of-Experts (MoE) model has succeeded in deep learning (DL). However, its complex architecture and advantages over dense models in image classification remain unclear. In previous studies, MoE performance has often been affected by noise and outliers in the input space. Some approaches incorporate input clustering for training MoE models, but most clustering algorithms lack access to labeled data, limiting their effectiveness. This paper introduces the Double-stage Feature-level Clustering and Pseudo-labeling-based Mixture of Experts (DFCP-MoE) framework, which consists of input feature extraction, feature-level clustering, and a computationally efficient pseudo-labeling strategy. This approach reduces the impact of noise and outliers while leveraging a small subset of labeled data to label a large portion of unlabeled inputs. We propose a conditional end-to-end joint training method that improves expert specialization by training the MoE model on well-labeled, clustered inputs. Unlike traditional MoE and dense models, the DFCP-MoE framework effectively captures input space diversity, leading to competitive inference results. We validate our approach on three benchmark datasets for multi-class classification tasks.

Problem

Research questions and friction points this paper is trying to address.

Addresses noise and outlier impact in MoE models.

Improves expert specialization using clustered inputs.

Leverages limited labeled data for pseudo-labeling unlabeled inputs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Double-stage feature-level clustering for noise reduction

Pseudo-labeling strategy leveraging limited labeled data

Conditional end-to-end joint training for expert specialization

🔎 Similar Papers

On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions