GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

📅 2025-01-14
📈 Citations: 0 · Influential: 0
🤖 AI Summary
In Mixture-of-Experts (MoE) models, experts typically operate in isolation and lack collaborative reasoning capabilities. Method: the paper proposes a pseudo-graph MoE architecture with a self-reflective, recurrent routing mechanism. Experts are modeled as graph nodes, and a pseudo-graph neural network enables dynamic message passing between them; a recurrent routing strategy emulates multi-step iterative reasoning, giving the model a form of self-reflection. The authors present this as the first work to integrate an explicit self-rethinking mechanism into an MoE architecture. The approach combines LoRA-based parameter-efficient fine-tuning with sparse expert activation, preserving computational efficiency. Contribution/Results: the method yields significant improvements over LoRA baselines across multiple reasoning benchmarks, attaining state-of-the-art performance and demonstrating that expert-collaborative deep reasoning is achievable without substantial computational overhead.
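
The mechanism can be pictured with a short sketch. The PyTorch code below is a hypothetical illustration of the recurrent ("self-rethinking") routing loop described above, not the authors' released implementation: the class and parameter names (PseudoGraphMoE, n_steps, top_k) and the expert design are assumptions made for exposition.

```python
# Hypothetical sketch of GRAPHMOE-style recurrent routing over a
# pseudo-graph of experts; names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PseudoGraphMoE(nn.Module):
    """Experts act as graph nodes; each routing step aggregates their
    outputs and feeds the result back in, emulating iterative thinking."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2, n_steps: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k, self.n_steps = top_k, n_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.n_steps):              # "self-rethinking" iterations
            logits = self.router(h)                # re-route using the updated state
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
            out = torch.zeros_like(h)
            for k in range(self.top_k):            # sparse activation: only top-k experts run
                for e, expert in enumerate(self.experts):
                    mask = idx[..., k] == e
                    if mask.any():
                        out[mask] = out[mask] + weights[..., k][mask].unsqueeze(-1) * expert(h[mask])
            h = h + out                            # aggregated expert "messages" seed the next step
        return h
```

Re-invoking the router on the updated hidden state at every step is what distinguishes this loop from a standard single-pass MoE layer, where the router fires once and the experts never see one another's output.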

📝 Abstract
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models rather than a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) techniques and conduct extensive experiments on various benchmark datasets. The experimental results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advances in enhancing the reasoning capabilities of language models.
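
For the LoRA-based instantiation, each expert can plausibly be realized as a low-rank adapter over a frozen base projection. The sketch below shows one assumed parameterization (the class LoRAExpert and its rank and alpha arguments are illustrative names, not the paper's); the authors' exact setup may differ.

```python
# Assumed LoRA-expert parameterization: a frozen base linear layer plus
# a trainable low-rank update, keeping per-expert overhead small.
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # base weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))  # zero init: no update at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only A and B are trained, so adding experts adds few parameters.
        return self.base(x) + (x @ self.A @ self.B) * self.scale
```

Using such adapters as the experts of the recurrent routing loop keeps the trainable parameter count close to that of a single LoRA model, which is consistent with the abstract's efficiency claim.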

Problem

Research questions and friction points this paper is trying to address.

Expert Ensemble Networks
Enhanced Communication
Performance Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

GRAPHMOE
Low-Rank Adaptation
Mixture of Experts

👥 Authors

Chen Tang
Institute for Advanced Algorithms Research, Shanghai

Bo Lv
Institute of Computing Technology, Chinese Academy of Sciences

Zifan Zheng
Institute for Advanced Algorithms Research, Shanghai; University of Sydney

Bohao Yang
University of Manchester
NLP · Dialogue Generation · Dialogue Evaluation · Table Understanding · LLMs

Kun Zhao
The University of Pittsburgh

Ning Liao
Shanghai Jiao Tong University
LLM · MLLM · MoE

Xiaoxing Wang
SJTU
Machine Learning · AutoML · Neural Architecture Search

Feiyu Xiong
MemTensor (Shanghai) Technology Co., Ltd.
Machine Learning · NLP · LLM

Zhiyu Li
Tianjin University
Robust Control · Attitude Control

Nayu Liu
Institute of Computing Technology, Chinese Academy of Sciences

Jingchi Jiang
Harbin Institute of Technology
Knowledge Graph · Machine Learning · Data Mining