GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

📅 2025-01-14
📈 Citations: 0 · Influential: 0
🤖 AI Summary
In Mixture-of-Experts (MoE) models, experts typically operate in isolation and lack collaborative reasoning capabilities. Method: the paper proposes a pseudo-graph MoE architecture with a self-reflective, recurrent routing mechanism. Experts are modeled as graph nodes, and a pseudo-graph neural network enables dynamic message passing between them; a recurrent routing strategy emulates multi-step iterative reasoning, giving the model a form of self-reflection. The authors present this as the first work to integrate an explicit self-rethinking mechanism into an MoE architecture. The approach combines LoRA-based parameter-efficient fine-tuning with sparse expert activation, preserving computational efficiency. Contribution/Results: the method yields significant improvements over LoRA baselines across multiple reasoning benchmarks, attaining state-of-the-art performance and demonstrating that expert-collaborative deep reasoning is achievable without substantial computational overhead.
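
The mechanism can be pictured with a short sketch. The PyTorch code below is a hypothetical illustration of the recurrent ("self-rethinking") routing loop described above, not the authors' released implementation: the class and parameter names (PseudoGraphMoE, n_steps, top_k) and the expert design are assumptions made for exposition.

```python
# Hypothetical sketch of GRAPHMOE-style recurrent routing over a
# pseudo-graph of experts; names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PseudoGraphMoE(nn.Module):
    """Experts act as graph nodes; each routing step aggregates their
    outputs and feeds the result back in, emulating iterative thinking."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2, n_steps: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k, self.n_steps = top_k, n_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.n_steps):              # "self-rethinking" iterations
            logits = self.router(h)                # re-route using the updated state
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
            out = torch.zeros_like(h)
            for k in range(self.top_k):            # sparse activation: only top-k experts run
                for e, expert in enumerate(self.experts):
                    mask = idx[..., k] == e
                    if mask.any():
                        out[mask] = out[mask] + weights[..., k][mask].unsqueeze(-1) * expert(h[mask])
            h = h + out                            # aggregated expert "messages" seed the next step
        return h
```

Re-invoking the router on the updated hidden state at every step is what distinguishes this loop from a standard single-pass MoE layer, where the router fires once and the experts never see one another's output.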

📝 Abstract
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models rather than a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) techniques and conduct extensive experiments on various benchmark datasets. The experimental results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advances in enhancing the reasoning capabilities of language models.
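
For the LoRA-based instantiation, each expert can plausibly be realized as a low-rank adapter over a frozen base projection. The sketch below shows one assumed parameterization (the class LoRAExpert and its rank and alpha arguments are illustrative names, not the paper's); the authors' exact setup may differ.

```python
# Assumed LoRA-expert parameterization: a frozen base linear layer plus
# a trainable low-rank update, keeping per-expert overhead small.
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # base weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))  # zero init: no update at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only A and B are trained, so adding experts adds few parameters.
        return self.base(x) + (x @ self.A @ self.B) * self.scale
```

Using such adapters as the experts of the recurrent routing loop keeps the trainable parameter count close to that of a single LoRA model, which is consistent with the abstract's efficiency claim.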

Problem

Research questions and friction points this paper is trying to address.

Expert Ensemble Networks
Enhanced Communication
Performance Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

GRAPHMOE
Low-Rank Adaptation
Mixture of Experts

👥 Authors

Chen Tang
Institute for Advanced Algorithms Research, Shanghai

Bo Lv
Institute of Computing Technology, Chinese Academy of Sciences

Zifan Zheng
Institute for Advanced Algorithms Research, Shanghai; University of Sydney

Bohao Yang
University of Manchester
NLP · Dialogue Generation · Dialogue Evaluation · Table Understanding · LLMs

Kun Zhao
The University of Pittsburgh

Ning Liao
Shanghai Jiao Tong University
LLM · MLLM · MoE

Xiaoxing Wang
SJTU
Machine Learning · AutoML · Neural Architecture Search

Feiyu Xiong
MemTensor (Shanghai) Technology Co., Ltd.
Machine Learning · NLP · LLM

Zhiyu Li
Tianjin University
Robust Control · Attitude Control

Nayu Liu
Institute of Computing Technology, Chinese Academy of Sciences

Jingchi Jiang
Harbin Institute of Technology
Knowledge Graph · Machine Learning · Data Mining