🤖 AI Summary
In Mixture-of-Experts (MoE) models, experts operate in isolation from one another, so the network lacks any mechanism for collaborative reasoning among them.
Method: This paper proposes a pseudo-graph MoE architecture and a self-reflective iterative routing mechanism: experts are modeled as graph nodes, and dynamic inter-expert message passing is enabled via a pseudo-graph neural network; a cyclic routing strategy emulates multi-step iterative reasoning, endowing the model with self-reflection. Crucially, this is the first work to integrate explicit self-reflection into the MoE structure. The approach combines LoRA-based efficient fine-tuning with sparse expert activation, preserving computational efficiency.
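The recurrent routing idea above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: each expert is reduced to a small weight matrix (standing in for a LoRA adapter), the router picks the top-k experts per step, and the aggregated expert output is fed back as input to the next routing round, emulating multi-step "self-rethinking". All names (`graphmoe_step`, `W_router`, the step count) are illustrative assumptions.

```python
import numpy as np

# Toy sketch of GraphMoE-style recurrent routing (illustrative, not the paper's code).
rng = np.random.default_rng(0)
d, n_experts, top_k, n_steps = 8, 4, 2, 3

W_experts = rng.normal(size=(n_experts, d, d)) * 0.1  # per-expert weights (LoRA stand-in)
W_router = rng.normal(size=(d, n_experts)) * 0.1      # gating network

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def graphmoe_step(h):
    """One routing round: route to top-k experts, mix their outputs by gate weight."""
    gate = softmax(h @ W_router)       # routing scores over the expert "graph nodes"
    topk = np.argsort(gate)[-top_k:]   # sparse activation: only top-k experts fire
    out = np.zeros(d)
    for i in topk:                     # message passing among the selected nodes
        out += gate[i] * np.tanh(W_experts[i] @ h)
    return h + out                     # residual update feeds the next thinking step

h = rng.normal(size=d)
for _ in range(n_steps):               # recurrent routing = iterative reasoning steps
    h = graphmoe_step(h)
```

The key departure from a standard MoE layer is the outer loop: instead of a single routing decision, the gating network re-examines the updated hidden state at every step, so information flows between experts across rounds.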
Contribution/Results: The method achieves significant improvements over LoRA baselines across multiple reasoning benchmarks, attaining state-of-the-art performance—demonstrating both the efficacy and scalability of expert-collaborative deep reasoning without substantial computational overhead.
📝 Abstract
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) techniques and conduct extensive experiments on various benchmark datasets. The experimental results reveal that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advancements in enhancing the reasoning capabilities of language models.