🤖 AI Summary
Existing Mixture-of-Experts (MoE) approaches in robotics rely on monolithic architectures whose internal routing mechanisms must be trained end to end. Experts and routers are not decoupled, which hinders plug-and-play deployment. This paper proposes MoIRA, a modular, architecture-agnostic MoE framework in which an external, text-driven router orchestrates multiple pre-trained vision-language-action experts (e.g., gr00t-N1, $π_0$) for task-adaptive decision-making. Its core contribution is zero-shot routing, via either embedding similarity matching or LLM-based reasoning, that needs no additional router training; the experts themselves are adapted with low-rank (LoRA) adapters to keep inference overhead low. Evaluated on GR1 humanoid tasks and the LIBERO Spatial and Goal benchmarks, MoIRA consistently outperforms generalist models and is competitive with other MoE pipelines. The paper also analyses its robustness to variations in the instructions.
📝 Abstract
Mixture-of-Experts (MoE) approaches have recently gained traction in robotics applications due to their ability to dynamically allocate computational resources and specialize sub-networks for distinct tasks or environmental contexts, enabling more efficient decision-making. Such systems often comprise sparsely activated experts combined under a single monolithic architecture and require a well-configured internal routing mechanism. This design does not allow selective, low-level customization of individual experts or the router, and it requires additional training. We propose MoIRA, an architecture-agnostic modular MoE framework designed to coordinate existing experts with an external text-based router. MoIRA incorporates two zero-shot routing options: embedding-based similarity and prompt-driven language model inference. In our experiments, we choose large Vision-Language-Action models, gr00t-N1 and $π_0$, as the underlying experts, and train low-rank adapters for low-overhead inference. We evaluate MoIRA on various GR1 Humanoid tasks and LIBERO Spatial and Goal benchmarks, where it consistently outperforms generalist models and competes with other MoE pipelines. Additionally, we analyse the robustness of the proposed approach to variations in the instructions. While relying solely on textual descriptions of tasks and experts, MoIRA demonstrates the practical viability of modular deployment with precise, low-effort routing and provides an alternative, scalable foundation for future multi-expert robotic systems.
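To make the embedding-based routing option more concrete, the following is a minimal sketch of the general idea, not the paper's implementation. It scores an instruction against a textual description of each expert and picks the best match. The expert names, their descriptions, and the functions `embed`, `cosine`, and `route` are all hypothetical, and the bag-of-words embedding is only a toy stand-in for whatever text-embedding model a real router would use.

```python
import math
import re
from collections import Counter

# Hypothetical expert registry: names and descriptions are illustrative only
# and are not taken from the paper.
EXPERTS = {
    "pick_place_expert": "pick up an object and place it in a container or on a surface",
    "drawer_expert": "open or close a drawer or cabinet door",
    "stove_expert": "turn on the stove or push a button or knob",
}


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real text encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(instruction, experts=EXPERTS):
    """Return the best-matching expert name and all similarity scores.

    No training is involved: the router only compares the instruction
    with the textual description of each expert.
    """
    query = embed(instruction)
    scores = {name: cosine(query, embed(desc)) for name, desc in experts.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    chosen, all_scores = route("open the top drawer of the cabinet")
    print(chosen)
    for name, score in sorted(all_scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Because the router sits outside the experts and sees only text, adding an expert in this sketch amounts to adding one more entry to the registry. The prompt-driven alternative mentioned in the abstract would replace `route` with a language-model query over the same descriptions.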