🤖 AI Summary
To address high computational overhead, information decay, and insufficient robustness in multi-large-language-model (LLM) agent collaborative reasoning, this paper proposes the Residual Mixture-of-Agents (RMoA) architecture. Methodologically, RMoA introduces: (1) an embedding-space diversity-aware greedy selection mechanism to enhance response diversity; (2) a dual-agent structure with cross-layer residual extraction and aggregation to mitigate information loss; and (3) an adaptive termination strategy guided by a residual convergence criterion to eliminate redundant computation. Empirically, RMoA achieves state-of-the-art performance across alignment, mathematical reasoning, code generation, and multi-task understanding benchmarks. It reduces average inference FLOPs by 37%, significantly lowering computational cost while improving system robustness. All source code is publicly available.
📝 Abstract
Although multi-agent systems based on large language models show strong capabilities on multiple tasks, they are still limited by high computational overhead, information loss, and insufficient robustness. Inspired by ResNet's residual learning, we propose Residual Mixture-of-Agents (RMoA), integrating residual connections to optimize efficiency and reliability. To maximize information utilization from model responses while minimizing computational costs, we design an embedding-based diversity selection mechanism that greedily selects responses via vector similarity. Furthermore, to mitigate iterative information degradation, we introduce a Residual Extraction Agent that preserves cross-layer incremental information by capturing inter-layer response differences, coupled with a Residual Aggregation Agent for hierarchical information integration. Additionally, we propose an adaptive termination mechanism that dynamically halts processing based on residual convergence, further improving inference efficiency. RMoA achieves state-of-the-art performance on benchmarks spanning alignment, mathematical reasoning, code generation, and multi-task understanding, while significantly reducing computational overhead. Code is available at https://github.com/mindhunter01/RMoA.
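The embedding-based diversity selection described above can be sketched as a greedy farthest-point pick over response embeddings: at each step, keep the response whose nearest already-selected neighbor is least similar. This is a minimal illustration, not the paper's exact implementation; the function name, the choice to seed with the first response, and the fixed selection size `k` are assumptions.

```python
import numpy as np

def greedy_diverse_select(embeddings, k):
    """Greedily pick k response indices that maximize embedding diversity.

    Each step adds the candidate whose maximum cosine similarity to the
    already-selected set is smallest, i.e. the most novel response.
    (Illustrative sketch; seeding and k are assumptions.)
    """
    emb = np.asarray(embeddings, dtype=float)
    # Normalize rows so dot products become cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    selected = [0]  # assumption: seed with the first response
    while len(selected) < k:
        sims = emb @ emb[selected].T      # (n, |selected|) cosine sims
        max_sim = sims.max(axis=1)        # similarity to nearest pick
        max_sim[selected] = np.inf        # never re-pick a selected one
        selected.append(int(max_sim.argmin()))
    return selected

# With three responses where the third is orthogonal to the first,
# selecting two keeps the most dissimilar pair.
picked = greedy_diverse_select([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=2)
```

In a full RMoA-style pipeline, the surviving responses would then be passed to the aggregation stage, so the selection step trades a few cheap vector operations for fewer expensive LLM calls.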