🤖 AI Summary
To address high computational overhead, information decay, and insufficient robustness in multi-large-language-model (LLM) agent collaborative reasoning, this paper proposes the Residual Mixture-of-Agents (RMoA) architecture. Methodologically, RMoA introduces: (1) an embedding-space diversity-aware greedy selection mechanism to enhance response diversity; (2) a dual-agent structure with cross-layer residual extraction and aggregation to mitigate information loss; and (3) an adaptive termination strategy guided by a residual convergence criterion to eliminate redundant computation. Empirically, RMoA achieves state-of-the-art performance across alignment, mathematical reasoning, code generation, and multi-task understanding benchmarks. It reduces average inference FLOPs by 37%, significantly lowering computational cost while improving system robustness. All source code is publicly available.
📝 Abstract
Although multi-agent systems based on large language models show strong capabilities on multiple tasks, they are still limited by high computational overhead, information loss, and insufficient robustness. Inspired by ResNet's residual learning, we propose Residual Mixture-of-Agents (RMoA), integrating residual connections to optimize efficiency and reliability. To maximize information utilization from model responses while minimizing computational costs, we design an embedding-based diversity selection mechanism that greedily selects responses via vector similarity. Furthermore, to mitigate iterative information degradation, we introduce a Residual Extraction Agent that preserves cross-layer incremental information by capturing inter-layer response differences, coupled with a Residual Aggregation Agent for hierarchical information integration. Additionally, we propose an adaptive termination mechanism that dynamically halts processing based on residual convergence, further improving inference efficiency. RMoA achieves state-of-the-art performance on benchmarks spanning alignment, mathematical reasoning, code generation, and multi-task understanding, while significantly reducing computational overhead. Code is available at https://github.com/mindhunter01/RMoA.
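The embedding-based diversity selection described above can be sketched as a greedy farthest-point pick over response embeddings: at each step, keep the response whose nearest already-selected neighbor is least similar. This is a minimal illustration, not the paper's exact implementation; the function name, the choice to seed with the first response, and the fixed selection size `k` are assumptions.

```python
import numpy as np

def greedy_diverse_select(embeddings, k):
    """Greedily pick k response indices that maximize embedding diversity.

    Each step adds the candidate whose maximum cosine similarity to the
    already-selected set is smallest, i.e. the most novel response.
    (Illustrative sketch; seeding and k are assumptions.)
    """
    emb = np.asarray(embeddings, dtype=float)
    # Normalize rows so dot products become cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    selected = [0]  # assumption: seed with the first response
    while len(selected) < k:
        sims = emb @ emb[selected].T      # (n, |selected|) cosine sims
        max_sim = sims.max(axis=1)        # similarity to nearest pick
        max_sim[selected] = np.inf        # never re-pick a selected one
        selected.append(int(max_sim.argmin()))
    return selected

# With three responses where the third is orthogonal to the first,
# selecting two keeps the most dissimilar pair.
picked = greedy_diverse_select([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=2)
```

In a full RMoA-style pipeline, the surviving responses would then be passed to the aggregation stage, so the selection step trades a few cheap vector operations for fewer expensive LLM calls.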