RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address high computational overhead, information decay, and insufficient robustness in multi-large-language-model (LLM) agent collaborative reasoning, this paper proposes the Residual Mixture-of-Agents architecture (RMoA). Methodologically, RMoA introduces: (1) an embedding-space diversity-aware greedy selection mechanism to enhance response diversity; (2) a dual-agent structure with cross-layer residual extraction and aggregation to mitigate information loss; and (3) an adaptive termination strategy guided by a residual convergence criterion to eliminate redundant computation. Empirically, RMoA achieves state-of-the-art performance across alignment, mathematical reasoning, code generation, and multi-task understanding benchmarks. It reduces average inference FLOPs by 37%, significantly lowering computational cost while improving system robustness. All source code is publicly available.
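The diversity-aware greedy selection described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes responses have already been embedded, seeds with the most "central" response, and then greedily adds the response whose worst-case cosine similarity to the selected set is lowest. The function name and seeding rule are assumptions for illustration.

```python
import numpy as np

def diversity_select(embeddings, k):
    """Greedily pick k response indices that maximize embedding diversity.

    Hypothetical sketch: seed with the response closest to the centroid,
    then repeatedly add the candidate whose maximum cosine similarity
    to the already-selected set is smallest.
    """
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors
    sims = emb @ emb.T  # pairwise cosine similarities
    centroid = emb.mean(axis=0)
    selected = [int(np.argmax(emb @ centroid))]  # central seed
    while len(selected) < k:
        remaining = [i for i in range(len(emb)) if i not in selected]
        # Worst-case similarity of each candidate to the selected set.
        scores = [sims[i, selected].max() for i in remaining]
        selected.append(remaining[int(np.argmin(scores))])
    return selected
```

With near-duplicate responses, the duplicate is skipped in favor of a dissimilar one, which is the intended effect: downstream aggregation sees varied viewpoints rather than redundant copies.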

📝 Abstract
Although multi-agent systems based on large language models show strong capabilities on multiple tasks, they are still limited by high computational overhead, information loss, and insufficient robustness. Inspired by ResNet's residual learning, we propose Residual Mixture-of-Agents (RMoA), integrating residual connections to optimize efficiency and reliability. To maximize information utilization from model responses while minimizing computational costs, we design an embedding-based diversity selection mechanism that greedily selects responses via vector similarity. Furthermore, to mitigate iterative information degradation, we introduce a Residual Extraction Agent that preserves cross-layer incremental information by capturing inter-layer response differences, coupled with a Residual Aggregation Agent for hierarchical information integration. Additionally, we propose an adaptive termination mechanism that dynamically halts processing based on residual convergence, further improving inference efficiency. RMoA achieves state-of-the-art performance on benchmarks spanning alignment, mathematical reasoning, code generation, and multi-task understanding, while significantly reducing computational overhead. Code is available at https://github.com/mindhunter01/RMoA.
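The adaptive termination mechanism can be illustrated with a short loop. This is a hedged sketch under assumptions: the actual convergence criterion is not specified here, so the sketch measures the residual as cosine similarity between the embeddings of successive layer outputs and stops once it crosses a threshold `tau`. `layer_fn` (one round of propose-and-aggregate) and `embed_fn` are hypothetical stand-ins for the paper's agents and embedding model.

```python
import numpy as np

def run_with_adaptive_stop(layer_fn, embed_fn, x0, max_layers=6, tau=0.98):
    """Iterate MoA layers, halting early when the residual converges.

    Hypothetical criterion: stop once the cosine similarity between the
    embeddings of two consecutive aggregated outputs reaches tau.
    """
    state = x0
    prev_emb = np.asarray(embed_fn(state), dtype=float)
    for _ in range(max_layers):
        state = layer_fn(state)  # one propose + aggregate round
        emb = np.asarray(embed_fn(state), dtype=float)
        cos = float(np.dot(prev_emb, emb) /
                    (np.linalg.norm(prev_emb) * np.linalg.norm(emb)))
        if cos >= tau:  # successive outputs barely differ: converged
            break
        prev_emb = emb
    return state
```

The design intuition is that once successive aggregated responses stop changing, further layers add cost but little information, so cutting the loop short recovers most of the efficiency gain with no quality loss.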
Problem

Research questions and friction points this paper is trying to address.

Optimizing multi-agent systems for efficiency and reliability
Reducing computational overhead in large language models
Mitigating information loss and improving robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual connections optimize efficiency and reliability
Embedding-based diversity selection minimizes computational costs
Adaptive termination improves inference efficiency dynamically
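The dual-agent residual structure listed above can be composed as a single layer step. This is an assumed composition for illustration only: `extract_agent` and `aggregate_agent` stand in for the paper's LLM-backed Residual Extraction and Residual Aggregation Agents, and the function name `rmoa_layer` is hypothetical.

```python
def rmoa_layer(responses, prev_aggregate, extract_agent, aggregate_agent):
    """One RMoA layer (hypothetical composition).

    The extraction agent distills what is new in this layer's responses
    relative to the previous aggregate (the cross-layer residual); the
    aggregation agent then folds that residual back into the running state.
    """
    residual = extract_agent(prev_aggregate, responses)   # inter-layer delta
    return aggregate_agent(prev_aggregate, residual)      # hierarchical merge
```

Because only the residual is passed forward, earlier layers' information is preserved in the aggregate rather than being re-summarized (and degraded) at every layer, mirroring ResNet's identity-plus-residual update.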