M$^3$Prune: Hierarchical Communication Graph Pruning for Efficient Multi-Modal Multi-Agent Retrieval-Augmented Generation

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address excessive communication overhead, high token consumption, and poor scalability in multimodal multi-agent retrieval-augmented generation (RAG) systems, this paper proposes a hierarchical communication graph pruning framework. It pioneers the application of hierarchical graph pruning to multi-agent coordination, adaptively identifying and preserving critical communication pathways via intra-modal sparsification and cross-modal dynamic topology construction. Integrated with multimodal large language models and external knowledge retrieval, the framework employs a progressive pruning strategy that significantly reduces redundant agent interactions while preserving collaborative performance. Experimental results demonstrate that our method consistently outperforms both single-agent baselines and state-of-the-art multi-agent RAG systems on general and domain-specific benchmarks, achieving an average 32.7% reduction in token consumption and a 2.1× speedup in inference latency. This work establishes a novel paradigm for efficient, scalable multimodal multi-agent RAG.

Technology Category

Application Category

📝 Abstract
Recent advancements in multi-modal retrieval-augmented generation (mRAG), which enhance multi-modal large language models (MLLMs) with external knowledge, have demonstrated that the collective intelligence of multiple agents can significantly outperform a single model through effective communication. Despite impressive performance, existing multi-agent systems inherently incur substantial token overhead and increased computational costs, posing challenges for large-scale deployment. To address these issues, we propose a novel Multi-Modal Multi-agent hierarchical communication graph PRUNING framework, termed M$^3$Prune. Our framework eliminates redundant edges across different modalities, achieving an optimal balance between task performance and token overhead. Specifically, M$^3$Prune first applies intra-modal graph sparsification to textual and visual modalities, identifying the edges most critical for solving the task. Subsequently, we construct a dynamic communication topology using these key edges for inter-modal graph sparsification. Finally, we progressively prune redundant edges to obtain a more efficient and hierarchical topology. Extensive experiments on both general and domain-specific mRAG benchmarks demonstrate that our method consistently outperforms both single-agent and robust multi-agent mRAG systems while significantly reducing token consumption.
Problem

Research questions and friction points this paper is trying to address.

Reducing token overhead in multi-modal multi-agent retrieval-augmented generation systems
Optimizing computational efficiency while maintaining task performance across modalities
Pruning redundant communication edges between textual and visual agent networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical pruning of multi-modal communication graphs
Intra-modal sparsification for critical edge identification
Dynamic topology construction with inter-modal sparsification
🔎 Similar Papers
2024-05-26North American Chapter of the Association for Computational LinguisticsCitations: 31
W
Weizi Shao
East China Normal University, Shanghai, China
Taolin Zhang
Taolin Zhang
Hefei University of Technology
LLMVLLMDeep Learning
Z
Zijie Zhou
China University of Petroleum, Beijing, China
C
Chen Chen
Guangdong university of Finance & Economics, Guangdong, China
Chengyu Wang
Chengyu Wang
Alibaba Group
Natural Language ProcessingLarge Language ModelMulti-modal Learning
X
Xiaofeng He
East China Normal University, Shanghai, China