🤖 AI Summary
To address the excessive communication overhead, high token consumption, and poor scalability of multimodal multi-agent retrieval-augmented generation (RAG) systems, this paper proposes a hierarchical communication graph pruning framework. It pioneers the application of hierarchical graph pruning to multi-agent coordination, adaptively identifying and preserving critical communication pathways via intra-modal sparsification and cross-modal dynamic topology construction. Integrated with multimodal large language models and external knowledge retrieval, the framework employs a progressive pruning strategy that significantly reduces redundant agent interactions while preserving collaborative performance. Experimental results show that the method consistently outperforms both single-agent baselines and state-of-the-art multi-agent RAG systems on general and domain-specific benchmarks, achieving an average 32.7% reduction in token consumption and a 2.1× inference speedup. This work establishes a novel paradigm for efficient, scalable multimodal multi-agent RAG.
📝 Abstract
Recent advances in multi-modal retrieval-augmented generation (mRAG), which enhances multi-modal large language models (MLLMs) with external knowledge, have demonstrated that the collective intelligence of multiple agents can significantly outperform a single model through effective communication. Despite their impressive performance, existing multi-agent systems inherently incur substantial token overhead and increased computational costs, posing challenges for large-scale deployment. To address these issues, we propose a novel Multi-Modal Multi-agent hierarchical communication graph PRUNING framework, termed M$^3$Prune. Our framework eliminates redundant edges across different modalities, achieving an optimal balance between task performance and token overhead. Specifically, M$^3$Prune first applies intra-modal graph sparsification to the textual and visual modalities, identifying the edges most critical to solving the task. Subsequently, we use these key edges to construct a dynamic communication topology for inter-modal graph sparsification. Finally, we progressively prune redundant edges to obtain a more efficient, hierarchical topology. Extensive experiments on both general and domain-specific mRAG benchmarks demonstrate that our method consistently outperforms both single-agent and strong multi-agent mRAG systems while significantly reducing token consumption.
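The core idea of the abstract's pipeline (score communication edges, then progressively drop the weakest until a sparser topology remains) can be illustrated with a minimal sketch. Note this is an assumption-laden toy, not the paper's method: the agent names, edge scores, and `keep_ratio`/`rounds` parameters below are hypothetical placeholders standing in for the intra- and inter-modal sparsification stages that M$^3$Prune actually uses to score edges.

```python
# Toy sketch of progressive communication-graph pruning.
# Edge "importance" scores are placeholders; in the paper they would
# come from the intra-modal and inter-modal sparsification stages.

def progressive_prune(edges, scores, keep_ratio=0.6, rounds=3):
    """Iteratively drop the lowest-scoring edges over several rounds
    until only keep_ratio of the original edges remain."""
    kept = list(edges)
    target = max(1, int(len(edges) * keep_ratio))
    while len(kept) > target:
        # Remove one batch of the weakest edges per round.
        batch = max(1, (len(kept) - target) // rounds)
        kept.sort(key=lambda e: scores[e], reverse=True)
        kept = kept[:len(kept) - batch]
    return kept

# Hypothetical graph: agents a..d, suffixes _t (text) / _v (vision).
edges = [("a_t", "b_t"), ("a_t", "c_v"), ("b_t", "d_v"),
         ("c_v", "d_v"), ("a_t", "d_v")]
scores = {("a_t", "b_t"): 0.9, ("a_t", "c_v"): 0.2,
          ("b_t", "d_v"): 0.7, ("c_v", "d_v"): 0.8,
          ("a_t", "d_v"): 0.1}

pruned = progressive_prune(edges, scores)
print(pruned)  # keeps the 3 highest-scoring edges
```

Pruning in small batches rather than all at once mirrors the "progressive" strategy the abstract describes, which leaves room to re-evaluate edge importance between rounds in a real system.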