🤖 AI Summary
Automated summarization of GitHub repository README.md files remains challenging due to their heterogeneous structure, domain-specific content, and varying quality.
Method: This paper proposes Metagente, a multi-LLM collaborative framework featuring role-specialized agents—Writer, Reviewer, and Teacher—that jointly perform iterative prompt refinement, multi-perspective result aggregation, and unified teacher-led evaluation. It establishes the first self-optimizing, multi-agent LLM paradigm for README summarization.
Contribution/Results: Metagente significantly improves summary accuracy and robustness, outperforming GitSum by 27.63%–60.43% on standard benchmarks and surpassing both LLaMA-2 and GPT-4o, while requiring only minimal fine-tuning data. Its core innovation lies in modeling multi-LLM collaboration as an evolvable closed feedback-loop system, yielding an efficient, interpretable, and lightweight solution for code documentation understanding.
📝 Abstract
The proliferation of Large Language Models (LLMs) in recent years has enabled many applications across various domains. Trained on a huge amount of data from diverse sources, LLMs can be deployed to solve different tasks, including those in Software Engineering (SE). Though they have been widely adopted, the potential of using LLMs cooperatively has not been thoroughly investigated. In this paper, we propose Metagente as a novel approach to amplify the synergy of various LLMs. Metagente is a multi-agent framework based on a series of LLMs that self-optimizes through evaluation, feedback, and cooperation among specialized agents. Such a framework creates an environment where multiple agents iteratively refine and optimize prompts from various perspectives. The results of these explorations are then reviewed and aggregated by a teacher agent. To study its performance, we evaluated Metagente on an SE task, i.e., summarization of README.MD files, and compared it with three well-established baselines, i.e., GitSum, LLaMA-2, and GPT-4o. The results show that our proposed approach works efficiently and effectively, requiring only a small amount of fine-tuning data while still achieving high accuracy, thus substantially outperforming the baselines. The performance gain over GitSum, the most relevant baseline, ranges from 27.63% to 60.43%. More importantly, compared to using only a single LLM, Metagente boosts accuracy multiple-fold.
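The iterative loop the abstract describes (a Writer agent proposes a summary, a Teacher agent evaluates it and feeds criticism back into the prompt) can be sketched roughly as follows. This is a minimal illustration, not Metagente's actual interface: the agent internals are stubbed with simple string heuristics standing in for LLM calls, and all function names and the scoring rule are assumptions.

```python
def writer(prompt: str, readme: str) -> str:
    """Writer agent (stub): produce a candidate summary from the current prompt.

    A real implementation would send `prompt` and `readme` to an LLM;
    here we just take the README's first sentence.
    """
    return readme.split(".")[0].strip() + "."


def teacher(summary: str, reference: str) -> tuple[float, str]:
    """Teacher agent (stub): score the candidate and emit textual feedback.

    Word overlap with a reference stands in for the teacher's evaluation.
    """
    overlap = len(set(summary.lower().split()) & set(reference.lower().split()))
    score = overlap / max(len(reference.split()), 1)
    feedback = "good" if score > 0.5 else "mention the project's purpose"
    return score, feedback


def optimize(readme: str, reference: str, rounds: int = 3) -> tuple[str, float]:
    """Closed feedback loop: refine the prompt with teacher feedback, keep the best."""
    prompt = "Summarize this README in one sentence."
    best_summary, best_score = "", 0.0
    for _ in range(rounds):
        summary = writer(prompt, readme)
        score, feedback = teacher(summary, reference)
        if score > best_score:
            best_summary, best_score = summary, score
        # Prompt-refinement step: fold the teacher's feedback into the next prompt.
        prompt += f" Feedback: {feedback}"
    return best_summary, best_score
```

The key design point mirrored here is that the system optimizes the *prompt* across rounds rather than the model weights, which is why only a small amount of fine-tuning data is needed.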