🤖 AI Summary
Large language models (LLMs) suffer from performance degradation and reasoning homogenization when trained on self-generated data. Method: This paper proposes a multi-agent LLM society framework: multiple heterogeneous agents are initialized from the same base model, collaboratively generate diverse synthetic data through cross-agent interaction, and undergo decentralized, specialized fine-tuning. Contribution/Results: The paper is the first to couple multi-agent division of labor with distributed data generation, augmented by a chain-of-thought (CoT) diversity preservation mechanism, overcoming the convergence limitations inherent in single-agent self-improvement. Experiments demonstrate an average 11.3% accuracy gain across diverse reasoning benchmarks, a 42% increase in CoT diversity, and stable performance over dozens of rounds of continuous fine-tuning without degradation.
📝 Abstract
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of fine-tuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.
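The training loop the abstract describes, a society of agents copied from one base model, each finetuned on its own data from multiagent interactions, can be sketched as follows. This is a minimal toy sketch, not the paper's implementation: the dict "models", the `generate`/`finetune` stubs, and all names are illustrative stand-ins for real LLM inference and gradient-based finetuning.

```python
# Toy sketch of the multiagent-society finetuning loop (all names hypothetical).
# "Models" are plain dicts; generate/finetune are stubs standing in for
# LLM inference and gradient-based finetuning.

def generate(model, prompt):
    """Stub for LLM inference: one agent drafts an answer to a prompt."""
    return f"{model['name']} answers: {prompt}"

def finetune(model, dataset):
    """Stub for finetuning: returns an updated copy of the model."""
    updated = dict(model)
    updated["steps"] = model["steps"] + len(dataset)  # track update count
    return updated

def multiagent_finetune(base_model, num_agents=3, num_rounds=2, prompts=()):
    # All agents start from the same base model.
    agents = [dict(base_model, name=f"agent{i}") for i in range(num_agents)]
    for _ in range(num_rounds):
        # Each agent accumulates its own dataset, so finetuning sets
        # are independent across the society.
        datasets = [[] for _ in agents]
        for prompt in prompts:
            # Multiagent interaction: every agent sees all drafts.
            drafts = [generate(agent, prompt) for agent in agents]
            for i in range(num_agents):
                # Agent i stores the shared transcript plus its own draft;
                # training on this distinct slice drives specialization.
                datasets[i].append((prompt, drafts, drafts[i]))
        # Decentralized update: each agent is finetuned only on its own data.
        agents = [finetune(a, d) for a, d in zip(agents, datasets)]
    return agents
```

Because each agent's dataset differs, repeated rounds diversify the society rather than collapsing all models onto one reasoning style, which is the mechanism the abstract credits for sustaining many rounds of improvement.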