🤖 AI Summary
This study investigates whether sharing peer experiences in multi-agent environments can surpass the limitations of isolated self-evolution. To this end, the authors propose the SAGE evaluation framework, which compares SocialEvo (agents with access to population-wide history) against SelfEvo (agents restricted to their own evolutionary trajectories) under matched computational budgets. The framework co-evolves agents from five model families and is evaluated in three arenas: open-ended machine learning research, long-horizon economic planning, and strategic multiplayer games. It also compares forms of shared history, including raw logs, filtered peer traces, and reflective summaries. The findings show that the strongest agent does not exceed its self-evolution ceiling, whereas agents that plateau under self-improvement can benefit significantly from peer experience. In competitive settings, counterfactual controls indicate that agents improve generally rather than adapting to specific opponents. Gains are agent-specific and arena-dependent, and filtered or summarized histories often outperform raw logs.
📝 Abstract
Self-improving language agents are typically evaluated in isolation: an agent attempts a task, receives feedback, and iteratively refines its own behavior. Yet agents increasingly operate alongside peers whose strategies and outcomes are publicly visible. This raises an under-studied question: when does shared experience produce improvements that self-improvement alone cannot achieve? We introduce SAGE (Social Agent Group Evolution), an evaluation framework that compares two compute-matched conditions: SocialEvo, where agents from five distinct model families co-evolve with access to all peers' histories; and SelfEvo, the conventional setup in self-improving agent studies, where each agent receives the same number of task attempts but sees only its own past. We instantiate SAGE in three arenas: open-ended ML research, long-horizon economic planning, and strategic multiplayer play, evaluated across multiple evolutionary rounds. We find that group history is not a universal amplifier: the strongest agent does not exceed its self-evolution ceiling. However, agents that plateau under self-improvement can achieve significant breakthroughs when peer experience is available. In competitive settings, counterfactual controls reveal that agents improve generally rather than developing opponent-specific strategies. Across different forms of shared history, filtered peer traces and reflective summaries often outperform raw logs, indicating that social gains depend on abstraction rather than exposure volume. These findings reveal that peer-history gains are agent-specific, arena-dependent, and contingent on the capacity to abstract transferable knowledge from public traces.