🤖 AI Summary
This work addresses the high coordination overhead and design difficulty of multi-agent systems (MAS) by proposing ABSTRAL, a framework for automated architecture evolution in natural language. The system topology and agent roles are encoded in a natural-language document that is refined iteratively through contrastive analysis of execution traces, which also identifies specialist roles missing from the initial design. The resulting design knowledge is inspectable, revisable, and transferable across domains. On SOPBench (134 bank tasks), ABSTRAL reaches a 70% validation pass rate and a 65.96% test pass rate with a GPT-4o backbone. A seed transferred from another domain matches cold-start iteration 3 performance in a single iteration. Under fixed turn budgets, multi-agent ensembles reach only 26% turn efficiency, with 66% of tasks exhausting the limit, yet still outperform single-agent baselines, which gives a precise measurement of the "coordination tax."
📝 Abstract
How should multi-agent systems (MAS) be designed, and can that design knowledge be captured in a form that is inspectable, revisable, and transferable? We introduce ABSTRAL, a framework that treats MAS architecture as an evolving natural-language document, an artifact refined through contrastive trace analysis. Three findings emerge. First, we provide a precise measurement of the multi-agent coordination tax: under fixed turn budgets, ensembles achieve only 26% turn efficiency, with 66% of tasks exhausting the limit, yet still improve over single-agent baselines by discovering parallelizable task decompositions. Second, design knowledge encoded in documents transfers: topology reasoning and role templates learned on one domain provide a head start on new domains, with transferred seeds matching cold-start iteration 3 performance in a single iteration. Third, contrastive trace analysis discovers specialist roles absent from any initial design, a capability no prior system demonstrates. On SOPBench (134 bank tasks, deterministic oracle), ABSTRAL reaches 70% validation / 65.96% test pass rate with a GPT-4o backbone. We release the converged documents as inspectable design rationale.