AI Summary
Current evaluations of multi-agent systems powered by large language models predominantly emphasize task outcomes or individual agent capabilities, often overlooking core collaborative competencies such as establishing consensus under constrained communication, maintaining mutual understanding, aligning individual and collective goals, and repairing misalignments. To address this gap, this work introduces CollabSim, a novel framework that integrates Computer-Supported Cooperative Work (CSCW) theory into the assessment of multi-agent collaboration for the first time. CollabSim provides a configurable simulation environment, defines collaboration dimensions grounded in CSCW principles, and employs action-level probes to analyze agents' internal states. Experiments across four large language models demonstrate that CollabSim enables fine-grained, condition-controlled evaluation of collaborative abilities, effectively uncovering the impact of interaction conditions, inter-model differences, and the task-dependence of agent design on collaborative performance.
Abstract
Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents' ability to coordinate through text-based channels much as human teams do. Yet recent studies suggest that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, balance individual and collective incentives, and repair misalignment as interaction unfolds. Decades of research in Computer-Supported Cooperative Work have characterized these requirements for human teams coordinating under constrained communication, but existing MAS evaluations focus mainly on task outcomes or single-agent proficiency in reasoning, planning, and tool use. To enable a systematic analysis of agents' collaborative competence in MAS, we introduce CollabSim, a configurable simulation framework that combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states. Experiments across four LLMs show that CollabSim can capture condition effects, separate model performance patterns, and reveal task-dependent effects of agent design.