CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement

📅 2026-06-04

🤖 AI Summary
LLM-based agents still struggle with grounded interaction and behavioral execution in realistic, immersive human-agent collaboration. To address this, the work proposes CollabBench, a training and evaluation framework for collaborative agents that combines diverse player personas, proactive engagement, and a hybrid reward, moving beyond conversation-level collaboration. Built on the extended CWAH-MultiPlayer and Cook-MultiPlayer environments, the framework unifies reasoning, communication, and action through a player-profile simulation pipeline and agentic rollout training optimized with the hybrid reward. In the reported experiments, the trained models outperform their base models, with 19.5% higher efficiency and 24.4% better affective performance; the analysis also exposes key collaborative limitations of current models.
📝 Abstract
While LLM-based agents excel at individual tasks, effective collaboration with realistic human partners remains challenging. Most existing conversation-level collaborative studies lack grounded interaction and behavioral execution, motivating the need for cooperative game environments that enable contextualized and immersive collaboration. To this end, this paper proposes CollabBench, a benchmark for evaluating and training collaborative agents in cooperative games. CollabBench features a Diverse Player Profile Simulation pipeline to model varied player behaviors, and a Collaborative Agentic Training paradigm that unifies reasoning, communication, and action via agentic rollouts, optimized with a hybrid reward balancing task efficiency and affective adaptation. We further extend classic environments to CWAH-MultiPlayer and Cook-MultiPlayer for systematic evaluation under diverse personalities. Experiments with efficiency and affective metrics show that our trained models outperform base models, achieving 19.5% higher efficiency and 24.4% improved affective performance. Further analysis reveals key collaborative limitations of existing models and offers insights for future collaborative training.
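The abstract does not spell out how the hybrid reward combines its two terms. As a rough illustration only, the sketch below assumes a convex combination of an efficiency term and an affective term. The names `EpisodeOutcome`, `efficiency_reward`, `hybrid_reward`, and the weight `alpha` are hypothetical and do not come from the paper; the paper's actual formulation may differ.

```python
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    """Hypothetical per-episode signals; field names are illustrative only."""
    steps_used: int      # steps the team needed to finish the task
    max_steps: int       # step budget for the episode
    affect_score: float  # partner-affect rating, assumed to lie in [0, 1]


def efficiency_reward(outcome: EpisodeOutcome) -> float:
    """Fraction of the step budget left unused, clipped to [0, 1]."""
    if outcome.max_steps <= 0:
        raise ValueError("max_steps must be positive")
    unused = 1.0 - outcome.steps_used / outcome.max_steps
    return min(1.0, max(0.0, unused))


def hybrid_reward(outcome: EpisodeOutcome, alpha: float = 0.5) -> float:
    """Assumed form: alpha * efficiency + (1 - alpha) * affect."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    affect = min(1.0, max(0.0, outcome.affect_score))
    return alpha * efficiency_reward(outcome) + (1.0 - alpha) * affect


if __name__ == "__main__":
    episode = EpisodeOutcome(steps_used=50, max_steps=100, affect_score=0.8)
    print(hybrid_reward(episode, alpha=0.5))
```

Under this assumed form, `alpha` near 1 rewards finishing quickly, while `alpha` near 0 rewards adapting to the partner's affective state.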
Problem

Research questions and friction points this paper is trying to address.

collaborative ability
large language models
human-AI collaboration
cooperative games
diverse player behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative Benchmarking
Diverse Player Simulation
Agentic Rollouts
Hybrid Reward Optimization
Multiplayer Cooperative Games
Hong Qian
East China Normal University
Artificial Intelligence, Machine Learning, Evolutionary Optimization
Yuanhao Liu
Institute of Computing Technology, Chinese Academy of Sciences
Trustworthy AI, Fairness of Algorithms
Zihan Zhou
South China University of Technology
Computer Vision, Image Processing, Deep Learning
Zongbao Zhang
Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University, Shanghai, China; Shanghai Innovation Institute, Shanghai, China
Hanjie Ge
Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University, Shanghai, China
Haotian Shi
Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University, Shanghai, China
Liang Dou
Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University, Shanghai, China
Xiangfeng Wang
Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University, Shanghai, China
Jingwen Yang
Tencent Inc., Shenzhen, China
Aimin Zhou
Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University, Shanghai, China; Shanghai Innovation Institute, Shanghai, China