Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making

📅 2024-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
In sequential-decision games such as chess, human–AI collaboration without explicit communication faces the challenge of assessing each partner’s relative capabilities in the absence of shared knowledge. Method: We propose a Mixture-of-Experts (MoE) framework that decouples advantage identification from the experts themselves, composing a human behavioral clone (“Maia”) with a self-play RL-trained engine (“Leela”). Contribution/Results: Experiments reveal that a human expert identifies only a small fraction of the relative advantages, consistent with the “curse of knowledge,” whereas a selector trained with reinforcement learning and no chess expertise outperforms the expert at advantage identification. The method remains robust in asymmetric human–AI teams. Crucially, this work provides empirical evidence that purely behavioral-data-driven advantage identification enables efficient human–AI collaboration—establishing a paradigm for implicit, communication-free cooperative decision-making.

📝 Abstract
The field of collective intelligence studies how teams can achieve better results than any of the team members alone. The special case of human-machine teams carries unique challenges in this regard. For example, human teams often achieve synergy by communicating to discover their relative advantages, which is not an option if the team partner is an unexplainable deep neural network. Between 2005 and 2008, a series of "freestyle" chess tournaments was held, in which human-machine teams known as "centaurs" outperformed the best humans and best machines alone. Centaur players reported that they identified relative advantages between themselves and their chess program, even though the program was superhuman. Inspired by this and leveraging recent open-source models, we study human-machine-like teams in chess. A human behavioral clone ("Maia") and a pure self-play RL-trained chess engine ("Leela") were composed into a team using a Mixture of Experts (MoE) architecture. By directing our research question at the selection mechanism of the MoE, we could isolate the issue of extracting relative advantages without knowledge sharing. We show that, in principle, there is high potential for synergy between human and machine in a complex sequential decision environment such as chess. Furthermore, we show that an expert can identify only a small part of these relative advantages, and that the contribution of its subject matter expertise in doing so saturates quickly. This is probably due to the "curse of knowledge" phenomenon. We also train a network to recognize relative advantages using reinforcement learning, without chess expertise, and it outdoes the expert. Our experiments are repeated in asymmetric teams, in which identifying relative advantages is more challenging. Our findings contribute to the study of collective intelligence and human-centric AI.
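The abstract describes composing Maia and Leela via an MoE whose selection mechanism routes each position to exactly one expert. A minimal sketch of that routing idea, assuming a linear gate over position features (all names, the feature representation, and the gate itself are hypothetical illustrations, not the paper's implementation):

```python
from typing import Callable, List

Move = str
Position = List[float]  # hypothetical feature vector describing a board position


def maia_expert(pos: Position) -> Move:
    # stand-in for the human behavioral clone's move choice
    return "human-like move"


def leela_expert(pos: Position) -> Move:
    # stand-in for the self-play RL engine's move choice
    return "engine move"


def gate_score(pos: Position, weights: List[float]) -> float:
    # linear gate over position features: a positive score means the
    # human-like expert is predicted to hold the relative advantage here
    return sum(w * x for w, x in zip(weights, pos))


def centaur_move(pos: Position, weights: List[float]) -> Move:
    # MoE selection mechanism: route the position to exactly one expert,
    # so the team's skill depends on how well the gate identifies
    # relative advantages -- the paper's central question
    expert: Callable[[Position], Move] = (
        maia_expert if gate_score(pos, weights) > 0 else leela_expert
    )
    return expert(pos)
```

In this toy form, training the gate's weights (e.g., with RL on game outcomes, as the abstract describes) is entirely separate from the experts, which is what lets the study isolate advantage identification from playing strength.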
Problem

Research questions and friction points this paper is trying to address.

Team Cooperation
Sequential Decision-Making
Advantage Exploitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Expert Approach
Machine Learning Advantage Recognition
Human-AI Collaboration in Games
David Shoresh
The Edmond and Lily Safra Center for Brain Sciences, Department of Cognitive Sciences, Hebrew University, Jerusalem, Israel
Yonatan Loewenstein
The Hebrew University of Jerusalem