Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making

📅 2024-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
In sequential-decision games such as chess, human–AI collaboration without explicit communication faces the challenge of assessing each partner’s relative capabilities in the absence of shared knowledge. Method: We propose a Mixture-of-Experts (MoE) framework that decouples advantage identification from the experts themselves, composing a human behavioral clone (“Maia”) with a self-play RL-trained engine (“Leela”). Contribution/Results: Experiments reveal that a human expert identifies only a small fraction of the relative advantages, consistent with the “curse of knowledge,” whereas a selector trained with reinforcement learning and no chess expertise outperforms the expert at advantage identification. The method remains robust in asymmetric human–AI teams. Crucially, this work provides empirical evidence that purely behavioral-data-driven advantage identification enables efficient human–AI collaboration—establishing a paradigm for implicit, communication-free cooperative decision-making.

📝 Abstract
The field of collective intelligence studies how teams can achieve better results than any of the team members alone. The special case of human-machine teams carries unique challenges in this regard. For example, human teams often achieve synergy by communicating to discover their relative advantages, which is not an option if the team partner is an unexplainable deep neural network. Between 2005 and 2008, a series of "freestyle" chess tournaments was held, in which human-machine teams known as "centaurs" outperformed the best humans and best machines alone. Centaur players reported that they identified relative advantages between themselves and their chess program, even though the program was superhuman. Inspired by this and leveraging recent open-source models, we study human-machine-like teams in chess. A human behavioral clone ("Maia") and a pure self-play RL-trained chess engine ("Leela") were composed into a team using a Mixture of Experts (MoE) architecture. By directing our research question at the selection mechanism of the MoE, we could isolate the issue of extracting relative advantages without knowledge sharing. We show that, in principle, there is high potential for synergy between human and machine in a complex sequential decision environment such as chess. Furthermore, we show that an expert can identify only a small part of these relative advantages, and that the contribution of its subject matter expertise in doing so saturates quickly. This is probably due to the "curse of knowledge" phenomenon. We also train a network to recognize relative advantages using reinforcement learning, without chess expertise, and it outdoes the expert. Our experiments are repeated in asymmetric teams, in which identifying relative advantages is more challenging. Our findings contribute to the study of collective intelligence and human-centric AI.
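The abstract describes composing Maia and Leela via an MoE whose selection mechanism routes each position to exactly one expert. A minimal sketch of that routing idea, assuming a linear gate over position features (all names, the feature representation, and the gate itself are hypothetical illustrations, not the paper's implementation):

```python
from typing import Callable, List

Move = str
Position = List[float]  # hypothetical feature vector describing a board position


def maia_expert(pos: Position) -> Move:
    # stand-in for the human behavioral clone's move choice
    return "human-like move"


def leela_expert(pos: Position) -> Move:
    # stand-in for the self-play RL engine's move choice
    return "engine move"


def gate_score(pos: Position, weights: List[float]) -> float:
    # linear gate over position features: a positive score means the
    # human-like expert is predicted to hold the relative advantage here
    return sum(w * x for w, x in zip(weights, pos))


def centaur_move(pos: Position, weights: List[float]) -> Move:
    # MoE selection mechanism: route the position to exactly one expert,
    # so the team's skill depends on how well the gate identifies
    # relative advantages -- the paper's central question
    expert: Callable[[Position], Move] = (
        maia_expert if gate_score(pos, weights) > 0 else leela_expert
    )
    return expert(pos)
```

In this toy form, training the gate's weights (e.g., with RL on game outcomes, as the abstract describes) is entirely separate from the experts, which is what lets the study isolate advantage identification from playing strength.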
Problem

Research questions and friction points this paper is trying to address.

Team Cooperation
Sequential Decision-Making
Advantage Exploitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Expert Approach
Machine Learning Advantage Recognition
Human-AI Collaboration in Games
David Shoresh
The Edmond and Lily Safra Center for Brain Sciences, Department of Cognitive Sciences, Hebrew University, Jerusalem, Israel
Yonatan Loewenstein
The Hebrew University of Jerusalem