Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current LLM agents struggle to maintain and align models of their own reasoning, their partner’s intentions, and shared goals during collaboration. They are optimized mainly for task completion, and the community lacks authentic human collaboration data with action-level mental model annotations. To address this gap, this work introduces ALMANAC, a dataset of 2,987 collaborative actions built from the Map Task, a classic dyadic routing task from social science. Each action is paired with theory-informed mental model annotations that capture the participant’s self-reasoning, perceived partner intent, and perceived team goal. Data collection adheres to established experimental protocols. The authors benchmark six LLMs on predicting humans’ next-turn behavior and mental models, demonstrating the dataset’s utility for evaluating how well models simulate human collaborative behavior and infer the underlying mental states.

📝 Abstract

Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires collaborators to continuously maintain and align mental models of their own reasoning, partners' intentions, and shared goals during the collaborative process. Today's agents rarely develop such capabilities since they are primarily optimized for task completion, and the community lacks authentic human collaboration data with action-level mental model annotations that could guide agents toward process-level collaborative competence. To bridge this gap, we present ALMANAC, a dataset of Action-Level Mental model ANnotations for Agent Collaboration built from the Map Task, a classic dyadic routing task from social science. ALMANAC contains 2,987 collaboration actions, each paired with theory-informed mental model annotations that record the participants' self-reasoning, perceived partner intent, and perceived team goal. We benchmark six LLMs on predicting humans' next-turn behavior and mental models. Our results demonstrate ALMANAC's utility in evaluating models' ability to simulate human collaborative behaviors and infer their underlying mental models.
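
To make the record structure described in the abstract concrete, the following is a minimal Python sketch of one annotated action and of a next-turn prediction split. It is an illustration only, not the dataset's actual schema or the authors' benchmark code: every class, field, and function name here (`MentalModel`, `AnnotatedAction`, `next_turn_example`, the `speaker` role label) is a hypothetical choice, and only the three annotation facets come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MentalModel:
    """The three annotation facets named in the abstract (field names are hypothetical)."""
    self_reasoning: str
    partner_intent: str
    team_goal: str


@dataclass(frozen=True)
class AnnotatedAction:
    """One collaboration action paired with its mental model annotation."""
    speaker: str   # hypothetical role label for the participant who acted
    action: str    # the action content, e.g. an utterance
    mental_model: MentalModel


def next_turn_example(
    dialogue: List[AnnotatedAction], turn: int
) -> Tuple[List[AnnotatedAction], AnnotatedAction]:
    """Split a dialogue into (history, target) for predicting the action at index `turn`."""
    if not 0 < turn < len(dialogue):
        raise ValueError("turn must leave a non-empty history and a target")
    return dialogue[:turn], dialogue[turn]
```

Under these assumptions, a model would receive the history and be scored on how closely it reproduces both the target's `action` and its `mental_model`; how the paper actually formats prompts and scores predictions is not stated in the abstract.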

Problem

Research questions and friction points this paper is trying to address.

mental model

human-agent collaboration

action-level annotation

collaborative competence

LLM agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

mental model annotation

human-agent collaboration

action-level dataset

collaborative reasoning

large language models
