🤖 AI Summary
To address key bottlenecks in predicting human behavior for human-AI collaboration (data-hungry models, brittleness, and slow adaptation), this paper proposes ROTE, an algorithm that models everyday social routines as executable behavioral programs, or "scripts," rather than as policies conditioned on beliefs and desires. ROTE uses large language models (LLMs) to synthesize a structured hypothesis space of behavioral programs and applies probabilistic inference to reason about uncertainty over that space, which avoids restrictive rationality assumptions and reduces dependence on large training datasets. Evaluated on a suite of gridworld tasks and a large-scale embodied household simulator, ROTE predicts human and AI behavior from sparse observations. It outperforms competitive baselines, including behavior cloning and LLM-based methods, by as much as 50% in in-sample accuracy and out-of-sample generalization, pointing toward more robust and safe human-AI collaboration.
📝 Abstract
Accurate prediction of human behavior is essential for robust and safe human-AI collaboration. However, existing approaches for modeling people are often data-hungry and brittle because they either make unrealistic assumptions about rationality or are too computationally demanding to adapt rapidly. Our key insight is that many everyday social interactions may follow predictable patterns: efficient "scripts" that minimize cognitive load for actors and observers, e.g., "wait for the green light, then go." We propose modeling these routines as behavioral programs instantiated in computer code rather than as policies conditioned on beliefs and desires. We introduce ROTE, a novel algorithm that combines large language models (LLMs), which synthesize a hypothesis space of behavioral programs, with probabilistic inference, which reasons about uncertainty over that space. We test ROTE in a suite of gridworld tasks and a large-scale embodied household simulator. ROTE predicts human and AI behaviors from sparse observations, outperforming competitive baselines (including behavior cloning and LLM-based methods) by as much as 50% in terms of in-sample accuracy and out-of-sample generalization. By treating action understanding as a program synthesis problem, ROTE opens a path for AI systems to efficiently and effectively predict human behavior in the real world.
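The sketch below is one way to picture inference over a hypothesis space of behavioral programs. It is not ROTE's actual implementation. It assumes three hand-written candidate programs standing in for LLM-generated ones, a uniform prior, and a simple noise model in which an agent follows its program except with probability `eps`. All names (`PROGRAMS`, `likelihood`, `posterior`, `predict`) and the traffic-light setting are hypothetical, chosen to echo the "wait for the green light, then go" script.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, str]
Program = Callable[[Observation], str]

ACTIONS = ["go", "wait"]


# Hypothetical candidate programs. In the setting the abstract describes these
# would be synthesized by an LLM; here they are written by hand.
def wait_for_green(obs: Observation) -> str:
    return "go" if obs["light"] == "green" else "wait"


def always_go(obs: Observation) -> str:
    return "go"


def always_wait(obs: Observation) -> str:
    return "wait"


PROGRAMS: Dict[str, Program] = {
    "wait_for_green": wait_for_green,
    "always_go": always_go,
    "always_wait": always_wait,
}


def likelihood(program: Program, obs: Observation, action: str, eps: float = 0.1) -> float:
    """P(action | program, obs) under an assumed noise model: with probability
    1 - eps the agent follows the program, otherwise it acts uniformly at random."""
    noise = eps / len(ACTIONS)
    return (1.0 - eps) + noise if program(obs) == action else noise


def posterior(
    programs: Dict[str, Program],
    history: List[Tuple[Observation, str]],
    eps: float = 0.1,
) -> Dict[str, float]:
    """Bayesian update over programs, starting from a uniform prior."""
    weights = {name: 1.0 / len(programs) for name in programs}
    for obs, action in history:
        for name, program in programs.items():
            weights[name] *= likelihood(program, obs, action, eps)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def predict(
    programs: Dict[str, Program],
    post: Dict[str, float],
    obs: Observation,
    eps: float = 0.1,
) -> str:
    """Return the action with the highest posterior-predictive probability."""
    scores = {
        a: sum(post[n] * likelihood(p, obs, a, eps) for n, p in programs.items())
        for a in ACTIONS
    }
    return max(scores, key=scores.get)
```

Given only two observed steps, such as waiting at a red light and going at a green one, `posterior` concentrates most of its mass on `wait_for_green`, and `predict` then returns that program's action for a new observation. This is the sense in which a small set of explicit programs can support prediction from sparse observations.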