🤖 AI Summary
Reinforcement learning agents often behave differently from humans, which limits their interpretability and reliability. To address this, this work proposes Hierarchical Macro-Action Quantization (HiMAQ), a method that encodes human demonstrations into macro-actions through two successive levels of vector quantization: the lower level maps input actions to fine-grained sub-action clusters, and the higher level aggregates those clusters into action clusters. Experiments on the D4RL benchmark show that HiMAQ achieves better human-likeness scores than its non-hierarchical counterpart, MAQ, while maintaining comparable or better task success rates than previous RL agents. The improvements hold when HiMAQ is integrated with different reinforcement learning algorithms, namely IQL, SAC, and RLPD.
📝 Abstract
Human-like agents are a long-standing goal of artificial intelligence. Despite strong performance, most reinforcement learning (RL) agents remain reward-driven and often exhibit behaviors that differ from humans, limiting interpretability and reliability. In this work, we introduce a novel human-like RL framework that predicts action sequences closely aligned with human behaviors while maximizing rewards. Specifically, we encode human demonstrations into macro actions using a hierarchical macro action quantization approach (termed HiMAQ) consisting of two successive levels of vector quantization. The lower quantization level maps input actions to fine-grained subaction clusters, while the higher quantization level aggregates these subaction clusters into action clusters. Extensive evaluations on the D4RL benchmarks show that our hierarchical approach outperforms the non-hierarchical baseline (MAQ), achieving better human-likeness scores while maintaining comparable or better success rates than previous RL agents. The improvements generalize across integrations with various RL algorithms, namely IQL, SAC, and RLPD.
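The two successive quantization levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the codebooks are learned or what the higher level takes as input, so the sketch assumes fixed toy codebooks, Euclidean nearest-neighbor assignment, and an upper level that quantizes the centroid chosen by the lower level. The names `nearest_code`, `hierarchical_quantize`, `SUB_CODEBOOK`, and `MACRO_CODEBOOK` are hypothetical, and each "action" here could equally stand for a flattened action sequence.

```python
from math import dist


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))


def hierarchical_quantize(action, sub_codebook, macro_codebook):
    """Apply two quantization steps: action -> subaction cluster -> action cluster."""
    # Lower level: assign the input action to a fine-grained subaction cluster.
    sub_idx = nearest_code(action, sub_codebook)
    # Higher level (assumption): quantize the selected subaction centroid
    # against a coarser codebook to obtain the action cluster.
    macro_idx = nearest_code(sub_codebook[sub_idx], macro_codebook)
    return sub_idx, macro_idx


# Toy 2-D codebooks with hand-picked values, not learned from demonstrations.
SUB_CODEBOOK = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.2)]
MACRO_CODEBOOK = [(0.1, 0.05), (0.95, 1.1)]

result = hierarchical_quantize((0.85, 1.15), SUB_CODEBOOK, MACRO_CODEBOOK)
print(result)  # (3, 1): subaction cluster 3, which falls in action cluster 1
```

In this toy setup, several nearby subaction clusters share one action cluster, which is the aggregation the higher level is meant to provide. A learned variant would train both codebooks from human demonstrations instead of fixing them by hand.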