🤖 AI Summary
Fixed-length skills in long-horizon complex tasks often skip critical decision points, hindering effective policy learning. Method: We propose a dynamic skill termination mechanism grounded in state-action novelty. It is the first to incorporate state-action pair novelty modeling into skill termination criteria, requires no task-specific priors or supervision, and identifies adaptive decision points in an environment-agnostic, robust way. Technically, our approach integrates a novelty assessment module, experience-driven termination policy learning, and unsupervised skill segmentation. Contribution/Results: Experiments demonstrate significant performance gains over state-of-the-art baselines across multiple long-horizon benchmark tasks. Moreover, our method generalizes well under substantial shifts in environment configuration and consistently accelerates policy learning.
📝 Abstract
Intelligent agents can make decisions at different levels of granularity and duration. Recent advances in skill learning have enabled agents to solve complex, long-horizon tasks by guiding them to choose appropriate skills. However, using fixed-length skills can easily skip valuable decision points, which limits the potential for further exploration and faster policy learning. In this work, we propose to learn a simple and efficient termination condition that identifies decision points through a state-action novelty module that leverages agent experience data. Our approach, Novelty-based Decision Point Identification (NBDI), outperforms previous baselines in complex, long-horizon tasks, and remains effective even under significant variations in the environment configurations of downstream tasks, highlighting the importance of decision point identification in skill learning.
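To make the idea of a novelty-based termination condition concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the NBDI implementation: the abstract does not say how novelty is estimated, so this sketch assumes discrete states and actions and a simple count-based score. The names `CountNovelty` and `segment`, along with the `threshold` and `max_len` parameters, are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# Assumption: states and actions are discrete and hashable.
Transition = Tuple[Hashable, Hashable]  # (state, action)


class CountNovelty:
    """Hypothetical count-based novelty estimate over state-action pairs."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, trajectory: Sequence[Transition]) -> None:
        # Record how often each (state, action) pair occurs in experience data.
        self.counts.update(trajectory)

    def score(self, state: Hashable, action: Hashable) -> float:
        # Unseen pairs score 1.0; frequently seen pairs approach 0.
        return 1.0 / (1.0 + self.counts[(state, action)]) ** 0.5


def segment(
    trajectory: Sequence[Transition],
    novelty: CountNovelty,
    threshold: float = 0.5,  # hypothetical novelty cutoff
    max_len: int = 10,       # hypothetical cap on skill length
) -> List[List[Transition]]:
    """Split a trajectory into variable-length skills.

    A skill is closed as soon as a state-action pair scores above
    `threshold` (treated here as a decision point) or the skill
    reaches `max_len` steps.
    """
    skills: List[List[Transition]] = []
    current: List[Transition] = []
    for state, action in trajectory:
        current.append((state, action))
        if novelty.score(state, action) > threshold or len(current) >= max_len:
            skills.append(current)
            current = []
    if current:
        skills.append(current)
    return skills
```

In this sketch, a trajectory made mostly of familiar pairs is cut only where an unfamiliar pair appears, so skill length adapts to the data instead of being fixed in advance. A learned novelty estimator would replace the counts in continuous state or action spaces.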