🤖 AI Summary
To address low sample efficiency, poor robustness, and limited interpretability in reinforcement learning, this paper proposes a cognitive belief-driven Q-learning framework. Methodologically, it explicitly incorporates human-like subjective beliefs into Q-learning, introducing a clustering-based representation of subjective belief distributions and integrating Bayesian belief updating so the agent reasons jointly over historical experience and the current context, thereby mitigating Q-value overestimation. The end-to-end framework unifies principles from cognitive science, clustering-based representation learning, and classical Q-learning. Empirical evaluation across diverse discrete-control tasks shows improved policy robustness and environmental adaptability, with decision-making behavior that aligns more closely with human intuition, and the framework outperforms competitive Q-learning baselines.
📝 Abstract
Reinforcement learning faces persistent challenges with robustness and explainability across diverse environments. Traditional Q-learning algorithms struggle to make effective decisions and to exploit accumulated learning experience. To overcome these limitations, we propose Cognitive Belief-Driven Q-Learning (CBDQ), which integrates subjective belief modeling into the Q-learning framework, improving decision-making accuracy by endowing agents with human-like learning and reasoning capabilities. Drawing inspiration from cognitive science, our method maintains a subjective belief distribution over the expected value of actions, using a cluster-based subjective belief model that lets agents reason about the probability associated with each decision. CBDQ mitigates Q-value overestimation and refines decision-making policies by integrating historical experience with current contextual information, mimicking the dynamics of human decision-making. We evaluate the proposed method on discrete control benchmark tasks in various complex environments. The results demonstrate that CBDQ exhibits stronger adaptability, robustness, and human-like characteristics than competing baselines. We hope this work offers researchers a fresh perspective on understanding and explaining Q-learning.
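To make the idea concrete, the core mechanism described above (a per-cluster belief distribution over actions, updated in a Bayesian fashion, whose expectation replaces the hard max in the bootstrap target) can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm: the class name `BeliefQ`, the softmax stand-in for the likelihood, and the `mix` smoothing rate are all hypothetical choices made for the sketch.

```python
import numpy as np

class BeliefQ:
    """Sketch of a tabular belief-driven Q-learner (hypothetical names).

    The bootstrap target uses a belief-weighted expectation over
    next-cluster action values instead of the usual max, which damps
    the overestimation bias of the hard max operator.
    """

    def __init__(self, n_clusters, n_actions, alpha=0.1, gamma=0.99, mix=0.1):
        self.Q = np.zeros((n_clusters, n_actions))
        # Uniform prior belief over actions for every state cluster.
        self.belief = np.full((n_clusters, n_actions), 1.0 / n_actions)
        self.alpha, self.gamma, self.mix = alpha, gamma, mix

    def _likelihood(self, c):
        # Softmax over Q-values as a stand-in likelihood that each
        # action is the best choice in cluster c (an assumption of
        # this sketch, not taken from the paper).
        z = self.Q[c] - self.Q[c].max()
        e = np.exp(z)
        return e / e.sum()

    def update_belief(self, c):
        # Bayesian-style update: posterior ∝ prior × likelihood,
        # then smoothed toward the old belief to retain history.
        post = self.belief[c] * self._likelihood(c)
        post /= post.sum()
        self.belief[c] = (1 - self.mix) * self.belief[c] + self.mix * post

    def update_q(self, c, a, r, c_next, done):
        # Belief-weighted expectation replaces max_a' Q(s', a').
        if done:
            target = r
        else:
            target = r + self.gamma * self.belief[c_next] @ self.Q[c_next]
            self.update_belief(c_next)
        self.Q[c, a] += self.alpha * (target - self.Q[c, a])
```

In use, raw states would first be mapped to cluster indices (the paper's clustering-based representation); the sketch takes those indices as given and only shows how belief and value estimates interact in the update.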