Online inductive learning from answer sets for efficient reinforcement learning exploration

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) suffers from inefficient exploration and poor policy interpretability in complex environments. This paper proposes a novel online integration framework combining inductive logic programming (ILP) with RL: during Q-learning, answer set programming (ASP) rules are incrementally learned from noisy experience to approximate the policy in an interpretable logical form, which subsequently guides exploration. To our knowledge, this is the first approach achieving online, closed-loop ILP–ASP learning without reward shaping; a soft bias mechanism ensures asymptotic policy optimality. Evaluated on two Pac-Man maps, the method significantly improves discounted return—yielding gains from the first training episode—while ASP rules converge rapidly. Crucially, it imposes no additional computational overhead on Q-learning, thus preserving real-time performance while simultaneously enhancing interpretability and convergence behavior.

📝 Abstract
This paper presents a novel approach that combines inductive logic programming with reinforcement learning to improve training performance and explainability. We exploit inductive learning of answer set programs from noisy examples to learn a set of logical rules representing an explainable approximation of the agent's policy at each batch of experience. We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent in the next batch, without requiring inefficient reward shaping and while preserving optimality through a soft bias. The entire procedure runs during the online execution of the reinforcement learning algorithm. We preliminarily validate the efficacy of our approach by integrating it into the Q-learning algorithm for the Pac-Man scenario on two maps of increasing complexity. Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training. Moreover, inductive learning does not increase the computational time required by Q-learning, and the learned rules quickly converge to an explanation of the agent's policy.
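The soft-bias exploration described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: function and parameter names (`soft_bias_action`, `bias`) are assumptions, and the set of rule-recommended actions stands in for the output of answer set reasoning over the learned rules. The key property is that biased exploration prefers rule-suggested actions without ever zeroing out the others, which is what keeps the policy asymptotically optimal.

```python
import random

def soft_bias_action(q_values, recommended, epsilon=0.1, bias=0.7, rng=random):
    """Select an action for one state under soft-bias exploration.

    q_values:    dict mapping action -> current Q estimate.
    recommended: set of actions suggested by the learned ASP rules
                 (may be empty when no rule fires for this state).
    With probability (1 - epsilon) the agent exploits greedily; otherwise
    it explores, preferring rule-recommended actions with probability
    `bias` while still leaving every action reachable (the "soft" part).
    """
    actions = list(q_values)
    if rng.random() >= epsilon:                      # exploit: greedy action
        return max(actions, key=q_values.get)
    if recommended and rng.random() < bias:          # explore, biased by rules
        return rng.choice([a for a in actions if a in recommended])
    return rng.choice(actions)                       # explore uniformly
```

A standard Q-learning loop would call this selector each step and, at each batch boundary, refresh `recommended` by re-running inductive learning and answer set reasoning on the latest experience.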
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Exploration Efficiency
Complex Environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inductive Logic Programming
Reinforcement Learning
Pac-Man Game