Reinforcement Learning with Symbolic Reward Machines

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Symbolic Reward Machines (SRMs) and two learning algorithms, QSRM and LSRM, to address a limitation of traditional reward machines (RMs): they rely on handcrafted, environment-specific labeling functions and therefore fit poorly into standard reinforcement learning frameworks. SRMs instead evaluate guards, expressed as symbolic formulas, directly on the environment's standard observations, which removes the need for manual labeling while still modeling sparse, temporally extended tasks with non-Markovian rewards. Because the approach needs no predefined labels and adheres to the widely used environment definition, it applies more broadly than RMs and gives the user an interpretable representation of the task. In the evaluation, the SRM methods outperform the baseline RL approaches and match the results of existing RM methods.

📝 Abstract

Reward Machines (RMs) are an established mechanism in Reinforcement Learning (RL) to represent and learn sparse, temporally extended tasks with non-Markovian rewards. RMs rely on high-level information in the form of labels that are emitted by the environment alongside the observation. However, this concept requires manual user input for each environment and task. The user has to create a suitable labeling function that computes the labels. These limitations lead to poor applicability in widely adopted RL frameworks. We propose Symbolic Reward Machines (SRMs) together with the learning algorithms QSRM and LSRM to overcome the limitations of RMs. SRMs consume only the standard output of the environment and process the observation directly through guards that are represented by symbolic formulas. In our evaluation, our SRM methods outperform the baseline RL approaches and generate the same results as the existing RM methods. At the same time, our methods adhere to the widely used environment definition and provide interpretable representations of the task to the user.
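To make the idea of guards over raw observations concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical two-dimensional observation (x, y) and a hypothetical task "reach x > 0.8 first, then y > 0.8"; the class and field names (SymbolicRewardMachine, Transition, formula) are illustrative only. Each transition carries a guard that is evaluated directly on the observation, so no separate labeling function is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: observations are (x, y) positions; the paper does not fix this.
Observation = Tuple[float, float]
Guard = Callable[[Observation], bool]


@dataclass
class Transition:
    guard: Guard    # predicate evaluated directly on the raw observation
    formula: str    # human-readable form of the guard, for interpretability
    target: int     # machine state entered when the guard holds
    reward: float   # reward emitted on this transition


class SymbolicRewardMachine:
    """Hypothetical minimal machine: states are ints, guards are predicates."""

    def __init__(self, transitions: Dict[int, List[Transition]], initial: int):
        self.transitions = transitions
        self.initial = initial
        self.state = initial

    def reset(self) -> int:
        self.state = self.initial
        return self.state

    def step(self, obs: Observation) -> Tuple[int, float]:
        # Take the first outgoing transition whose guard holds on obs;
        # otherwise stay in the current state with zero reward.
        for t in self.transitions.get(self.state, []):
            if t.guard(obs):
                self.state = t.target
                return self.state, t.reward
        return self.state, 0.0


# Hypothetical task: satisfy "x > 0.8" first, then "y > 0.8".
srm = SymbolicRewardMachine(
    transitions={
        0: [Transition(lambda o: o[0] > 0.8, "x > 0.8", 1, 0.0)],
        1: [Transition(lambda o: o[1] > 0.8, "y > 0.8", 2, 1.0)],
    },
    initial=0,
)


def run(trace: List[Observation]) -> List[Tuple[int, float]]:
    """Replay a sequence of observations through the machine."""
    srm.reset()
    return [srm.step(obs) for obs in trace]


print(run([(0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]))
```

In this sketch the observation (0.1, 0.9) earns nothing in machine state 0, while a later observation with y > 0.8 is rewarded once state 1 has been reached. The reward therefore depends on history rather than on the current observation alone, which is the non-Markovian behavior the machine state is meant to capture. The formula strings are what a user would read to inspect the task.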

Problem

Research questions and friction points this paper is trying to address.

Reward Machines

Reinforcement Learning

Symbolic Representation

Non-Markovian Rewards

Labeling Function

Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Reward Machines

Reinforcement Learning

Non-Markovian Rewards