🤖 AI Summary
Sparse and uninformative reward functions severely hinder training efficiency in long-horizon reinforcement learning tasks. To address this, we propose a runtime reward monitor grounded in quantitative Linear Temporal Logic over finite traces (LTL_f[F]), the first application of LTL_f[F] to reward synthesis. This approach overcomes the dual limitations of weak expressivity and reward sparsity inherent in conventional Boolean semantics, while enabling the modeling of non-Markovian properties. Our method employs a state-labeling-function-driven, algorithm-agnostic framework that automatically synthesizes dense, interpretable, quantitative reward signals. Evaluated across diverse long-horizon decision-making benchmarks, it achieves substantial improvements in task success rates and accelerates convergence by an average of 37%, consistently outperforming Boolean-reward baselines.
📝 Abstract
Specifying informative and dense reward functions remains a pivotal challenge in Reinforcement Learning, as it directly affects the efficiency of agent training. In this work, we harness the expressive power of quantitative Linear Temporal Logic on finite traces ($\text{LTL}_f[\mathcal{F}]$) to synthesize reward monitors that generate a dense stream of rewards for runtime-observable state trajectories. By providing nuanced feedback during training, these monitors guide agents toward optimal behaviour and help mitigate the well-known issue of sparse rewards in long-horizon decision making, which arises under the Boolean semantics that dominates the current literature. Our framework is algorithm-agnostic, relies only on a state labelling function, and naturally accommodates the specification of non-Markovian properties. Empirical results show that our quantitative monitors consistently subsume and, depending on the environment, outperform Boolean monitors in maximizing a quantitative measure of task completion and in reducing convergence time.
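To make the idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of how a quantitative monitor for the single formula F(goal) could emit dense rewards. It assumes a labelling function that maps each observed state to a value in [0, 1] measuring closeness to satisfying the atomic proposition `goal`; under a quantitative "eventually" semantics, the monitor tracks the best value seen along the finite trace and rewards the agent for each improvement, instead of a single Boolean payoff at the end.

```python
def label(state, goal):
    """Assumed labelling function: closeness of `state` to `goal` in [0, 1].
    Here states and goals are scalars and 10.0 is an arbitrary scale."""
    return max(0.0, 1.0 - abs(state - goal) / 10.0)

class EventuallyMonitor:
    """Runtime reward monitor for F(goal) under a quantitative semantics:
    the value of the formula on a trace prefix is the maximum labelling
    value seen so far; each step's reward is the increase in that value."""

    def __init__(self, goal):
        self.goal = goal
        self.best = 0.0  # quantitative value of F(goal) on the prefix so far

    def step(self, state):
        v = label(state, self.goal)
        reward = max(0.0, v - self.best)  # dense reward: measurable progress
        self.best = max(self.best, v)
        return reward

# Example trace approaching goal = 5: rewards telescope to the final
# quantitative value of F(goal), so total reward equals self.best.
monitor = EventuallyMonitor(goal=5)
rewards = [monitor.step(s) for s in [0, 2, 4, 5, 3]]
```

Because the per-step rewards telescope, their sum equals the quantitative value of F(goal) on the whole trace, so maximizing return still aligns with the temporal specification while the agent receives feedback at every step rather than only on task completion.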