🤖 AI Summary
This work addresses the challenge of guaranteeing that reinforcement learning (RL) policies satisfy formal ω-regular specifications, particularly in countably infinite or continuous state spaces, where existing RL methods typically lack such guarantees. It establishes a novel theoretical connection between RL value functions and Streett supermartingale certificates: under an appropriately designed reward, the value function of any policy that almost surely satisfies an ω-regular property (a class that includes linear temporal logic) encodes a formal verification certificate for that property. This bridges supermartingale-based certification of stochastic systems with RL theory and suggests a principled route to certificate synthesis via RL in general state spaces. The results are validated experimentally on finite Markov decision processes and hold for finite, countably infinite, and continuous state spaces.
📝 Abstract
Certification methods for stochastic systems provide sufficient proof rules, based on real-valued supermartingale certificates, to determine the almost-sure satisfaction of $\omega$-regular properties (and therefore of linear temporal logic) over general state spaces, encompassing both countably infinite and continuous state spaces. By contrast, reinforcement learning (RL) methods for $\omega$-regular tasks have received considerable attention, but they typically lack formal guarantees that the learned policy satisfies the specification, except possibly for finite state and action spaces. We bridge these two lines of research by establishing a novel theoretical connection: under an appropriate reward, the value function associated with a policy that almost surely satisfies an $\omega$-regular property encodes a Streett supermartingale certificate for that specification. Our results, validated experimentally on finite Markov decision processes, hold for finite, countably infinite, and continuous state spaces, suggesting a principled route to certificate synthesis via RL.
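To make the claimed connection concrete, the sketch below checks drift conditions of the kind used for a single Streett pair $(A, B)$ ("visit $A$ finitely often, or $B$ infinitely often") on a tiny Markov chain. Everything in it is an assumption for illustration: the 4-state chain, the sets `A` and `B`, the helper names, and the cost (1 on each visit to $A \setminus B$, reset on entering $B$) are hypothetical and are not the paper's reward construction. The conditions are stated as they are commonly given in the supermartingale certificate literature: expected decrease by at least `eps` on $A \setminus B$, expected increase of at most `M` on $B$, and no expected increase elsewhere. The sketch also assumes that $B$ is reached almost surely from every state.

```python
# Hypothetical 4-state Markov chain under a fixed policy (not from the paper).
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
A = {1}  # states that should be visited only finitely often ...
B = {3}  # ... unless these states are visited infinitely often


def expected_next(P, V, s):
    """One-step expectation of V from state s."""
    return sum(p * v for p, v in zip(P[s], V))


def cost_to_go(P, A, B, sweeps=10_000, tol=1e-12):
    """Expected number of visits to A \\ B before first entering B.

    Illustrative value function: cost 1 per visit to A \\ B, value 0 on B.
    Computed by fixed-point iteration, which converges when B is reached
    almost surely from every state (assumed here).
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        new_V = [
            0.0 if s in B else (1.0 if s in A else 0.0) + expected_next(P, V, s)
            for s in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new_V, V)) < tol
        V = new_V
        if converged:
            break
    return V


def check_streett_conditions(P, V, A, B, eps, M, tol=1e-9):
    """Check non-negativity and the three drift conditions for the pair (A, B)."""
    for s in range(len(P)):
        if V[s] < -tol:
            return False
        drift = expected_next(P, V, s) - V[s]
        if s in B:
            ok = drift <= M + tol      # bounded increase allowed on B
        elif s in A:
            ok = drift <= -eps + tol   # strict expected decrease on A \ B
        else:
            ok = drift <= tol          # no expected increase elsewhere
        if not ok:
            return False
    return True


V = cost_to_go(P, A, B)
print(V, check_streett_conditions(P, V, A, B, eps=1.0, M=max(V)))
```

In this toy setting the cost-to-go satisfies the drift conditions with `eps = 1` by construction, because its Bellman equation subtracts exactly one unit on each visit to $A \setminus B$. The general result in the abstract covers arbitrary ω-regular properties and infinite state spaces, which this finite sketch does not attempt to reproduce.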