Neuro-Symbolic Injection of LTLf Constraints in Autoregressive Reinforcement Learning Policies

📅 2026-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of incorporating high-level temporal constraints, specified in linear temporal logic over finite traces (LTLf), into offline reinforcement learning. The authors propose a neuro-symbolic approach that compiles LTLf specifications into deterministic finite automata (DFAs) and uses a differentiable representation of each DFA to generate progress signals. These signals are added as a logic-based regularization term to the training loss of an autoregressive policy. The method is architecture-agnostic, so formal specifications can be injected into different Transformer-based offline RL models. Empirical results on navigation tasks show improved constraint-satisfaction rates while cumulative return stays competitive with vanilla baselines.

📝 Abstract

In this work, we study offline reinforcement learning (RL) under temporally extended task constraints expressed in Linear Temporal Logic over finite traces (LTLf). Recently, transformer-based approaches such as Trajectory Transformers and Decision Transformers have been adopted to address RL as a sequence modeling problem. However, these methods optimize purely for reward and do not account for high-level temporal requirements. Here, we introduce a neurosymbolic framework that injects LTLf background knowledge into such transformer-based RL policies. Our approach compiles LTLf formulas into deterministic finite automata (DFAs) and integrates them into the learning process through a differentiable representation and a logic-based loss function. In particular, we derive differentiable satisfaction signals from DFA progression and use them as a regularization term during training. The resulting method is architecture-agnostic. We evaluate the proposed framework on navigation environments with specification suites covering combinations of safety and reachability temporal properties. Experimental results show that incorporating background knowledge not only improves constraint satisfaction but also maintains return competitive with vanilla baselines.
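
The abstract does not give the exact formulation, so the following is only a minimal sketch of one common way to make DFA progression differentiable: keep a probability distribution over DFA states and push it through a one-hot transition tensor, weighted by the model's per-step symbol probabilities. Everything named here is an assumption for illustration, not the authors' implementation: the three-state DFA (hand-written for a formula of the form "eventually goal and always not hazard"), the symbol alphabet, and the functions `acceptance_probability` and `logic_loss`. NumPy is used so the arithmetic is easy to follow; a training setup would presumably express the same operations in an autodiff framework.

```python
import numpy as np

# Hypothetical DFA for "eventually goal AND always not hazard".
# States:  0 = goal not yet seen, 1 = goal seen (accepting), 2 = violated (sink).
# Symbols: 0 = empty, 1 = goal, 2 = hazard.
N_STATES, N_SYMBOLS = 3, 3
delta = np.array([
    [0, 1, 2],  # from state 0
    [1, 1, 2],  # from state 1
    [2, 2, 2],  # from state 2
])  # delta[state, symbol] -> next state
accepting = np.array([0.0, 1.0, 0.0])

# One-hot transition tensor T[symbol, state, next_state].
T = np.zeros((N_SYMBOLS, N_STATES, N_STATES))
for q in range(N_STATES):
    for s in range(N_SYMBOLS):
        T[s, q, delta[q, s]] = 1.0


def acceptance_probability(symbol_probs):
    """Soft DFA progression over a trace of per-step symbol distributions.

    symbol_probs has shape (steps, N_SYMBOLS); each row sums to 1.
    Returns the probability mass that ends in an accepting state.
    """
    belief = np.zeros(N_STATES)
    belief[0] = 1.0  # start in the initial state with certainty
    for p in symbol_probs:
        # belief'[j] = sum over symbols s and states i of p[s] * belief[i] * T[s, i, j]
        belief = np.einsum("s,i,sij->j", p, belief, T)
    return float(belief @ accepting)


def logic_loss(symbol_probs, eps=1e-8):
    """Negative log acceptance probability; small when the trace likely satisfies the formula."""
    return -np.log(acceptance_probability(symbol_probs) + eps)


# Crisp trace (empty, goal, empty) satisfies the formula.
satisfied = np.eye(N_SYMBOLS)[[0, 1, 0]]
# Crisp trace (empty, goal, hazard) violates it.
violated = np.eye(N_SYMBOLS)[[0, 1, 2]]
# Soft trace: two steps with the same uncertain symbol distribution.
soft = np.array([[0.2, 0.7, 0.1], [0.2, 0.7, 0.1]])
```

Under these assumptions, a regularized objective would take the form `sequence_loss + lam * logic_loss(symbol_probs)`, where `sequence_loss` stands for the model's usual training loss and `lam` is a hypothetical weighting coefficient. How the per-step symbol probabilities are obtained from the transformer's predictions is not specified in the abstract and is left open here.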

Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning

temporal constraints

LTLf

sequence modeling

autoregressive policies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuro-Symbolic Integration

LTLf Constraints

Differentiable DFA