Zero-Shot Instruction Following in RL via Structured LTL Representations

📅 2025-12-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Reinforcement learning agents struggle to follow arbitrary Linear Temporal Logic (LTL) instructions zero-shot in environments where multiple atomic propositions can hold at the same time and interact. Method: The paper proposes a multi-task policy learning approach that conditions the policy on sequences of simple Boolean formulae aligned with the transitions of the instruction's automaton, and encodes these formulae with a graph neural network (GNN) to produce structured task representations. Contribution/Results: A single policy can execute unseen LTL instructions at test time without being retrained for each task. Experiments in a complex chess-based environment demonstrate the advantages of the approach.

📝 Abstract

Linear temporal logic (LTL) is a compelling framework for specifying complex, structured tasks for reinforcement learning (RL) agents. Recent work has shown that interpreting LTL instructions as finite automata, which can be seen as high-level programs monitoring task progress, enables learning a single generalist policy capable of executing arbitrary instructions at test time. However, existing approaches fall short in environments where multiple high-level events (i.e., atomic propositions) can be true at the same time and potentially interact in complicated ways. In this work, we propose a novel approach to learning a multi-task policy for following arbitrary LTL instructions that addresses this shortcoming. Our method conditions the policy on sequences of simple Boolean formulae, which directly align with transitions in the automaton, and are encoded via a graph neural network (GNN) to yield structured task representations. Experiments in a complex chess-based environment demonstrate the advantages of our approach.
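
To make the idea of conditioning on Boolean formulae concrete, here is a minimal sketch of how a sequence of formulae could be checked against the set of propositions that are true at each step. It is an illustration only, not the paper's implementation: the tuple-based formula format, the helper names `holds` and `advance`, and the chess-flavored proposition names (`check`, `capture`, `promotion`, `castle`) are all assumptions made for this example.

```python
from typing import FrozenSet, Sequence, Tuple, Union

# Assumed formula format: a proposition name, or a tagged tuple
# ("not", f), ("and", f, g, ...), ("or", f, g, ...).
Formula = Union[str, Tuple]


def holds(formula: Formula, true_props: FrozenSet[str]) -> bool:
    """Evaluate a Boolean formula against the propositions true right now."""
    if isinstance(formula, str):
        return formula in true_props
    op, *args = formula
    if op == "not":
        return not holds(args[0], true_props)
    if op == "and":
        return all(holds(a, true_props) for a in args)
    if op == "or":
        return any(holds(a, true_props) for a in args)
    raise ValueError(f"unknown operator: {op}")


def advance(sequence: Sequence[Formula], index: int, true_props: FrozenSet[str]) -> int:
    """Move to the next formula in the sequence if the current one is satisfied."""
    if index < len(sequence) and holds(sequence[index], true_props):
        return index + 1
    return index


# Hypothetical task: first "check without capture", then "promotion or castle".
task = [("and", "check", ("not", "capture")), ("or", "promotion", "castle")]

pos = 0
observations = [
    frozenset({"check", "capture"}),  # two propositions true at once; first formula fails
    frozenset({"check"}),             # first formula satisfied
    frozenset({"castle", "check"}),   # second formula satisfied
]
for true_props in observations:
    pos = advance(task, pos, true_props)

task_done = pos == len(task)
```

Because each observation is a set rather than a single symbol, a formula such as `("and", "check", ("not", "capture"))` can distinguish steps where both events occur from steps where only one does, which is the concurrent-event setting the abstract highlights.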

Problem

Research questions and friction points this paper is trying to address.

Enabling RL agents to follow arbitrary LTL instructions zero-shot at test time

Handling environments where multiple atomic propositions can be true simultaneously and interact

Learning structured task representations so that a single multi-task policy generalizes across instructions

Innovation

Methods, ideas, or system contributions that make the work stand out.

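Conditions the policy on sequences of simple Boolean formulae that align with automaton transitions

Encodes these formulae via a graph neural network (GNN) to yield structured task representations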

Handles simultaneous high-level event interactions
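
The points above can be sketched as a toy pipeline that turns a Boolean formula into a graph and aggregates information toward the root. This is a deliberately simplified stand-in, not the paper's GNN: it uses one-hot node features and plain sum aggregation with no learned weights, and the names `to_graph`, `message_pass`, and the vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Formula = Union[str, Tuple]  # same assumed format: "p" or ("op", sub, ...)


def to_graph(formula: Formula) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Flatten a formula's syntax tree into node labels and child->parent edges."""
    labels: List[str] = []
    edges: List[Tuple[int, int]] = []

    def visit(f: Formula) -> int:
        node = len(labels)
        if isinstance(f, str):
            labels.append(f)
            return node
        op, *args = f
        labels.append(op)
        for a in args:
            edges.append((visit(a), node))
        return node

    visit(formula)
    return labels, edges


def message_pass(
    labels: List[str],
    edges: List[Tuple[int, int]],
    vocab: Dict[str, int],
    rounds: int = 2,
) -> List[float]:
    """Sum child features into parents for a fixed number of rounds; return the root."""
    dim = len(vocab)
    # One-hot initial features per node (a real GNN would use learned embeddings).
    h = [[1.0 if vocab[label] == i else 0.0 for i in range(dim)] for label in labels]
    for _ in range(rounds):
        new_h = [row[:] for row in h]
        for child, parent in edges:
            for i in range(dim):
                new_h[parent][i] += h[child][i]
        h = new_h
    return h[0]  # node 0 is the root of the syntax tree


vocab = {"and": 0, "not": 1, "check": 2, "capture": 3}
labels, edges = to_graph(("and", "check", ("not", "capture")))
root_embedding = message_pass(labels, edges, vocab, rounds=2)
```

After enough rounds, the root vector depends on every operator and proposition in the formula, so structurally different formulae over the same propositions yield different representations. A learned GNN would replace the fixed sum with trainable message and update functions.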

🔎 Similar Papers

Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts