A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work has identified instability in Transformer-based online model-free reinforcement learning (RL), stemming from high sensitivity to policy/value network architecture, parameter-sharing schemes, and temporal modeling strategies. Method: This paper presents a systematic study of Transformer design for online continuous control, proposing a stable and efficient actor-critic architecture built on serialized state inputs, temporal slicing of sequential data, cross-network parameter sharing, and input conditioning, unified to support both vector and image observations. Contribution/Results: The proposed design improves training stability and achieves competitive performance on both fully observable (e.g., MuJoCo) and partially observable (e.g., DeepMind Control Suite with proprioceptive and visual inputs) tasks. By providing a reproducible architectural blueprint and empirically validated design principles, this work offers practical guidelines for deploying Transformers in online RL.
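The "temporal slicing" design point above refers to cutting sequential experience into fixed-length context windows that a transformer can attend over. A minimal sketch of one common way to do this (overlapping strided windows; the function name, `context_len`, and `stride` are illustrative choices, not the paper's API):

```python
def slice_trajectory(trajectory, context_len, stride):
    """Slice one episode into overlapping fixed-length windows.

    trajectory: list of timesteps (e.g., (obs, action, reward) tuples)
    context_len: number of timesteps the transformer attends over
    stride: offset between consecutive window starts (stride < context_len
            yields overlapping windows, a common choice for data efficiency)
    """
    windows = []
    for start in range(0, len(trajectory) - context_len + 1, stride):
        windows.append(trajectory[start:start + context_len])
    return windows

# A 10-step episode sliced into length-4 windows with stride 2.
episode = list(range(10))
print(slice_trajectory(episode, context_len=4, stride=2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Stride and context length trade off sample reuse against correlation between training windows; the paper studies such choices empirically.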

📝 Abstract
Despite their effectiveness and popularity in offline or model-based reinforcement learning (RL), transformers remain underexplored in online model-free RL due to their sensitivity to training setups and model design decisions such as how to structure the policy and value networks, share components, or handle temporal information. In this paper, we show that transformers can be strong baselines for continuous control in online model-free RL. We investigate key design questions: how to condition inputs, share components between actor and critic, and slice sequential data for training. Our experiments reveal stable architectural and training strategies enabling competitive performance across fully and partially observable tasks, and in both vector- and image-based settings. These findings offer practical guidance for applying transformers in online RL.
Problem

Research questions and friction points this paper is trying to address.

Exploring transformer applications in online model-free reinforcement learning for control
Addressing architectural design challenges in actor-critic transformer networks
Developing stable training strategies for transformers in continuous control tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stable transformer architecture for online model-free RL
Shared actor-critic components with conditioned inputs
Sequential data slicing strategy for training transformers
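The first two innovations above, a shared trunk between actor and critic with the critic conditioned on extra inputs, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy (plain linear layers stand in for the transformer encoder; all weight names are hypothetical), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, d_model = 8, 2, 16

# Shared trunk: one linear "encoder" reused by both networks,
# standing in for the shared transformer backbone.
W_shared = rng.normal(size=(obs_dim, d_model)) / np.sqrt(obs_dim)
# Actor head: maps the shared representation to a bounded action.
W_actor = rng.normal(size=(d_model, act_dim)) / np.sqrt(d_model)
# Critic head: conditioned on the action, concatenated to the features.
W_critic = rng.normal(size=(d_model + act_dim, 1)) / np.sqrt(d_model)

def actor(obs):
    h = np.tanh(obs @ W_shared)        # shared representation
    return np.tanh(h @ W_actor)        # continuous action in [-1, 1]

def critic(obs, action):
    h = np.tanh(obs @ W_shared)        # same shared trunk as the actor
    return np.concatenate([h, action]) @ W_critic  # scalar Q(s, a)

obs = rng.normal(size=obs_dim)
a = actor(obs)
q = critic(obs, a)
print(a.shape, q.shape)  # → (2,) (1,)
```

The design question the paper studies is precisely how much of such a trunk to share and how to condition its inputs without destabilizing online training.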