AI Summary
This work proposes AcceRL, a framework that addresses the computational inefficiency and high data demands of reinforcement learning (RL) for large-scale vision-language-action (VLA) models. AcceRL is the first to introduce a pluggable, trainable world model into a distributed asynchronous RL setting. By physically decoupling training, inference, and environment interaction, the framework removes the synchronization bottleneck of conventional approaches, while the world model generates synthetic experiences that markedly improve sample efficiency. AcceRL achieves state-of-the-art performance on the LIBERO benchmark: at the algorithmic level it improves training stability and sample efficiency, and at the system level it delivers super-linear throughput scaling and high hardware utilization, demonstrating both methodological and engineering advances.
Abstract
Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous, decoupled RL framework that eliminates synchronization barriers by physically isolating training, inference, and environment rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experiments on the LIBERO benchmark show that AcceRL achieves state-of-the-art (SOTA) performance. At the system level, it exhibits super-linear throughput scaling and highly efficient hardware utilization; at the algorithmic level, the world-model-augmented variant delivers strong sample efficiency and robust training stability on complex control tasks.
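The decoupled pipeline described above can be illustrated with a minimal single-machine sketch: independent workers for environment rollouts and world-model rollouts feed a shared queue that an asynchronous learner drains, so no worker ever waits on another. All names here (`env_worker`, `model_worker`, `WorldModel`) and the toy dynamics are illustrative assumptions, not AcceRL's actual API; the real system distributes these roles across processes and machines.

```python
# Hypothetical sketch of AcceRL-style decoupling: real and synthetic
# experience are produced asynchronously and mixed in one replay stream.
import queue
import random
import threading

class WorldModel:
    """Toy stand-in for a learned dynamics model: predicts next state/reward."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def rollout(self, state, action):
        # A learned transition function would go here.
        next_state = tuple(s + 0.1 * action for s in state)
        return next_state, self.rng.random()

def env_worker(out_q, n_steps):
    """Rollout role: interacts with the (toy) real environment."""
    state = (0.0, 0.0)
    for t in range(n_steps):
        action = 1 if t % 2 == 0 else -1
        next_state = tuple(s + action for s in state)
        out_q.put(("real", state, action, next_state, 0.0))
        state = next_state

def model_worker(world_model, out_q, n_steps):
    """World-model role: generates virtual experience, decoupled from the env."""
    state = (0.0, 0.0)
    for _ in range(n_steps):
        next_state, reward = world_model.rollout(state, action=1)
        out_q.put(("synthetic", state, 1, next_state, reward))
        state = next_state

def learner(in_q, total):
    """Training role: consumes whatever transitions arrive, real or synthetic."""
    buffer = []
    while len(buffer) < total:
        buffer.append(in_q.get())
    return buffer

def run(n_real=50, n_synth=150):
    q = queue.Queue()
    workers = [
        threading.Thread(target=env_worker, args=(q, n_real)),
        threading.Thread(target=model_worker, args=(WorldModel(), q, n_synth)),
    ]
    for w in workers:
        w.start()
    buffer = learner(q, n_real + n_synth)
    for w in workers:
        w.join()
    return buffer
```

The synthetic-to-real ratio (here 3:1) is the knob that would drive the sample-efficiency gains the abstract claims: most of the learner's data never touches the real environment.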