AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

๐Ÿ“… 2026-03-18
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work proposes AcceRL, a novel framework that addresses the computational inefficiency and high data demands of large-scale vision-language-action (VLA) models in reinforcement learning. AcceRL introduces, for the first time, a pluggable and trainable world model within a distributed asynchronous reinforcement learning setting. By physically decoupling training, inference, and environment interaction, the framework generates synthetic experiences to dramatically improve sample efficiency. This design overcomes the synchronization bottleneck inherent in conventional approaches, achieving state-of-the-art performance on the LIBERO benchmark. At the algorithmic level, AcceRL significantly enhances training stability and sample efficiency; at the system level, it enables superlinear throughput scaling and high hardware utilization, demonstrating both methodological and engineering advances.

๐Ÿ“ Abstract
Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating training, inference, and rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experiments on the LIBERO benchmark demonstrate that AcceRL achieves state-of-the-art (SOTA) performance. At the system level, it exhibits super-linear throughput scaling and highly efficient hardware utilization. At the algorithmic level, the world-model-augmented variant delivers strong sample efficiency and robust training stability in complex control tasks.
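A minimal sketch of the decoupled pipeline the abstract describes: rollout actors, a world-model "dreamer" generating virtual experience, and a learner run as independent threads that communicate only through a queue, so no component blocks on another. All names here (`ToyWorldModel`, `dreamer`, the 1-D toy environment) are illustrative assumptions, not AcceRL's actual API.

```python
import queue
import random
import threading

class ToyWorldModel:
    """Stand-in for the paper's trainable world model: a 1-step dynamics
    model predicting next state and reward (toy linear form here)."""

    def __init__(self):
        self.bias = 0.0  # single learned parameter

    def predict(self, state, action):
        next_state = state + action + self.bias
        reward = 1.0 if next_state > 0 else 0.0
        return next_state, reward

    def fit(self, transitions):
        # One gradient-like step on the mean residual over real transitions.
        errs = [ns - (s + a + self.bias) for s, a, _, ns in transitions]
        if errs:
            self.bias += 0.1 * sum(errs) / len(errs)

def actor(env_step, out_q, n_steps):
    """Rollout worker: interacts with the real environment and streams
    transitions to the learner; never waits on policy updates."""
    state = 0.0
    for _ in range(n_steps):
        action = random.choice([-1.0, 1.0])
        next_state, reward = env_step(state, action)
        out_q.put(("real", (state, action, reward, next_state)))
        state = next_state

def dreamer(model, out_q, n_steps):
    """World-model worker: rolls the learned model forward to generate
    synthetic ("virtual") transitions, boosting sample efficiency."""
    state = 0.0
    for _ in range(n_steps):
        action = random.choice([-1.0, 1.0])
        next_state, reward = model.predict(state, action)
        out_q.put(("synthetic", (state, action, reward, next_state)))
        state = next_state

def learner(in_q, model, total):
    """Learner: consumes a mixed stream of real and synthetic experience,
    periodically refitting the world model on real data only."""
    real_buffer, counts = [], {"real": 0, "synthetic": 0}
    for _ in range(total):
        kind, transition = in_q.get()
        counts[kind] += 1
        if kind == "real":
            real_buffer.append(transition)
            if len(real_buffer) % 8 == 0:
                model.fit(real_buffer)
        # A real learner would also take a policy-update step here.
    return counts
```

In the actual framework the three roles are physically isolated across processes and machines rather than threads, and the world model is a learned neural dynamics model; the queue stands in for the distributed experience channel.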
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Vision-Language-Action Models
Computational Efficiency
Data Acquisition
World Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous reinforcement learning
world model
Vision-Language-Action models
distributed RL
sample efficiency
Chengxuan Lu
Wolf 1069B, Sany Group
Shukuan Wang
Wolf 1069B, Sany Group
Yanjie Li
Wolf 1069B, Sany Group
Wei Liu
Wolf 1069B, Sany Group
Shiji Jin
Wolf 1069B, Sany Group
Fuyuan Qian
Wolf 1069B, Sany Group
Peiming Li
Wolf 1069B, Sany Group
Baigui Sun
Wolf 1069B Lab, Sany Group
Artificial Intelligence, Computer Vision
Yang Liu
Wolf 1069B, Sany Group