🤖 AI Summary
Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems suffers from long training times, costly real-world deployment, and a gap between simulated and real performance. This paper proposes a mixed-reality digital twin framework that addresses these problems in two ways: it selectively scales parallelized training workloads on demand, and it applies systematic domain randomization to improve zero-shot sim-to-real (sim2real) transfer. The framework is evaluated on two representative use cases, one cooperative and one competitive. Agent and environment parallelization reduces training time by up to 76.3%, and the sim2real gap is as low as 2.9% with the proposed deployment method.
📝 Abstract
Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems usually requires long training times because of the inherent complexity of these systems. Furthermore, deploying the trained policies in the real world demands a feature-rich environment along with multiple physical embodied agents, which may not be feasible due to monetary, physical, energy, or safety constraints. This work seeks to address these pain points by presenting a mixed-reality digital twin framework capable of: (i) selectively scaling parallelized workloads on demand, and (ii) evaluating the trained policies across simulation-to-reality (sim2real) experiments. The viability and performance of the proposed framework are highlighted through two representative use cases, which cover cooperative as well as competitive classes of MARL problems. We study the effect of: (i) agent and environment parallelization on training time, and (ii) systematic domain randomization on zero-shot sim2real transfer across both case studies. Results indicate up to a 76.3% reduction in training time with the proposed parallelization scheme and a sim2real gap as low as 2.9% using the proposed deployment method.
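The two headline figures are relative percentages. The abstract does not give the exact formulas, so the sketch below assumes the common definitions: training-time reduction measured against a non-parallelized baseline, and sim2real gap as the relative difference between a simulated and a real-world performance score. The function names and the input values in the usage note are hypothetical and are not taken from the paper.

```python
def training_time_reduction(baseline_time: float, parallel_time: float) -> float:
    """Percentage reduction in training time relative to a baseline run.

    Assumed definition: 100 * (baseline - parallel) / baseline.
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return 100.0 * (baseline_time - parallel_time) / baseline_time


def sim2real_gap(sim_score: float, real_score: float) -> float:
    """Relative gap, in percent, between simulated and real-world scores.

    Assumed definition: 100 * |sim - real| / |sim|.
    """
    if sim_score == 0:
        raise ValueError("sim_score must be non-zero")
    return 100.0 * abs(sim_score - real_score) / abs(sim_score)
```

As a usage example with made-up numbers, `training_time_reduction(200.0, 50.0)` corresponds to a run that drops from 200 to 50 time units, and `sim2real_gap(0.80, 0.76)` compares a simulated score of 0.80 with a real-world score of 0.76. The paper may normalize either metric differently.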