AI Summary
This work proposes a closed-loop policy training paradigm tailored to high-frequency replanning in trajectory prediction, addressing the limitations of conventional open-loop models, which suffer from covariate shift, error accumulation, and an inability to actively interact with dynamic traffic environments. The approach introduces, for the first time, a goal-directed reactive simulation mechanism: a Transformer-based scene decoder constructs a hybrid environment that blends real and self-generated states and integrates both reactive and non-reactive traffic agents, enabling the ego vehicle to interact with others and learn error correction within the closed loop. Combined with policy-gradient optimization and receding-horizon replanning, the method reduces collision rates by 27.0% on nuScenes and by a more pronounced 79.5% in dense DeepScenario intersections, substantially improving obstacle avoidance under high-frequency replanning.
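The open-loop versus closed-loop distinction at the heart of this summary can be illustrated with a hypothetical 1-D toy (not the paper's model: all dynamics, gains, and noise levels here are invented for illustration). An open-loop controller executes one pre-computed plan end to end, so actuation noise accumulates as a random walk; a receding-horizon controller replans from each executed state and applies only the first move, which contracts the accumulated error:

```python
import random

GOAL, GAIN, NOISE = 0.0, 0.5, 0.05  # toy goal position, feedback gain, actuation noise

def plan_deltas(state, horizon):
    """Plan a horizon of relative moves toward GOAL, assuming perfect execution."""
    deltas, s = [], state
    for _ in range(horizon):
        d = GAIN * (GOAL - s)
        deltas.append(d)
        s += d
    return deltas

def open_loop_rollout(s0, steps, rng):
    """Execute a single plan end to end; actuation noise is never corrected."""
    deltas, s = plan_deltas(s0, steps), s0
    for d in deltas:
        s += d + rng.gauss(0.0, NOISE)  # noise compounds as a random walk
    return abs(s - GOAL)

def closed_loop_rollout(s0, steps, rng):
    """Receding horizon: replan from the *executed* state, apply only the first move."""
    s = s0
    for _ in range(steps):
        d = plan_deltas(s, horizon=5)[0]
        s += d + rng.gauss(0.0, NOISE)  # noise is corrected at the next replan
    return abs(s - GOAL)

def mean_error(rollout, trials=500, steps=50):
    """Average final distance to GOAL over many noisy rollouts from s0 = 1.0."""
    rng = random.Random(0)
    return sum(rollout(1.0, steps, rng) for _ in range(trials)) / trials
```

Averaged over many rollouts, the closed-loop error stays bounded near the noise floor while the open-loop error grows with the square root of the rollout length, which is the compounding-error effect the paper's closed-loop training targets.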
Abstract
Current trajectory prediction models are primarily trained in an open-loop manner, which often leads to covariate shift and compounding errors when they are deployed in real-world, closed-loop settings. Furthermore, relying on static datasets or non-reactive log-replay simulators severs the interactive loop, preventing the ego agent from learning to actively negotiate surrounding traffic. In this work, we propose an on-policy closed-loop training paradigm optimized for high-frequency, receding-horizon ego prediction. To ground the ego prediction in a realistic representation of traffic interactions and to achieve reactive consistency, we introduce a goal-oriented, Transformer-based scene decoder, yielding an inherently reactive training simulation. By exposing the ego agent to a mixture of open-loop data and simulated, self-induced states, the model learns recovery behaviors that correct its own execution errors. Extensive evaluation demonstrates that closed-loop training significantly enhances collision avoidance at high replanning frequencies, yielding relative collision-rate reductions of up to 27.0% on nuScenes and 79.5% in dense DeepScenario intersections compared to open-loop baselines. Additionally, we show that a hybrid simulation combining reactive and non-reactive surrounding agents achieves the best balance between immediate interactivity and long-term behavioral stability.
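The hybrid simulation idea, mixing log-replay (non-reactive) agents with agents that respond to the ego, can be sketched as follows. This is a minimal hypothetical illustration, not the paper's simulator: the agent classes, the 1-D positions, and the braking rule are all invented for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class LogReplayAgent:
    """Non-reactive: replays recorded positions regardless of the ego's state."""
    log: list
    t: int = 0

    def step(self, ego_pos):
        pos = self.log[min(self.t, len(self.log) - 1)]  # hold last pose at log end
        self.t += 1
        return pos

@dataclass
class ReactiveAgent:
    """Reactive: follows its route but brakes proportionally when the ego is close."""
    pos: float
    speed: float = 1.0
    safe_gap: float = 2.0

    def step(self, ego_pos):
        gap = abs(self.pos - ego_pos)
        v = self.speed if gap > self.safe_gap else self.speed * gap / self.safe_gap
        self.pos += v
        return self.pos

def hybrid_step(agents, ego_pos):
    """Advance a mixed reactive/non-reactive scene one tick around the same ego state."""
    return [agent.step(ego_pos) for agent in agents]
```

The reactive agents give the ego immediate interactive feedback to negotiate against, while the log-replay agents anchor the scene to recorded behavior, which is the interactivity/stability trade-off the hybrid simulation is said to balance.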