🤖 AI Summary
This paper addresses limited generalization in motion prediction for autonomous driving, which it attributes to redundant environment representations. It proposes RedMotion, a transformer model with two redundancy reduction mechanisms: (1) token reduction, in which an internal transformer decoder reduces a variable-sized set of local road environment tokens, representing road graphs and agent data, to a fixed-sized global embedding; and (2) self-supervised embedding alignment, which applies the redundancy reduction principle to embeddings generated from augmented views of the same road environment. In a semi-supervised setting, this representation learning approach outperforms PreTraM, Traj-MAE, and GraphDINO. In the Waymo Motion Prediction Challenge, RedMotion achieves results competitive with HPTR and MTR++. The source code is publicly available.
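The first mechanism, reducing a variable number of local tokens to a fixed-sized embedding, can be illustrated with cross-attention from a fixed set of query vectors. The sketch below is a minimal, single-head version in plain Python. It is an assumption about the general technique, not the authors' implementation: the function `cross_attention_pool`, the sizes `DIM` and `NUM_QUERIES`, and the random "queries" (which would be learned in a real model) are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(queries, tokens):
    """Reduce a variable-length token list to len(queries) output vectors.

    queries: fixed number of query vectors (learned in a real model).
    tokens:  variable number of local environment token vectors.
    Single head, no key/value projections, for brevity.
    """
    dim = len(queries[0])
    pooled = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [
            sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim)
            for t in tokens
        ]
        weights = softmax(scores)
        # Attention-weighted average of the tokens.
        pooled.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return pooled


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4  # hypothetical sizes, not taken from the paper
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

# Two scenes with different numbers of local tokens.
small_scene = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
large_scene = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(120)]

small_embedding = cross_attention_pool(queries, small_scene)
large_embedding = cross_attention_pool(queries, large_scene)
```

Both scenes yield an embedding of `NUM_QUERIES` vectors of length `DIM`, regardless of how many tokens went in. The output size depends only on the number of queries, which is what makes the embedding fixed-sized.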
📝 Abstract
We introduce RedMotion, a transformer model for motion prediction in self-driving vehicles that learns environment representations via redundancy reduction. Our first type of redundancy reduction is induced by an internal transformer decoder and reduces a variable-sized set of local road environment tokens, representing road graphs and agent data, to a fixed-sized global embedding. The second type of redundancy reduction is obtained by self-supervised learning and applies the redundancy reduction principle to embeddings generated from augmented views of road environments. Our experiments reveal that our representation learning approach outperforms PreTraM, Traj-MAE, and GraphDINO in a semi-supervised setting. Moreover, RedMotion achieves competitive results compared to HPTR or MTR++ in the Waymo Motion Prediction Challenge. Our open-source implementation is available at: https://github.com/kit-mrt/future-motion
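To make the second type of redundancy reduction more concrete, the following sketch computes a redundancy reduction loss in the style of Barlow Twins on two batches of embeddings. The abstract does not spell out the exact objective, so the formulation here is an assumption, as are the function names and the `off_diag_weight` default. The loss pushes the cross-correlation matrix between the two views toward the identity: matching dimensions should agree across views, and different dimensions should be decorrelated.

```python
import math
import random


def standardize(batch):
    """Normalize each embedding dimension to zero mean and unit variance over the batch."""
    n, dim = len(batch), len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n) + 1e-8
        for i in range(n):
            out[i][d] = (column[i] - mean) / std
    return out


def redundancy_reduction_loss(view_a, view_b, off_diag_weight=0.005):
    """Barlow Twins-style loss between embeddings of two augmented views.

    view_a, view_b: lists of embedding vectors, one per scene, same order.
    off_diag_weight: hypothetical trade-off between the two terms.
    """
    za, zb = standardize(view_a), standardize(view_b)
    n, dim = len(za), len(za[0])
    loss = 0.0
    for i in range(dim):
        for j in range(dim):
            # Entry (i, j) of the cross-correlation matrix between the views.
            c_ij = sum(za[k][i] * zb[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance term
            else:
                loss += off_diag_weight * c_ij ** 2  # redundancy term
    return loss


rng = random.Random(0)
base = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
# A lightly perturbed copy stands in for a second augmented view.
noisy = [[x + 0.05 * rng.gauss(0, 1) for x in row] for row in base]
unrelated = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]

aligned_loss = redundancy_reduction_loss(base, noisy)
unrelated_loss = redundancy_reduction_loss(base, unrelated)
```

In this toy setup, `aligned_loss` comes out smaller than `unrelated_loss`, because embeddings of two views of the same scenes correlate strongly along the diagonal. In a real training loop the two batches would come from encoding two augmentations of the same road environments.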