🤖 AI Summary
Existing motion forecasting methods for autonomous driving suffer from two key limitations: object-level models exhibit poor generalization and low geometric accuracy, while occupancy-based approaches, though class-agnostic, lack physical consistency and explicit interaction modeling. To address these issues, we propose the first occupancy-instance joint modeling framework, which jointly encodes scene occupancy and traffic agent instances in bird's-eye view (BEV) space. Our method explicitly incorporates kinematic constraints and agent-agent interactions through an architecture comprising a BEV encoder, an Interaction-Augmented Instance Encoder, and an Instance-Enhanced BEV Encoder. It supports both FMCW LiDAR and nuScenes multimodal inputs. Evaluated on nuScenes, our approach achieves state-of-the-art performance. Furthermore, benchmarking on an FMCW LiDAR dataset demonstrates strong generalization capability and practical deployment potential.
📝 Abstract
Accurate and reliable spatial and motion information plays a pivotal role in autonomous driving systems. However, object-level perception models struggle to handle open-set scenario categories and lack precise intrinsic geometry. On the other hand, occupancy-based class-agnostic methods excel at representing scenes but fail to ensure physical consistency and ignore the importance of interactions between traffic participants, hindering the model's ability to learn accurate and reliable motion. In this paper, we introduce a novel occupancy-instance modeling framework for class-agnostic motion prediction, named LEGO-Motion, which incorporates instance features into Bird's Eye View (BEV) space. Our model comprises (1) a BEV encoder, (2) an Interaction-Augmented Instance Encoder, and (3) an Instance-Enhanced BEV Encoder, strengthening both interaction modeling and physical consistency, thereby ensuring a more accurate and robust understanding of the environment. Extensive experiments on the nuScenes dataset demonstrate that our method achieves state-of-the-art performance, outperforming existing approaches. Furthermore, the effectiveness of our framework is validated on an advanced FMCW LiDAR benchmark, showcasing its practical applicability and generalization capabilities. The code will be made publicly available to facilitate further research.
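To make the three-module pipeline concrete, the following is a minimal PyTorch sketch. Only the module names (BEV encoder, Interaction-Augmented Instance Encoder, Instance-Enhanced BEV Encoder) come from the abstract; every implementation detail here — self-attention over agent tokens for interactions, cross-attention to inject instance features into BEV cells, the per-cell displacement head, and all tensor shapes — is an illustrative assumption, not the paper's actual design.

```python
import torch
import torch.nn as nn

class BEVEncoder(nn.Module):
    """Encodes rasterized occupancy (e.g. past sweeps as channels) into BEV features."""
    def __init__(self, in_ch: int, ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, bev):
        return self.net(bev)

class InteractionAugmentedInstanceEncoder(nn.Module):
    """Self-attention over per-agent tokens to model agent-agent interactions (assumed)."""
    def __init__(self, ch: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(ch, ch), nn.ReLU(), nn.Linear(ch, ch))

    def forward(self, inst):  # inst: (B, N, C)
        x, _ = self.attn(inst, inst, inst)
        return inst + self.ffn(x)

class InstanceEnhancedBEVEncoder(nn.Module):
    """Cross-attention: each BEV cell queries the interaction-aware instance tokens (assumed)."""
    def __init__(self, ch: int, heads: int = 4):
        super().__init__()
        self.cross = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, bev_feat, inst_feat):
        b, c, h, w = bev_feat.shape
        q = bev_feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
        fused, _ = self.cross(q, inst_feat, inst_feat)   # inject instance info per cell
        return (q + fused).transpose(1, 2).view(b, c, h, w)

class LegoMotionSketch(nn.Module):
    """Toy pipeline: BEV encoding -> instance interaction -> fusion -> per-cell motion."""
    def __init__(self, in_ch: int = 5, ch: int = 64, horizon: int = 3):
        super().__init__()
        self.bev_enc = BEVEncoder(in_ch, ch)
        self.inst_enc = InteractionAugmentedInstanceEncoder(ch)
        self.fuse = InstanceEnhancedBEVEncoder(ch)
        self.head = nn.Conv2d(ch, horizon * 2, 1)        # (dx, dy) per future step
        self.horizon = horizon

    def forward(self, bev, inst):
        fused = self.fuse(self.bev_enc(bev), self.inst_enc(inst))
        b, _, h, w = fused.shape
        return self.head(fused).view(b, self.horizon, 2, h, w)

model = LegoMotionSketch()
bev = torch.randn(2, 5, 32, 32)   # batch of 2, 5 past sweeps, 32x32 BEV grid (assumed)
inst = torch.randn(2, 6, 64)      # 6 agent tokens per sample, 64-dim (assumed)
flow = model(bev, inst)           # per-cell displacement for each future step
```

The key design point this sketch illustrates is the two-way coupling: instance tokens are first refined against each other (interactions), then fused back into every BEV cell, so the class-agnostic occupancy output can still benefit from object-level structure.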